MIMIC-IT: Multi-modal in-context instruction tuning B Li, Y Zhang, L Chen, J Wang, F Pu, J Yang, C Li, Z Liu arXiv preprint arXiv:2306.05425, 2023 | 400 | 2023 |
MMBench: Is your multi-modal model an all-around player? Y Liu, H Duan, Y Zhang, B Li, S Zhang, W Zhao, Y Yuan, J Wang, C He, ... arXiv preprint arXiv:2307.06281, 2023 | 243 | 2023 |
CelebA-Spoof: Large-scale face anti-spoofing dataset with rich annotations Y Zhang, ZF Yin, Y Li, G Yin, J Yan, J Shao, Z Liu Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23 …, 2020 | 174 | 2020 |
Neural Prompt Search Y Zhang, K Zhou, Z Liu arXiv preprint arXiv:2206.04673, 2022 | 105 | 2022 |
LLaVA-NeXT: Improved reasoning, OCR, and world knowledge H Liu, C Li, Y Li, B Li, Y Zhang, S Shen, YJ Lee | 54 | 2024 |
What makes good examples for visual in-context learning? Y Zhang, K Zhou, Z Liu Advances in Neural Information Processing Systems 36, 2024 | 52 | 2024 |
VBench: Comprehensive benchmark suite for video generative models Z Huang, Y He, J Yu, F Zhang, C Si, Y Jiang, Y Zhang, T Wu, Q Jin, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 28 | 2024 |
Octopus: Embodied vision-language programmer from environmental feedback J Yang, Y Dong, S Liu, B Li, Z Wang, C Jiang, H Tan, J Kang, Y Zhang, ... arXiv preprint arXiv:2310.08588, 2023 | 26 | 2023 |
Benchmarking omni-vision representation through the lens of visual realms Y Zhang, Z Yin, J Shao, Z Liu European Conference on Computer Vision, 594-611, 2022 | 19 | 2022 |
Learning without forgetting for vision-language models DW Zhou, Y Zhang, J Ning, HJ Ye, DC Zhan, Z Liu arXiv preprint arXiv:2305.19270, 2023 | 17 | 2023 |
Bamboo: Building mega-scale vision dataset continually with human-machine synergy Y Zhang, Q Sun, Y Zhou, Z He, Z Yin, K Wang, L Sheng, Y Qiao, J Shao, ... arXiv preprint arXiv:2203.07845, 2022 | 14 | 2022 |
CelebA-Spoof challenge 2020 on face anti-spoofing: Methods and results Y Zhang, Z Yin, J Shao, Z Liu, S Yang, Y Xiong, W Xia, Y Xu, M Luo, J Liu, ... arXiv preprint arXiv:2102.12642, 2021 | 14 | 2021 |
FunQA: Towards surprising video comprehension B Xie, S Zhang, Z Zhou, B Li, Y Zhang, J Hessel, J Yang, Z Liu arXiv preprint arXiv:2306.14899, 2023 | 7 | 2023 |
3D point cloud pre-training with knowledge distillation from 2D images Y Yao, Y Zhang, Z Yin, J Luo, W Ouyang, X Huang arXiv preprint arXiv:2212.08974, 2022 | 7 | 2022 |
On-device domain generalization K Zhou, Y Zhang, Y Zang, J Yang, CC Loy, Z Liu arXiv preprint arXiv:2209.07521, 2022 | 5 | 2022 |
Multimodal foundation models for zero-shot animal species recognition in camera trap images Z Fabian, Z Miao, C Li, Y Zhang, Z Liu, A Hernández, A Montes-Rojas, ... arXiv preprint arXiv:2311.01064, 2023 | 4 | 2023 |
Robust face anti-spoofing with dual probabilistic modeling Y Zhang, Y Wu, Z Yin, J Shao, Z Liu arXiv preprint arXiv:2204.12685, 2022 | 3 | 2022 |
WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning Y Zhang, K Zhang, B Li, F Pu, CA Setiadharma, J Yang, Z Liu arXiv preprint arXiv:2405.03272, 2024 | | 2024 |
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward R Zhang, L Gui, Z Sun, Y Feng, K Xu, Y Zhang, D Fu, C Li, A Hauptmann, ... arXiv preprint arXiv:2404.01258, 2024 | | 2024 |
Knowledge Augmented Instruction Tuning for Zero-shot Animal Species Recognition Z Fabian, Z Miao, C Li, Y Zhang, Z Liu, A Hernandez, P Arbelaez, A Link, ... NeurIPS 2023 Workshop on Instruction Tuning and Instruction Following, 2023 | | 2023 |