MIMIC-IT: Multi-Modal In-Context Instruction Tuning | B Li, Y Zhang, L Chen, J Wang, F Pu, J Yang, C Li, Z Liu | arXiv preprint arXiv:2306.05425, 2023 | 473 | 2023 |
OtterHD: A High-Resolution Multi-modality Model | B Li, P Zhang, J Yang, Y Zhang, F Pu, Z Liu | arXiv preprint arXiv:2311.04219, 2023 | 25 | 2023 |
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models | K Zhang, B Li, P Zhang, F Pu, JA Cahyono, K Hu, S Liu, Y Zhang, J Yang, ... | arXiv preprint arXiv:2407.12772, 2024 | 7* | 2024 |
WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning | Y Zhang, K Zhang, B Li, F Pu, CA Setiadharma, J Yang, Z Liu | arXiv preprint arXiv:2405.03272, 2024 | | 2024 |