MiniCPM-V: A GPT-4V Level MLLM on Your Phone. Y Yao, T Yu, A Zhang, C Wang, J Cui, H Zhu, T Cai, H Li, W Zhao, Z He, ... arXiv preprint arXiv:2408.01800, 2024 | 55 | 2024 |
LLaVA-UHD: An LMM Perceiving Any Aspect Ratio and High-Resolution Images. R Xu, Y Yao, Z Guo, J Cui, Z Ni, C Ge, TS Chua, Z Liu, M Sun, G Huang. arXiv preprint arXiv:2403.11703, 2024 | 51 | 2024 |
GUICourse: From General Vision Language Models to Versatile GUI Agents. W Chen, J Cui, J Hu, Y Qin, J Fang, Y Zhao, C Wang, J Liu, G Chen, ... arXiv preprint arXiv:2406.11317, 2024 | 8 | 2024 |
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents. S Yu, C Tang, B Xu, J Cui, J Ran, Y Yan, Z Liu, S Wang, X Han, Z Liu, ... arXiv preprint arXiv:2410.10594, 2024 | 1 | 2024 |