mPLUG-Owl: Modularization empowers large language models with multimodality Q Ye, H Xu, G Xu, J Ye, M Yan, Y Zhou, J Wang, A Hu, P Shi, Y Shi, C Li, ... arXiv preprint arXiv:2304.14178, 2023 | 497 | 2023 |
Learning alignment for multimodal emotion recognition from speech H Xu, H Zhang, K Han, Y Wang, Y Peng, X Li InterSpeech2019, 2019 | 158 | 2019 |
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections C Li, H Xu, J Tian, W Wang, M Yan, ... EMNLP2022, 2022 | 110* | 2022 |
E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual Learning H Xu, M Yan, C Li, B Bi, S Huang, W Xiao, F Huang ACL2021 Oral, 2021 | 105 | 2021 |
mPLUG-Owl2: Revolutionizing multi-modal large language model with modality collaboration Q Ye, H Xu, J Ye, M Yan, H Liu, Q Qian, J Zhang, F Huang, J Zhou CVPR2024 Highlight, 2023 | 102 | 2023 |
mPLUG-2: A modularized multi-modal foundation model across text, image and video H Xu, Q Ye, M Yan, Y Shi, J Ye, Y Xu, C Li ICML2023, 2023 | 85 | 2023 |
Neural Topic Modeling with Bidirectional Adversarial Training R Wang, X Hu, D Zhou, Y He, Y Xiong, C Ye, H Xu ACL2020, 2020 | 83 | 2020 |
mPLUG-DocOwl: Modularized multimodal large language model for document understanding J Ye, A Hu, H Xu, Q Ye, M Yan, Y Dan, C Zhao, G Xu, C Li, J Tian, Q Qi, ... arXiv preprint arXiv:2307.02499, 2023 | 55 | 2023 |
Evaluation and analysis of hallucination in large vision-language models J Wang, Y Zhou, G Xu, P Shi, C Zhao, H Xu, Q Ye, M Yan, J Zhang, J Zhu, ... arXiv preprint arXiv:2308.15126, 2023 | 53 | 2023 |
HiTeA: Hierarchical Temporal-aware video-language pre-training Q Ye, G Xu, M Yan, H Xu, Q Qian, J Zhang, F Huang ICCV2023, 2022 | 49 | 2022 |
mPLUG: Effective and efficient vision-language learning by cross-modal skip-connections C Li, H Xu, J Tian, W Wang, M Yan, B Bi, J Ye, H Chen, G Xu, Z Cao, ... arXiv preprint arXiv:2205.12005, 2022 | 40 | 2022 |
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model J Ye, A Hu, H Xu, Q Ye, M Yan, G Xu, C Li, J Tian, Q Qian, J Zhang, Q Jin, ... EMNLP2023, 2023 | 39 | 2023 |
An LLM-free multi-dimensional benchmark for MLLMs hallucination evaluation J Wang, Y Wang, G Xu, J Zhang, Y Gu, H Jia, H Xu, M Yan, J Zhang, ... arXiv preprint arXiv:2311.07397, 2023 | 29 | 2023 |
An unsupervised Bayesian modelling approach for storyline detection on news articles D Zhou, H Xu, Y He EMNLP2015, 1943-1948, 2015 | 29 | 2015 |
SemVLP: Vision-language pre-training by aligning semantics at multiple levels C Li, M Yan, H Xu, F Luo, W Wang arXiv preprint arXiv:2103.07829, 2021 | 25 | 2021 |
Unsupervised Storyline Extraction from News Articles. D Zhou, H Xu, XY Dai, Y He IJCAI2016, 3014-3021, 2016 | 25 | 2016 |
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching Y Shi, X Yang, H Xu, C Yuan, B Li, W Hu, ZJ Zha CVPR2022, 2021 | 24 | 2021 |
Hallucination augmented contrastive learning for multimodal large language model C Jiang, H Xu, M Dong, J Chen, W Ye, M Yan, Q Ye, J Zhang, F Huang, ... CVPR2024, 2023 | 19 | 2023 |
Mobile-Agent: Autonomous multi-modal mobile device agent with visual perception J Wang, H Xu, J Ye, M Yan, W Shen, J Zhang, F Huang, J Sang ICLR2024 Workshop on Large Language Model (LLM) Agents, 2024 | 16 | 2024 |