| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| GroundingGPT: Language Enhanced Multi-modal Grounding Model | Z Li, Q Xu, D Zhang, H Song, Y Cai, Q Qi, R Zhou, J Pan, Z Li, VT Vu, ... | ACL 2024 | 32* | 2024 |
| SpeechAlign: Aligning Speech Generation to Human Preferences | D Zhang*, Z Li*, S Li, X Zhang, P Wang, Y Zhou, X Qiu | NeurIPS 2024 | 11 | 2024 |
| UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model | Z Li, W Wang, YQ Cai, X Qi, P Wang, D Zhang, H Song, B Jiang, Z Huang, ... | arXiv preprint arXiv:2408.02503 | 6 | 2024 |
| SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems | D Zhang, Z Li, P Wang, X Zhang, Y Zhou, X Qiu | arXiv preprint arXiv:2401.03945 | 4 | 2024 |
| Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment in Multi-Modal Models | W Wang*, Z Li*, Q Xu, L Li, Y Cai, B Jiang, H Song, X Hu, P Wang, L Xiao | arXiv preprint arXiv:2411.09691 | 1 | 2024 |
| QCRD: Quality-guided Contrastive Rationale Distillation for Large Language Models | W Wang, Z Li, Q Xu, Y Cai, H Song, Q Qi, R Zhou, Z Huang, T Wang, ... | arXiv preprint arXiv:2405.13014 | 1 | 2024 |
| Understanding the Role of LLMs in Multimodal Evaluation Benchmarks | B Jiang, L Li, X Li, Z Li, X Feng, L Kong, Q Liu, X Qiu | arXiv preprint arXiv:2410.12329 | | 2024 |