SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval

Y Li, G Zhang, Y Ma, R Yuan, K Zhu, H Guo… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent advancements in multimodal large language models (MLLMs) have aimed to
integrate and interpret data across diverse modalities. However, the capacity of these …

Cited by 4 · Related articles · All 3 versions

SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation

J Roberts, K Han, N Houlsby, S Albanie - arXiv preprint arXiv:2405.08807, 2024 - arxiv.org

Large multimodal models (LMMs) have proven flexible and generalisable across many tasks
and fields. Although they have strong potential to aid scientific research, their capabilities in …

Cited by 9 · Related articles · All 2 versions

MoE-SLU: Towards ASR-Robust Spoken Language Understanding via Mixture-of-Experts

X Cheng, Z Zhu, X Zhuang, Z Chen… - Findings of the …, 2024 - aclanthology.org

As a crucial task in the task-oriented dialogue systems, spoken language understanding
(SLU) has garnered increasing attention. However, errors from automatic speech …

Cited by 1 · Related articles · All 3 versions

MMRA: A Benchmark for Evaluating Multi-Granularity and Multi-Image Relational Association Capabilities in Large Visual Language Models

S Wu, K Zhu, Y Bai, Y Liang, Y Li, H Wu, JH Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

Given the remarkable success that large visual language models (LVLMs) have achieved in
image perception tasks, the endeavor to make LVLMs perceive the world like humans is …
