OmniBench: Towards the Future of Universal Omni-Language Models

Y Li, G Zhang, Y Ma, R Yuan, K Zhu, H Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in multimodal large language models (MLLMs) have aimed to
integrate and interpret data across diverse modalities. However, the capacity of these …

SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation

J Roberts, K Han, N Houlsby, S Albanie - arXiv preprint arXiv:2405.08807, 2024 - arxiv.org
Large multimodal models (LMMs) have proven flexible and generalisable across many tasks
and fields. Although they have strong potential to aid scientific research, their capabilities in …

MoE-SLU: Towards ASR-Robust Spoken Language Understanding via Mixture-of-Experts

X Cheng, Z Zhu, X Zhuang, Z Chen… - Findings of the …, 2024 - aclanthology.org
As a crucial task in task-oriented dialogue systems, spoken language understanding
(SLU) has garnered increasing attention. However, errors from automatic speech …

MMRA: A Benchmark for Evaluating Multi-Granularity and Multi-Image Relational Association Capabilities in Large Visual Language Models

S Wu, K Zhu, Y Bai, Y Liang, Y Li, H Wu, JH Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Given the remarkable success that large visual language models (LVLMs) have achieved in
image perception tasks, the endeavor to make LVLMs perceive the world like humans is …

LIME: Less Is More for MLLM Evaluation

K Zhu, Q Zang, S Jia, S Wu, F Fang, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) are evaluated on various benchmarks, such
as image captioning, visual question answering, and reasoning. However, many of these …

Russian-Language Multimodal Dataset for Automatic Summarization of Scientific Papers

A Tsanda, E Bruches - arXiv preprint arXiv:2405.07886, 2024 - arxiv.org
The paper discusses the creation of a multimodal dataset of Russian-language scientific
papers and testing of existing language models for the task of automatic text summarization …

Harnessing CLIP for Evidence Identification in Scientific Literature: A Multimodal Approach to Context24 Shared Task

A Kumar, LL Wang - Proceedings of the Fourth Workshop on …, 2024 - aclanthology.org
Knowing whether scientific claims are supported by evidence is fundamental to scholarly
communication and evidence-based decision-making. We present our approach to Task 1 of …