MiniCPM-V: A GPT-4V Level MLLM on Your Phone

Y Yao, T Yu, A Zhang, C Wang, J Cui, H Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org
The recent surge of Multimodal Large Language Models (MLLMs) has fundamentally
reshaped the landscape of AI research and industry, shedding light on a promising path …

DocTabQA: Answering Questions from Long Documents Using Tables

H Wang, K Hu, H Dong, L Gao - International Conference on Document …, 2024 - Springer
We study a new problem setting of question answering (QA), referred to as DocTabQA.
Within this setting, given a long document, the goal is to respond to questions by organizing …

EvoChart: A Benchmark and a Self-Training Approach Towards Real-World Chart Understanding

M Huang, L Han, X Zhang, W Wu, J Ma… - arXiv preprint arXiv …, 2024 - arxiv.org
Chart understanding enables automated data analysis for humans, which requires models
to achieve highly accurate visual comprehension. While existing Visual Language Models …

AskChart: Universal Chart Understanding through Textual Enhancement

X Yang, Y Wu, Y Zhu, N Tang, Y Luo - arXiv preprint arXiv:2412.19146, 2024 - arxiv.org
Chart understanding tasks such as ChartQA and Chart-to-Text involve automatically
extracting and interpreting key information from charts, enabling users to query or convert …

Rethinking Comprehensive Benchmark for Chart Understanding: A Perspective from Scientific Literature

L Shen, K Ding, G Meng, S Xiang - arXiv preprint arXiv:2412.12150, 2024 - arxiv.org
Scientific literature charts often contain complex visual elements, including multi-plot
figures, flowcharts, structural diagrams, etc. Evaluating multimodal models using these …

Unraveling the Truth: Do VLMs really Understand Charts? A Deep Dive into Consistency and Robustness

S Mukhopadhyay, A Qidwai, A Garimella… - Findings of the …, 2024 - aclanthology.org
Chart question answering (CQA) is a crucial area of Visual Language Understanding.
However, the robustness and consistency of current Visual Language Models (VLMs) in this …

RealCQA-V2: Visual Premise Proving

S Ahmed, R Setlur, V Govindaraju - arXiv preprint arXiv:2410.22492, 2024 - arxiv.org
We introduce Visual Premise Proving (VPP), a novel task tailored to refine the process of
chart question answering by deconstructing it into a series of logical premises. Each of these …

[Book] Document Analysis and Recognition - ICDAR 2024: 18th International Conference, Athens, Greece, August 30-September 4, 2024, Proceedings, Part I

EHB Smith - 2024 - books.google.com
This six-volume set LNCS 14804-14809 constitutes the proceedings of the 18th International
Conference on Document Analysis and Recognition, ICDAR 2024, held in Athens, Greece …