MiniCPM-V: A GPT-4V Level MLLM on Your Phone
The recent surge of Multimodal Large Language Models (MLLMs) has fundamentally
reshaped the landscape of AI research and industry, shedding light on a promising path …
DocTabQA: Answering Questions from Long Documents Using Tables
We study a new problem setting of question answering (QA), referred to as DocTabQA.
Within this setting, given a long document, the goal is to respond to questions by organizing …
EvoChart: A Benchmark and a Self-Training Approach Towards Real-World Chart Understanding
Chart understanding enables automated data analysis for humans, which requires models
to achieve highly accurate visual comprehension. While existing Visual Language Models …
AskChart: Universal Chart Understanding through Textual Enhancement
Chart understanding tasks such as ChartQA and Chart-to-Text involve automatically
extracting and interpreting key information from charts, enabling users to query or convert …
Rethinking Comprehensive Benchmark for Chart Understanding: A Perspective from Scientific Literature
Scientific Literature charts often contain complex visual elements, including multi-plot
figures, flowcharts, structural diagrams, etc. Evaluating multimodal models using these …
Unraveling the Truth: Do VLMs really Understand Charts? A Deep Dive into Consistency and Robustness
Chart question answering (CQA) is a crucial area of Visual Language Understanding.
However, the robustness and consistency of current Visual Language Models (VLMs) in this …
RealCQA-V2: Visual Premise Proving
S Ahmed, R Setlur, V Govindaraju - arXiv preprint arXiv:2410.22492, 2024 - arxiv.org
We introduce Visual Premise Proving (VPP), a novel task tailored to refine the process of
chart question answering by deconstructing it into a series of logical premises. Each of these …
[BOOK][B] Document Analysis and Recognition - ICDAR 2024: 18th International Conference, Athens, Greece, August 30-September 4, 2024, Proceedings, Part I
EHB Smith - 2024 - books.google.com
This six-volume set LNCS 14804-14809 constitutes the proceedings of the 18th International
Conference on Document Analysis and Recognition, ICDAR 2024, held in Athens, Greece …