When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs

R Kamoi, Y Zhang, N Zhang, J Han… - Transactions of the …, 2024 - direct.mit.edu
Self-correction is an approach to improving responses from large language models (LLMs)
by having the model refine its own responses during inference. Prior work has proposed various self …
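
A minimal sketch of the inference-time self-correction loop the survey examines: the same model drafts an answer, critiques it, and revises it until the critique comes back clean. The llm callable and all prompts are hypothetical illustrations, not the paper's protocol.

# Minimal self-correction loop: one LLM drafts, critiques, and refines
# its own answer. `llm` is any prompt -> text callable (hypothetical).
def self_correct(llm, question, max_rounds=3):
    answer = llm(f"Answer the question:\n{question}")
    for _ in range(max_rounds):
        feedback = llm(
            f"Question: {question}\nAnswer: {answer}\n"
            "List any factual or logical errors, or reply OK."
        )
        if feedback.strip().upper() == "OK":  # model sees nothing to fix
            break
        answer = llm(  # refine the draft using the model's own feedback
            f"Question: {question}\nDraft: {answer}\nFeedback: {feedback}\n"
            "Rewrite the answer, fixing the issues above."
        )
    return answer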

From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models

KH Huang, HP Chan, YR Fung, H Qiu, M Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
Data visualization in the form of charts plays a pivotal role in data analysis, offering critical
insights and aiding in informed decision-making. Automatic chart understanding has …

CodeMind: A Framework to Challenge Large Language Models for Code Reasoning

C Liu, SD Zhang, AR Ibrahimzada… - arXiv preprint arXiv …, 2024 - arxiv.org
Solely relying on test passing to evaluate Large Language Models (LLMs) for code
synthesis may result in unfair assessment or in promoting models with data leakage. As an …
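
For context, the test-passing evaluation the abstract cautions against usually amounts to the check sketched below: a candidate program counts as correct if it passes every unit test, regardless of whether the model can reason about the code. This is a generic illustration, not CodeMind's own protocol.

# Generic pass/fail check for generated code: exec the candidate and run
# each (args, expected) pair. Passing all tests is treated as "correct",
# even if the solution was memorized from leaked training data.
def passes_tests(candidate_src, func_name, tests):
    namespace = {}
    exec(candidate_src, namespace)  # assumes a trusted sandbox
    func = namespace[func_name]
    return all(func(*args) == expected for args, expected in tests)

tests = [((2, 3), 5), ((0, 0), 0)]
print(passes_tests("def add(a, b):\n    return a + b", "add", tests))  # True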

An Empirical Evaluation of the GPT-4 Multimodal Language Model on Visualization Literacy Tasks

A Bendeck, J Stasko - IEEE Transactions on Visualization and …, 2024 - ieeexplore.ieee.org
Large Language Models (LLMs) like GPT-4 that support multimodal input (i.e., prompts
containing images in addition to text) have immense potential to advance visualization …
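
As an illustration of the kind of multimodal prompt such an evaluation rests on, here is a sketch using the OpenAI Python client's Chat Completions API with an image in the user turn. The model name and chart URL are placeholders; the paper's exact setup may differ.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One user turn mixing text and an image, as in a visualization-literacy
# probe that asks the model to read a value off a chart.
response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; the paper evaluated a GPT-4 vision variant
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Which bar in this chart has the highest value?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/chart.png"}},
        ],
    }],
)
print(response.choices[0].message.content)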

ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning

R Xia, B Zhang, H Ye, X Yan, Q Liu, H Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, many versatile Multi-modal Large Language Models (MLLMs) have emerged in
quick succession. However, their capacity to query information depicted in visual charts and …

M-LongDoc: A Benchmark for Multimodal Super-Long Document Understanding and a Retrieval-Aware Tuning Framework

YK Chia, L Cheng, HP Chan, C Liu, M Song… - arXiv preprint arXiv …, 2024 - arxiv.org
The ability to understand and answer questions over documents can be useful in many
business and practical applications. However, documents often contain lengthy and diverse …

Can LLMs Produce Faithful Explanations For Fact-checking? Towards Faithful Explainable Fact-Checking via Multi-Agent Debate

K Kim, S Lee, KH Huang, HP Chan, M Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Fact-checking research has extensively explored verification but less so the generation of
natural-language explanations, which are crucial for user trust. While Large Language Models (LLMs) …
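
A rough sketch of the multi-agent debate pattern named in the title: a writer drafts an explanation, a critic attacks it, and the writer revises, for a fixed number of rounds. Roles and prompts here are invented for illustration and are not the paper's method.

# Two-role debate over a fact-checking explanation. `llm` is any
# prompt -> text callable; all prompts are illustrative only.
def debate_explanation(llm, claim, verdict, rounds=2):
    explanation = llm(
        f"Claim: {claim}\nVerdict: {verdict}\n"
        "Write a short explanation justifying the verdict."
    )
    for _ in range(rounds):
        critique = llm(  # critic: flag unsupported or unfaithful statements
            f"Claim: {claim}\nExplanation: {explanation}\n"
            "Point out any unsupported or unfaithful statements."
        )
        explanation = llm(  # writer: revise to address the critique
            f"Claim: {claim}\nExplanation: {explanation}\nCritique: {critique}\n"
            "Revise the explanation to address the critique."
        )
    return explanation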

DracoGPT: Extracting Visualization Design Preferences from Large Language Models

HW Wang, M Gordon, L Battle… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Trained on vast corpora, Large Language Models (LLMs) have the potential to encode
visualization design knowledge and best practices. However, if they fail to do so, they might …

Self-Correction Is More Than Refinement: A Learning Framework for Visual and Language Reasoning Tasks

J He, H Lin, Q Wang, Y Fung, H Ji - arXiv preprint arXiv:2410.04055, 2024 - arxiv.org
While Vision-Language Models (VLMs) have shown remarkable abilities in visual and
language reasoning tasks, they inevitably generate flawed responses. Self-correction that …

VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models

H Qiu, W Hu, ZY Dou, N Peng - arXiv preprint arXiv:2404.13874, 2024 - arxiv.org
Large Vision-Language Models (LVLMs) suffer from hallucination issues, wherein the
models generate plausible-sounding but factually incorrect outputs, undermining their …
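
One schematic way to read the two headline terms, assuming coverage and faithfulness behave like recall and precision over the objects a caption mentions versus those actually in the image (an interpretive sketch, not the paper's exact metric):

# Coverage: how many ground-truth objects the caption mentions.
# Faithfulness: how many mentioned objects are actually present.
def coverage_and_faithfulness(mentioned, present):
    mentioned, present = set(mentioned), set(present)
    hits = mentioned & present
    coverage = len(hits) / len(present) if present else 0.0
    faithfulness = len(hits) / len(mentioned) if mentioned else 0.0
    return coverage, faithfulness

# "tree" is a hallucinated object; "grass" goes unmentioned.
print(coverage_and_faithfulness(
    {"dog", "frisbee", "tree"}, {"dog", "frisbee", "grass"}))  # (0.67, 0.67)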