Think twice before assure: Confidence estimation for large language models through reflection on multiple answers

M Li, W Wang, F Feng, F Zhu, Q Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Confidence estimation aiming to evaluate output trustability is crucial for the application of
large language models (LLM), especially the black-box ones. Existing confidence estimation …

Agent-pro: Learning to evolve via policy-level reflection and optimization

W Zhang, K Tang, H Wu, M Wang, Y Shen… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models exhibit robust problem-solving capabilities for diverse tasks.
However, most LLM-based agents are designed as specific task solvers with sophisticated …

Multimodal self-instruct: Synthetic abstract image and visual reasoning instruction using language model

W Zhang, Z Cheng, Y He, M Wang, Y Shen… - arXiv preprint arXiv …, 2024 - arxiv.org
Although most current large multimodal models (LMMs) can already understand photos of
natural scenes and portraits, their understanding of abstract images, e.g., charts, maps, or …

Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models

M Nezhurina, L Cipolina-Kun, M Cherti… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) are often described as being instances of foundation
models, that is, models that transfer strongly across various tasks and conditions in few-shot …

Integrate the Essence and Eliminate the Dross: Fine-Grained Self-Consistency for Free-Form Language Generation

X Wang, Y Li, S Feng, P Yuan, B Pan, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Self-consistency (SC), leveraging multiple samples from LLMs, shows significant gains on
various reasoning tasks but struggles with free-form generation due to the difficulty of …

How Can LLM Guide RL? A Value-Based Approach

S Zhang, S Zheng, S Ke, Z Liu, W Jin, J Yuan… - arXiv preprint arXiv …, 2024 - arxiv.org
Reinforcement learning (RL) has become the de facto standard practice for sequential
decision-making problems by improving future acting policies with feedback. However, RL …

DAC: Decomposed Automation Correction for Text-to-SQL

D Wang, L Dou, X Zhang, Q Zhu, W Che - arXiv preprint arXiv:2408.08779, 2024 - arxiv.org
Text-to-SQL is an important task that helps people obtain information from databases by
automatically generating SQL queries. Considering the brilliant performance, approaches …

Reasoning and Planning with Large Language Models in Code Development

H Ding, Z Fan, I Guehring, G Gupta, W Ha… - Proceedings of the 30th …, 2024 - dl.acm.org
Large Language Models (LLMs) are revolutionizing the field of code development by
leveraging their deep understanding of code patterns, syntax, and semantics to assist …

Improving LLM Generations via Fine-Grained Self-Endorsement

A Wang, L Song, B Peng, L Jin, Y Tian… - Findings of the …, 2024 - aclanthology.org
This work studies mitigating fact-conflicting hallucinations for large language model (LLM) at
inference time. Particularly, we propose a self-endorsement framework that leverages the …

Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models

H Puerto, T Chubakov, X Zhu, HT Madabushi… - arXiv preprint arXiv …, 2024 - arxiv.org
Requiring a Large Language Model to generate intermediary reasoning steps has been
shown to be an effective way of boosting performance. In fact, it has been found that …