Think twice before assure: Confidence estimation for large language models through reflection on multiple answers
Confidence estimation aiming to evaluate output trustability is crucial for the application of
large language models (LLM), especially the black-box ones. Existing confidence estimation …
Agent-pro: Learning to evolve via policy-level reflection and optimization
Large Language Models exhibit robust problem-solving capabilities for diverse tasks.
However, most LLM-based agents are designed as specific task solvers with sophisticated …
Multimodal self-instruct: Synthetic abstract image and visual reasoning instruction using language model
Although most current large multimodal models (LMMs) can already understand photos of
natural scenes and portraits, their understanding of abstract images, e.g., charts, maps, or …
Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models
Large Language Models (LLMs) are often described as being instances of foundation
models, that is, models that transfer strongly across various tasks and conditions in few-shot …
Integrate the Essence and Eliminate the Dross: Fine-Grained Self-Consistency for Free-Form Language Generation
Self-consistency (SC), leveraging multiple samples from LLMs, shows significant gains on
various reasoning tasks but struggles with free-form generation due to the difficulty of …
How Can LLM Guide RL? A Value-Based Approach
Reinforcement learning (RL) has become the de facto standard practice for sequential
decision-making problems by improving future acting policies with feedback. However, RL …
DAC: Decomposed Automation Correction for Text-to-SQL
Text-to-SQL is an important task that helps people obtain information from databases by
automatically generating SQL queries. Considering the brilliant performance, approaches …
Reasoning and Planning with Large Language Models in Code Development
Large Language Models (LLMs) are revolutionizing the field of code development by
leveraging their deep understanding of code patterns, syntax, and semantics to assist …
Improving LLM Generations via Fine-Grained Self-Endorsement
This work studies mitigating fact-conflicting hallucinations for large language models (LLMs) at
inference time. Particularly, we propose a self-endorsement framework that leverages the …
Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models
Requiring a Large Language Model to generate intermediary reasoning steps has been
shown to be an effective way of boosting performance. In fact, it has been found that …