Large language models for data annotation: A survey

Z Tan, D Li, S Wang, A Beigi, B Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org
Data annotation generally refers to labeling or generating raw data with relevant
information, which can be used to improve the efficacy of machine learning models. The …
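As a concrete illustration of the LLM-based annotation pipelines such surveys cover, here is a minimal sketch that labels raw text with a prompted model. `query_llm`, the label set, and the prompt are assumptions for illustration, not details from the paper.

```python
# Minimal LLM-annotation sketch. `query_llm` is a hypothetical stand-in
# for any chat-completion API call.
LABELS = ["positive", "negative", "neutral"]

def query_llm(prompt: str) -> str:
    # Placeholder: swap in a real client call here.
    return "neutral"

def annotate(texts):
    labels = []
    for text in texts:
        prompt = (
            f"Classify the sentiment as one of {LABELS}.\n"
            f"Text: {text}\nLabel:"
        )
        answer = query_llm(prompt).strip().lower()
        # Keep the output on-schema; fall back if the model drifts.
        labels.append(answer if answer in LABELS else "neutral")
    return labels
```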

The Llama 3 herd of models

A Dubey, A Jauhri, A Pandey, A Kadian… - arXiv preprint arXiv …, 2024 - arxiv.org
Modern artificial intelligence (AI) systems are powered by foundation models. This paper
presents a new set of foundation models, called Llama 3. It is a herd of language models …

Large language models for mathematical reasoning: Progresses and challenges

J Ahn, R Verma, R Lou, D Liu, R Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Mathematical reasoning serves as a cornerstone for assessing the fundamental cognitive
capabilities of human intelligence. In recent times, there has been a notable surge in the …

Personal LLM agents: Insights and survey about the capability, efficiency and security

Y Li, H Wen, W Wang, X Li, Y Yuan, G Liu, J Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Since the advent of personal computing devices, intelligent personal assistants (IPAs) have
been one of the key technologies that researchers and engineers have focused on, aiming …

Understanding the planning of LLM agents: A survey

X Huang, W Liu, X Chen, X Wang, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
As Large Language Models (LLMs) have shown significant intelligence, efforts to
leverage LLMs as the planning modules of autonomous agents have attracted increasing attention …

Internal consistency and self-feedback in large language models: A survey

X Liang, S Song, Z Zheng, H Wang, Q Yu, X Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) often exhibit deficient reasoning or generate hallucinations.
To address these, studies prefixed with "Self-", such as Self-Consistency, Self-Improve, and …
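For context on one of the "Self-" methods named here: Self-Consistency samples several reasoning paths and keeps the most frequent final answer. A minimal sketch, assuming a hypothetical `sample_answer` callable that draws one chain-of-thought sample and returns its final answer string:

```python
from collections import Counter

def self_consistency(sample_answer, question, n=10):
    # Draw n independent reasoning samples and majority-vote the answers.
    answers = [sample_answer(question) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```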

Smaug: Fixing failure modes of preference optimisation with DPO-Positive

A Pal, D Karkhanis, S Dooley, M Roberts… - arXiv preprint arXiv …, 2024 - arxiv.org
Direct Preference Optimisation (DPO) is effective at significantly improving the performance
of large language models (LLMs) on downstream tasks such as reasoning, summarisation …
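For reference, the standard DPO objective this paper builds on scores a preference pair (x, y_w, y_l) against a frozen reference policy:

```latex
% Standard DPO loss; \pi_\theta is the policy, \pi_{\mathrm{ref}} the frozen reference.
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\!\left(
  \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
  - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
\right)
```

The failure mode the paper targets is that this relative objective can be satisfied even while the likelihood of the preferred completion y_w itself falls; DPO-Positive adds a penalty term of roughly the form max(0, log(π_ref(y_w|x)/π_θ(y_w|x))) to discourage that (a hedged reading of the abstract; consult the paper for the exact formulation).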

Common 7B language models already possess strong math capabilities

C Li, W Wang, J Hu, Y Wei, N Zheng, H Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Mathematical capabilities were previously believed to emerge in common language models
only at a very large scale or require extensive math-related pre-training. This paper shows …

Think twice before assure: Confidence estimation for large language models through reflection on multiple answers

M Li, W Wang, F Feng, F Zhu, Q Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Confidence estimation, which aims to evaluate the trustworthiness of model outputs, is crucial for the
application of large language models (LLMs), especially black-box ones. Existing confidence estimation …
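A generic agreement-based baseline in the spirit of this entry estimates confidence from how often independently sampled answers agree; the paper's own method goes further and reflects on the candidate answers, so treat this only as a sketch. `sample_answer` is a hypothetical one-shot LLM call.

```python
from collections import Counter

def agreement_confidence(sample_answer, question, n=8):
    # Confidence = frequency of the majority answer among n samples.
    answers = [sample_answer(question) for _ in range(n)]
    best, count = Counter(answers).most_common(1)[0]
    return best, count / n
```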

V-STaR: Training verifiers for self-taught reasoners

A Hosseini, X Yuan, N Malkin, A Courville… - arXiv preprint arXiv …, 2024 - arxiv.org
Common self-improvement approaches for large language models (LLMs), such as STaR
(Zelikman et al., 2022), iteratively fine-tune LLMs on self-generated solutions to improve …
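To situate this entry: a STaR-style loop fine-tunes on self-generated solutions that pass a correctness check, and V-STaR additionally trains a verifier on both the correct and incorrect pools. A minimal sketch; `generate`, `is_correct`, `finetune`, and `train_verifier` are hypothetical callables supplied by the caller, not an actual API.

```python
def star_with_verifier(model, problems, generate, is_correct,
                       finetune, train_verifier, n_samples=4, rounds=3):
    correct, incorrect = [], []
    for _ in range(rounds):
        for problem in problems:
            for _ in range(n_samples):
                solution = generate(model, problem)
                pool = correct if is_correct(problem, solution) else incorrect
                pool.append((problem, solution))
        # STaR step: fine-tune only on verified-correct solutions.
        model = finetune(model, correct)
    # V-STaR step: train a verifier on both correct and incorrect data.
    verifier = train_verifier(correct, incorrect)
    return model, verifier
```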