Large language models for data annotation: A survey

Z Tan, D Li, S Wang, A Beigi, B Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org
Data annotation generally refers to labeling or generating raw data with relevant
information, which can be used to improve the efficacy of machine learning models. The …
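As a concrete illustration of the LLM-based annotation pipelines such surveys cover, here is a minimal sketch that labels raw text with a prompted model. `query_llm`, the label set, and the prompt are assumptions for illustration, not details from the paper.

```python
# Minimal LLM-annotation sketch. `query_llm` is a hypothetical stand-in
# for any chat-completion API call.
LABELS = ["positive", "negative", "neutral"]

def query_llm(prompt: str) -> str:
    # Placeholder: swap in a real client call here.
    return "neutral"

def annotate(texts):
    labels = []
    for text in texts:
        prompt = (
            f"Classify the sentiment as one of {LABELS}.\n"
            f"Text: {text}\nLabel:"
        )
        answer = query_llm(prompt).strip().lower()
        # Keep the output on-schema; fall back if the model drifts.
        labels.append(answer if answer in LABELS else "neutral")
    return labels
```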

The Llama 3 herd of models

A Dubey, A Jauhri, A Pandey, A Kadian… - arXiv preprint arXiv …, 2024 - arxiv.org
Modern artificial intelligence (AI) systems are powered by foundation models. This paper
presents a new set of foundation models, called Llama 3. It is a herd of language models …

Large language models for mathematical reasoning: Progresses and challenges

J Ahn, R Verma, R Lou, D Liu, R Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Mathematical reasoning serves as a cornerstone for assessing the fundamental cognitive
capabilities of human intelligence. In recent times, there has been a notable surge in the …

Personal LLM agents: Insights and survey about the capability, efficiency and security

Y Li, H Wen, W Wang, X Li, Y Yuan, G Liu, J Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Since the advent of personal computing devices, intelligent personal assistants (IPAs) have
been one of the key technologies that researchers and engineers have focused on, aiming …

Understanding the planning of LLM agents: A survey

X Huang, W Liu, X Chen, X Wang, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
As Large Language Models (LLMs) have shown significant intelligence, efforts to
leverage LLMs as the planning modules of autonomous agents have attracted increasing attention …

Internal consistency and self-feedback in large language models: A survey

X Liang, S Song, Z Zheng, H Wang, Q Yu, X Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) often exhibit deficient reasoning or generate hallucinations.
To address these, studies prefixed with "Self-", such as Self-Consistency, Self-Improve, and …
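For context on one of the "Self-" methods named here: Self-Consistency samples several reasoning paths and keeps the most frequent final answer. A minimal sketch, assuming a hypothetical `sample_answer` callable that draws one chain-of-thought sample and returns its final answer string:

```python
from collections import Counter

def self_consistency(sample_answer, question, n=10):
    # Draw n independent reasoning samples and majority-vote the answers.
    answers = [sample_answer(question) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```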

Smaug: Fixing failure modes of preference optimisation with DPO-Positive

A Pal, D Karkhanis, S Dooley, M Roberts… - arXiv preprint arXiv …, 2024 - arxiv.org
Direct Preference Optimisation (DPO) is effective at significantly improving the performance
of large language models (LLMs) on downstream tasks such as reasoning, summarisation …
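For reference, the standard DPO objective this paper builds on scores a preference pair (x, y_w, y_l) against a frozen reference policy:

```latex
% Standard DPO loss; \pi_\theta is the policy, \pi_{\mathrm{ref}} the frozen reference.
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\!\left(
  \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
  - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
\right)
```

The failure mode the paper targets is that this relative objective can be satisfied even while the likelihood of the preferred completion y_w itself falls; DPO-Positive adds a penalty term of roughly the form max(0, log(π_ref(y_w|x)/π_θ(y_w|x))) to discourage that (a hedged reading of the abstract; consult the paper for the exact formulation).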

Common 7B language models already possess strong math capabilities

C Li, W Wang, J Hu, Y Wei, N Zheng, H Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Mathematical capabilities were previously believed to emerge in common language models
only at a very large scale or require extensive math-related pre-training. This paper shows …

Think twice before assure: Confidence estimation for large language models through reflection on multiple answers

M Li, W Wang, F Feng, F Zhu, Q Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Confidence estimation, which aims to evaluate the trustworthiness of model outputs, is crucial for the
application of large language models (LLMs), especially black-box ones. Existing confidence estimation …
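A generic agreement-based baseline in the spirit of this entry estimates confidence from how often independently sampled answers agree; the paper's own method goes further and reflects on the candidate answers, so treat this only as a sketch. `sample_answer` is a hypothetical one-shot LLM call.

```python
from collections import Counter

def agreement_confidence(sample_answer, question, n=8):
    # Confidence = frequency of the majority answer among n samples.
    answers = [sample_answer(question) for _ in range(n)]
    best, count = Counter(answers).most_common(1)[0]
    return best, count / n
```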

V-STaR: Training verifiers for self-taught reasoners

A Hosseini, X Yuan, N Malkin, A Courville… - arXiv preprint arXiv …, 2024 - arxiv.org
Common self-improvement approaches for large language models (LLMs), such as STaR
(Zelikman et al., 2022), iteratively fine-tune LLMs on self-generated solutions to improve …
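To situate this entry: a STaR-style loop fine-tunes on self-generated solutions that pass a correctness check, and V-STaR additionally trains a verifier on both the correct and incorrect pools. A minimal sketch; `generate`, `is_correct`, `finetune`, and `train_verifier` are hypothetical callables supplied by the caller, not an actual API.

```python
def star_with_verifier(model, problems, generate, is_correct,
                       finetune, train_verifier, n_samples=4, rounds=3):
    correct, incorrect = [], []
    for _ in range(rounds):
        for problem in problems:
            for _ in range(n_samples):
                solution = generate(model, problem)
                pool = correct if is_correct(problem, solution) else incorrect
                pool.append((problem, solution))
        # STaR step: fine-tune only on verified-correct solutions.
        model = finetune(model, correct)
    # V-STaR step: train a verifier on both correct and incorrect data.
    verifier = train_verifier(correct, incorrect)
    return model, verifier
```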