Retrieval-augmented generation for natural language processing: A survey

S Wu, Y Xiong, Y Cui, H Wu, C Chen, Y Yuan… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have demonstrated great success in various fields,
benefiting from their huge amount of parameters that store knowledge. However, LLMs still …

Internet of agents: Weaving a web of heterogeneous agents for collaborative intelligence

W Chen, Z You, R Li, Y Guan, C Qian, C Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid advancement of large language models (LLMs) has paved the way for the
development of highly capable autonomous agents. However, existing multi-agent …

Unifying the perspectives of NLP and software engineering: A survey on language models for code

Z Zhang, C Chen, B Liu, C Liao, Z Gong… - arXiv preprint arXiv …, 2023 - simg.baai.ac.cn
In this work we systematically review the recent advancements in code processing with
language models, covering 50+ models, 30+ evaluation tasks, 170+ datasets, and 700 …

Weak-to-strong reasoning

Y Yang, Y Ma, P Liu - arXiv preprint arXiv:2407.13647, 2024 - arxiv.org
When large language models (LLMs) exceed human-level capabilities, it becomes
increasingly challenging to provide full-scale and accurate supervision for these models …

VarBench: Robust language model benchmarking through dynamic variable perturbation

K Qian, S Wan, C Tang, Y Wang, X Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
As large language models achieve impressive scores on traditional benchmarks, an
increasing number of researchers are becoming concerned about benchmark data leakage …

Beyond code generation: Assessing code LLM maturity with postconditions

F He, J Zhai, M Pan - arXiv preprint arXiv:2407.14118, 2024 - arxiv.org
Most existing code Large Language Model (LLM) benchmarks, e.g., EvalPlus, focus on
code generation tasks. Namely, they contain a natural language description of a problem …

SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models

J Zhang, DC Juan, C Rashtchian, CS Ferng… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have demonstrated remarkable capabilities, but their
outputs can sometimes be unreliable or factually incorrect. To address this, we introduce …

Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders

Z He, W Shu, X Ge, L Chen, J Wang, Y Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
Sparse Autoencoders (SAEs) have emerged as a powerful unsupervised method for
extracting sparse representations from language models, yet scalable training remains a …

Modeling future conversation turns to teach LLMs to ask clarifying questions

MJQ Zhang, WB Knox, E Choi - arXiv preprint arXiv:2410.13788, 2024 - arxiv.org
Large language models (LLMs) must often respond to highly ambiguous user requests. In
such cases, the LLM's best response may be to ask a clarifying question to elicit more …

Unconditional Truthfulness: Learning Conditional Dependency for Uncertainty Quantification of Large Language Models

A Vazhentsev, E Fadeeva, R Xing… - arXiv preprint arXiv …, 2024 - arxiv.org
Uncertainty quantification (UQ) is a promising approach to detecting Large Language
Model (LLM) hallucinations and low-quality output. In this work, we address one of the …