Retrieval-augmented generation for natural language processing: A survey

S Wu, Y Xiong, Y Cui, H Wu, C Chen, Y Yuan… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have demonstrated great success in various fields,
benefiting from their huge amount of parameters that store knowledge. However, LLMs still …

Internet of agents: Weaving a web of heterogeneous agents for collaborative intelligence

W Chen, Z You, R Li, Y Guan, C Qian, C Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid advancement of large language models (LLMs) has paved the way for the
development of highly capable autonomous agents. However, existing multi-agent …

Unifying the perspectives of NLP and software engineering: A survey on language models for code

Z Zhang, C Chen, B Liu, C Liao, Z Gong… - arXiv preprint arXiv …, 2023 - simg.baai.ac.cn
In this work we systematically review the recent advancements in code processing with
language models, covering 50+ models, 30+ evaluation tasks, 170+ datasets, and 700 …

Weak-to-strong reasoning

Y Yang, Y Ma, P Liu - arXiv preprint arXiv:2407.13647, 2024 - arxiv.org
When large language models (LLMs) exceed human-level capabilities, it becomes
increasingly challenging to provide full-scale and accurate supervision for these models …

VarBench: Robust language model benchmarking through dynamic variable perturbation

K Qian, S Wan, C Tang, Y Wang, X Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
As large language models achieve impressive scores on traditional benchmarks, an
increasing number of researchers are becoming concerned about benchmark data leakage …

Beyond code generation: Assessing code LLM maturity with postconditions

F He, J Zhai, M Pan - arXiv preprint arXiv:2407.14118, 2024 - arxiv.org
Most existing code Large Language Model (LLM) benchmarks, e.g., EvalPlus, focus on
code generation tasks. Namely, they contain a natural language description of a problem …

SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models

J Zhang, DC Juan, C Rashtchian, CS Ferng… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have demonstrated remarkable capabilities, but their
outputs can sometimes be unreliable or factually incorrect. To address this, we introduce …

Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders

Z He, W Shu, X Ge, L Chen, J Wang, Y Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
Sparse Autoencoders (SAEs) have emerged as a powerful unsupervised method for
extracting sparse representations from language models, yet scalable training remains a …

Modeling future conversation turns to teach LLMs to ask clarifying questions

MJQ Zhang, WB Knox, E Choi - arXiv preprint arXiv:2410.13788, 2024 - arxiv.org
Large language models (LLMs) must often respond to highly ambiguous user requests. In
such cases, the LLM's best response may be to ask a clarifying question to elicit more …

Unconditional Truthfulness: Learning Conditional Dependency for Uncertainty Quantification of Large Language Models

A Vazhentsev, E Fadeeva, R Xing… - arXiv preprint arXiv …, 2024 - arxiv.org
Uncertainty quantification (UQ) is a promising approach to detecting Large Language
Model (LLM) hallucinations and low-quality output. In this work, we address one of the …