TaskBench: Benchmarking large language models for task automation

KS Kalyan - Natural Language Processing Journal, 2023 - Elsevier

Large language models (LLMs) are a special class of pretrained language models (PLMs)
obtained by scaling model size, pretraining corpus and computation. LLMs, because of their …

Cited by: 132 | All 5 versions

Foundation models for music: A survey

Y Ma, A Øland, A Ragni, BMS Del Sette, C Saitis… - arXiv preprint arXiv …, 2024 - arxiv.org

In recent years, foundation models (FMs) such as large language models (LLMs) and latent
diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …

Cited by: 5 | All 2 versions

EasyTool: Enhancing LLM-based agents with concise tool instruction

S Yuan, K Song, J Chen, X Tan, Y Shen, R Kan… - arXiv preprint arXiv …, 2024 - arxiv.org

To address intricate real-world tasks, there has been a rising interest in tool utilization in
applications of large language models (LLMs). To develop LLM-based agents, it usually …

Cited by: 18 | All 3 versions

What are tools anyway? A survey from the language model perspective

Z Wang, Z Cheng, H Zhu, D Fried, G Neubig - arXiv preprint arXiv …, 2024 - arxiv.org

Language models (LMs) are powerful yet mostly for text generation tasks. Tools have
substantially enhanced their performance for tasks that require complex skills. However …

Cited by: 5 | All 3 versions

From persona to personalization: A survey on role-playing language agents

J Chen, X Wang, R Xu, S Yuan, Y Zhang, W Shi… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent advancements in large language models (LLMs) have significantly boosted the rise
of Role-Playing Language Agents (RPLAs), i.e., specialized AI systems designed to simulate …

Cited by: 17 | All 2 versions

Agent-Pro: Learning to evolve via policy-level reflection and optimization

W Zhang, K Tang, H Wu, M Wang, Y Shen… - arXiv preprint arXiv …, 2024 - arxiv.org

Large Language Models exhibit robust problem-solving capabilities for diverse tasks.
However, most LLM-based agents are designed as specific task solvers with sophisticated …

Cited by: 13 | All 3 versions

AQA-Bench: An Interactive Benchmark for Evaluating LLMs' Sequential Reasoning Ability

S Yang, B Zhao, C Xie - arXiv preprint arXiv:2402.09404, 2024 - arxiv.org

This paper introduces AQA-Bench, a novel benchmark to assess the sequential reasoning
capabilities of large language models (LLMs) in algorithmic contexts, such as depth-first …

Cited by: 5 | All 2 versions

MindSearch: Mimicking human minds elicits deep AI searcher

Z Chen, K Liu, Q Wang, J Liu, W Zhang, K Chen… - arXiv preprint arXiv …, 2024 - arxiv.org

Information seeking and integration is a complex cognitive task that consumes enormous
time and effort. Inspired by the remarkable progress of Large Language Models, recent …

Cited by: 2 | All 2 versions

Thought-Retriever: Don't just retrieve raw data, retrieve thoughts

T Feng, P Han, G Lin, G Liu, J You - … : How Far Are We From AGI, 2024 - openreview.net

Large language models (LLMs) have transformed AI research thanks to their powerful
internal capabilities and knowledge. However, existing LLMs still fail to effectively …

Cited by: 3

m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks

Z Ma, W Huang, J Zhang, T Gupta… - Synthetic Data for …, 2024 - openreview.net

Real-world multi-modal problems are rarely solved by a single machine learning model, and
often require multi-step computational plans that involve stitching several models. Tool …

Cited by: 5 | All 3 versions