DROP: A reading comprehension benchmark requiring discrete reasoning over paragraphs

H Naveed, AU Khan, S Qiu, M Saqib, S Anwar… - arXiv preprint arXiv …, 2023 - arxiv.org

Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …

被引用次数：365 相关文章所有 3 个版本

[PDF] arxiv.org

Qa dataset explosion: A taxonomy of nlp resources for question answering and reading comprehension

A Rogers, M Gardner, I Augenstein - ACM Computing Surveys, 2023 - dl.acm.org

Alongside huge volumes of research on deep learning models in NLP in the recent years,
there has been much work on benchmark datasets needed to track modeling progress …

被引用次数：191 相关文章所有 6 个版本

[PDF] arxiv.org

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arXiv preprint arXiv …, 2023 - arxiv.org

Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

被引用次数：2407 相关文章所有 4 个版本

[PDF] arxiv.org

Palm 2 technical report

R Anil, AM Dai, O Firat, M Johnson, D Lepikhin… - arXiv preprint arXiv …, 2023 - arxiv.org

We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and
reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is …

被引用次数：1218 相关文章所有 2 个版本

[PDF] arxiv.org

Program of thoughts prompting: Disentangling computation from reasoning for numerical reasoning tasks

W Chen, X Ma, X Wang, WW Cohen - arXiv preprint arXiv:2211.12588, 2022 - arxiv.org

Recently, there has been significant progress in teaching language models to perform step-
by-step reasoning to solve complex numerical reasoning tasks. Chain-of-thoughts prompting …

被引用次数：433 相关文章所有 4 个版本

[PDF] mlr.press

Large language models can be easily distracted by irrelevant context

F Shi, X Chen, K Misra, N Scales… - International …, 2023 - proceedings.mlr.press

Large language models have achieved impressive performance on various natural
language processing tasks. However, so far they have been evaluated primarily on …

被引用次数：241 相关文章所有 6 个版本

[PDF] arxiv.org

The llama 3 herd of models

A Dubey, A Jauhri, A Pandey, A Kadian… - arXiv preprint arXiv …, 2024 - arxiv.org

Modern artificial intelligence (AI) systems are powered by foundation models. This paper
presents a new set of foundation models, called Llama 3. It is a herd of language models …

被引用次数：220 相关文章所有 3 个版本

[PDF] arxiv.org

Beyond the imitation game: Quantifying and extrapolating the capabilities of language models

A Srivastava, A Rastogi, A Rao, AAM Shoeb… - arXiv preprint arXiv …, 2022 - arxiv.org

Language models demonstrate both quantitative improvement and new qualitative
capabilities with increasing scale. Despite their potentially transformative impact, these new …

被引用次数：999 相关文章所有 11 个版本

[PDF] researchgate.net

Least-to-most prompting enables complex reasoning in large language models

D Zhou, N Schärli, L Hou, J Wei, N Scales… - arXiv preprint arXiv …, 2022 - arxiv.org

Chain-of-thought prompting has demonstrated remarkable performance on various natural
language reasoning tasks. However, it tends to perform poorly on tasks which requires …

被引用次数：920 相关文章所有 6 个版本

[PDF] arxiv.org

Large language models can self-improve

J Huang, SS Gu, L Hou, Y Wu, X Wang, H Yu… - arXiv preprint arXiv …, 2022 - arxiv.org

Large Language Models (LLMs) have achieved excellent performances in various tasks.
However, fine-tuning an LLM requires extensive supervision. Human, on the other hand …

被引用次数：356 相关文章所有 9 个版本