Teaching arithmetic to small transformers

N Lee, K Sreenivasan, JD Lee, K Lee… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models like GPT-4 exhibit emergent capabilities across general-purpose
tasks, such as basic arithmetic, when trained on extensive text data, even though these tasks …

What algorithms can transformers learn? A study in length generalization

H Zhou, A Bradley, E Littwin, N Razin, O Saremi… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models exhibit surprising emergent generalization properties, yet also
struggle on many simple reasoning tasks such as arithmetic and parity. This raises the …
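
For readers unfamiliar with the length-generalization setup studied here, the sketch below is a minimal illustration (not from the paper) of how a parity task is typically evaluated on sequences longer than any seen in training; model_predict is a hypothetical stand-in for a trained transformer.

# Illustrative sketch of a parity length-generalization check (not from the paper).
# `model_predict` is a hypothetical placeholder for a trained sequence model.
import random

def parity(bits):
    # Ground-truth label: 1 if the number of ones is odd, else 0.
    return sum(bits) % 2

def sample_bits(length):
    return [random.randint(0, 1) for _ in range(length)]

def accuracy_at_length(model_predict, length, n_samples=200):
    correct = 0
    for _ in range(n_samples):
        bits = sample_bits(length)
        correct += int(model_predict(bits) == parity(bits))
    return correct / n_samples

# Length generalization asks whether accuracy holds up when test sequences
# are strictly longer than any training sequence.
train_lengths = range(1, 21)        # e.g. the model only ever saw lengths 1-20
test_lengths = [25, 30, 40, 60]     # out-of-distribution evaluation lengths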

Teaching algorithmic reasoning via in-context learning

H Zhou, A Nova, H Larochelle, A Courville… - arXiv preprint arXiv …, 2022 - arxiv.org
Large language models (LLMs) have shown increasing in-context learning capabilities
through scaling up model and data size. Despite this progress, LLMs are still unable to solve …
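
As a rough illustration of the in-context approach referred to here, one can prepend worked, step-by-step exemplars before a query so the model imitates the procedure at inference time; the prompt format below is an assumption for illustration, not the paper's exact scheme.

# Hedged illustration of few-shot prompting with worked, step-by-step exemplars
# (the format is assumed, not taken from the paper).
def worked_addition(a, b):
    # Produce a digit-by-digit explanation of a + b, least significant digit first.
    lines = [f"Q: {a} + {b} = ?"]
    x, y, carry, position = a, b, 0, 1
    while x or y or carry:
        da, db = x % 10, y % 10
        s = da + db + carry
        lines.append(f"digit {position}: {da} + {db} + carry {carry} = {s}, write {s % 10}")
        carry, x, y, position = s // 10, x // 10, y // 10, position + 1
    lines.append(f"A: {a + b}")
    return "\n".join(lines)

# A few worked exemplars followed by the new query; the model is expected to
# continue the pattern and execute the same procedure in context.
prompt = "\n\n".join([worked_addition(47, 85), worked_addition(236, 198), "Q: 518 + 764 = ?"])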

Can neural networks do arithmetic? A survey on the elementary numerical skills of state-of-the-art deep learning models

A Testolin - Applied Sciences, 2024 - mdpi.com
Creating learning models that can exhibit sophisticated reasoning abilities is one of the
greatest challenges in deep learning research, and mathematics is rapidly becoming one of …

Goat: Fine-tuned LLaMA outperforms GPT-4 on arithmetic tasks

T Liu, BKH Low - arXiv preprint arXiv:2305.14201, 2023 - arxiv.org
We introduce Goat, a fine-tuned LLaMA model that significantly outperforms GPT-4 on a
range of arithmetic tasks. Fine-tuned on a synthetically generated dataset, Goat achieves …
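
To make the "synthetically generated dataset" concrete, here is a minimal sketch of how instruction-style arithmetic pairs might be produced; the field names, templates, and operand ranges are illustrative assumptions, not the actual data format used for Goat.

# Illustrative generator of synthetic arithmetic fine-tuning pairs.
# The "prompt"/"completion" fields and operand ranges are assumptions.
import json
import random

def make_example(max_digits=8):
    a = random.randint(0, 10 ** max_digits - 1)
    b = random.randint(0, 10 ** max_digits - 1)
    op = random.choice(["+", "-", "*"])
    answer = {"+": a + b, "-": a - b, "*": a * b}[op]
    return {"prompt": f"{a} {op} {b} = ", "completion": str(answer)}

# Write a small JSONL file of exact-arithmetic targets for supervised fine-tuning.
with open("synthetic_arithmetic.jsonl", "w") as f:
    for _ in range(1000):
        f.write(json.dumps(make_example()) + "\n")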

Identifying weaknesses in machine translation metrics through minimum Bayes risk decoding: A case study for COMET

C Amrhein, R Sennrich - arXiv preprint arXiv:2202.05148, 2022 - arxiv.org
Neural metrics have achieved impressive correlation with human judgements in the
evaluation of machine translation systems, but before we can safely optimise towards such …
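
Minimum Bayes risk (MBR) decoding, the procedure examined in this case study, selects from a pool of sampled translations the candidate with the highest expected utility under a metric, with the other samples acting as pseudo-references. The sketch below uses a placeholder token-overlap utility purely for illustration; it does not call COMET.

# Minimal sketch of minimum Bayes risk (MBR) decoding over sampled candidates.
# `utility` stands in for a learned metric such as COMET; here it is a
# placeholder token-overlap score, purely for illustration.
def utility(hypothesis: str, pseudo_reference: str) -> float:
    h, r = set(hypothesis.split()), set(pseudo_reference.split())
    return len(h & r) / max(len(h | r), 1)

def mbr_decode(candidates):
    # Score each candidate by its average utility against all other candidates,
    # then return the highest-scoring one.
    def expected_utility(c):
        others = [r for r in candidates if r is not c]
        return sum(utility(c, r) for r in others) / max(len(others), 1)
    return max(candidates, key=expected_utility)

samples = ["the cat sat on the mat", "a cat sat on the mat", "the cat is on a mat"]
print(mbr_decode(samples))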

Large language models: a primer and gastroenterology applications

O Shahab, B El Kurdi, A Shaukat… - Therapeutic …, 2024 - journals.sagepub.com
Over the past year, the emergence of state-of-the-art large language models (LLMs) in tools
like ChatGPT has ushered in a rapid acceleration in artificial intelligence (AI) innovation …

Exploring the numerical reasoning capabilities of language models: A comprehensive analysis on tabular data

M Akhtar, A Shankarampeta, V Gupta, A Patil… - arXiv preprint arXiv …, 2023 - arxiv.org
Numbers are crucial for various real-world domains such as finance, economics, and
science. Thus, understanding and reasoning with numbers are essential skills for language …

Induced natural language rationales and interleaved markup tokens enable extrapolation in large language models

M Bueno, C Gemmell, J Dalton, R Lotufo… - arXiv preprint arXiv …, 2022 - arxiv.org
The ability to extrapolate, i.e., to make predictions on sequences that are longer than those
presented as training examples, is a challenging problem for current deep learning models …

Assessing GPT-3.5 and GPT-4 in generating international classification of diseases billing codes

A Soroush, BS Glicksberg, E Zimlichman, Y Barash… - medRxiv, 2023 - medrxiv.org
Background Large Language Models (LLMs) like GPT-3.5 and GPT-4 are increasingly
entering the healthcare domain as a proposed means to assist with administrative tasks. To …