The Flan Collection: Designing data and methods for effective instruction tuning

S Longpre, L Hou, T Vu, A Webson… - International …, 2023 - proceedings.mlr.press
We study the design decisions of publicly available instruction tuning methods by
reproducing and breaking down the development of Flan 2022 (Chung et al., 2022) …

Galactica: A large language model for science

R Taylor, M Kardas, G Cucurull, T Scialom… - arXiv preprint arXiv …, 2022 - arxiv.org
Information overload is a major obstacle to scientific progress. The explosive growth in
scientific literature and data has made it ever harder to discover useful insights in a large …

Adapting large language models for education: Foundational capabilities, potentials, and challenges

Q Li, L Fu, W Zhang, X Chen, J Yu, W Xia… - arXiv preprint arXiv …, 2023 - arxiv.org
Online education platforms, leveraging the internet to distribute education resources, seek to
provide convenient education but often fall short in real-time communication with students …

Large language monkeys: Scaling inference compute with repeated sampling

B Brown, J Juravsky, R Ehrlich, R Clark, QV Le… - arXiv preprint arXiv …, 2024 - arxiv.org
Scaling the amount of compute used to train language models has dramatically improved
their capabilities. However, when it comes to inference, we often limit the amount of compute …

Language models are greedy reasoners: A systematic formal analysis of chain-of-thought

A Saparov, H He - arXiv preprint arXiv:2210.01240, 2022 - arxiv.org
Large language models (LLMs) have shown remarkable reasoning capabilities given chain-
of-thought prompts (examples with intermediate reasoning steps). Existing benchmarks …

What algorithms can transformers learn? A study in length generalization

H Zhou, A Bradley, E Littwin, N Razin, O Saremi… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models exhibit surprising emergent generalization properties, yet also
struggle on many simple reasoning tasks such as arithmetic and parity. This raises the …

Baldur: Whole-proof generation and repair with large language models

E First, MN Rabe, T Ringer, Y Brun - Proceedings of the 31st ACM Joint …, 2023 - dl.acm.org
Formally verifying software is a highly desirable but labor-intensive task. Recent work has
developed methods to automate formal verification using proof assistants, such as Coq and …

Draft, sketch, and prove: Guiding formal theorem provers with informal proofs

AQ Jiang, S Welleck, JP Zhou, W Li, J Liu… - arXiv preprint arXiv …, 2022 - arxiv.org
The formalization of existing mathematical proofs is a notoriously difficult process. Despite
decades of research on automation and proof assistants, writing formal proofs remains …

Teaching algorithmic reasoning via in-context learning

H Zhou, A Nova, H Larochelle, A Courville… - arXiv preprint arXiv …, 2022 - arxiv.org
Large language models (LLMs) have shown increasing in-context learning capabilities
through scaling up model and data size. Despite this progress, LLMs are still unable to solve …

ARB: Advanced reasoning benchmark for large language models

T Sawada, D Paleka, A Havrilla, P Tadepalli… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have demonstrated remarkable performance on various
quantitative reasoning and knowledge benchmarks. However, many of these benchmarks …