The Flan Collection: Designing data and methods for effective instruction tuning

S Longpre, L Hou, T Vu, A Webson… - International …, 2023 - proceedings.mlr.press
We study the design decisions of publicly available instruction tuning methods by
reproducing and breaking down the development of Flan 2022 (Chung et al., 2022) …

Galactica: A large language model for science

R Taylor, M Kardas, G Cucurull, T Scialom… - arXiv preprint arXiv …, 2022 - arxiv.org
Information overload is a major obstacle to scientific progress. The explosive growth in
scientific literature and data has made it ever harder to discover useful insights in a large …

Adapting large language models for education: Foundational capabilities, potentials, and challenges

Q Li, L Fu, W Zhang, X Chen, J Yu, W Xia… - arXiv preprint arXiv …, 2023 - arxiv.org
Online education platforms, leveraging the internet to distribute education resources, seek to
provide convenient education but often fall short in real-time communication with students …

Large language monkeys: Scaling inference compute with repeated sampling

B Brown, J Juravsky, R Ehrlich, R Clark, QV Le… - arXiv preprint arXiv …, 2024 - arxiv.org
Scaling the amount of compute used to train language models has dramatically improved
their capabilities. However, when it comes to inference, we often limit the amount of compute …

Language models are greedy reasoners: A systematic formal analysis of chain-of-thought

A Saparov, H He - arXiv preprint arXiv:2210.01240, 2022 - arxiv.org
Large language models (LLMs) have shown remarkable reasoning capabilities given chain-
of-thought prompts (examples with intermediate reasoning steps). Existing benchmarks …

What algorithms can transformers learn? A study in length generalization

H Zhou, A Bradley, E Littwin, N Razin, O Saremi… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models exhibit surprising emergent generalization properties, yet also
struggle on many simple reasoning tasks such as arithmetic and parity. This raises the …

Baldur: Whole-proof generation and repair with large language models

E First, MN Rabe, T Ringer, Y Brun - Proceedings of the 31st ACM Joint …, 2023 - dl.acm.org
Formally verifying software is a highly desirable but labor-intensive task. Recent work has
developed methods to automate formal verification using proof assistants, such as Coq and …

Draft, sketch, and prove: Guiding formal theorem provers with informal proofs

AQ Jiang, S Welleck, JP Zhou, W Li, J Liu… - arXiv preprint arXiv …, 2022 - arxiv.org
The formalization of existing mathematical proofs is a notoriously difficult process. Despite
decades of research on automation and proof assistants, writing formal proofs remains …

Teaching algorithmic reasoning via in-context learning

H Zhou, A Nova, H Larochelle, A Courville… - arXiv preprint arXiv …, 2022 - arxiv.org
Large language models (LLMs) have shown increasing in-context learning capabilities
through scaling up model and data size. Despite this progress, LLMs are still unable to solve …

ARB: Advanced reasoning benchmark for large language models

T Sawada, D Paleka, A Havrilla, P Tadepalli… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have demonstrated remarkable performance on various
quantitative reasoning and knowledge benchmarks. However, many of these benchmarks …