The Flan collection: Designing data and methods for effective instruction tuning
We study the design decisions of publicly available instruction tuning methods, by
reproducing and breaking down the development of Flan 2022 (Chung et al., 2022) …
Galactica: A large language model for science
Information overload is a major obstacle to scientific progress. The explosive growth in
scientific literature and data has made it ever harder to discover useful insights in a large …
Adapting large language models for education: Foundational capabilities, potentials, and challenges
Online education platforms, leveraging the internet to distribute education resources, seek to
provide convenient education but often fall short in real-time communication with students …
Large language monkeys: Scaling inference compute with repeated sampling
Scaling the amount of compute used to train language models has dramatically improved
their capabilities. However, when it comes to inference, we often limit the amount of compute …
Language models are greedy reasoners: A systematic formal analysis of chain-of-thought
Large language models (LLMs) have shown remarkable reasoning capabilities given chain-
of-thought prompts (examples with intermediate reasoning steps). Existing benchmarks …
What algorithms can transformers learn? A study in length generalization
Large language models exhibit surprising emergent generalization properties, yet also
struggle on many simple reasoning tasks such as arithmetic and parity. This raises the …
Baldur: Whole-proof generation and repair with large language models
Formally verifying software is a highly desirable but labor-intensive task. Recent work has
developed methods to automate formal verification using proof assistants, such as Coq and …
Draft, sketch, and prove: Guiding formal theorem provers with informal proofs
The formalization of existing mathematical proofs is a notoriously difficult process. Despite
decades of research on automation and proof assistants, writing formal proofs remains …
Teaching algorithmic reasoning via in-context learning
Large language models (LLMs) have shown increasing in-context learning capabilities
through scaling up model and data size. Despite this progress, LLMs are still unable to solve …
ARB: Advanced reasoning benchmark for large language models
Large Language Models (LLMs) have demonstrated remarkable performance on various
quantitative reasoning and knowledge benchmarks. However, many of these benchmarks …