A comprehensive overview of large language models

H Naveed, AU Khan, S Qiu, M Saqib, S Anwar… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …

Challenges and applications of large language models

J Kaddour, J Harris, M Mozes, H Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …

Textbooks are all you need

S Gunasekar, Y Zhang, J Aneja, CCT Mendes… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce phi-1, a new large language model for code, with significantly smaller size
than competing models: phi-1 is a Transformer-based model with 1.3B parameters, trained …

Scaling data-constrained language models

N Muennighoff, A Rush, B Barak… - Advances in …, 2024 - proceedings.neurips.cc
The current trend of scaling language models involves increasing both parameter count and
training dataset size. Extrapolating this trend suggests that training dataset size may soon be …

Focused transformer: Contrastive training for context scaling

S Tworkowski, K Staniszewski… - Advances in …, 2024 - proceedings.neurips.cc
Large language models have an exceptional capability to incorporate new information in a
contextual manner. However, the full potential of such an approach is often restrained due to …

Textbooks are all you need ii: phi-1.5 technical report

Y Li, S Bubeck, R Eldan, A Del Giorno… - arXiv preprint arXiv …, 2023 - arxiv.org
We continue the investigation into the power of smaller Transformer-based language
models as initiated by TinyStories -- a 10 million parameter model that can produce …

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action

J Lu, C Clark, S Lee, Z Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present Unified-IO 2, a multimodal and multi-skill unified model capable of following
novel instructions. Unified-IO 2 can use text, images, audio, and/or videos as input and can …

Foundation models and fair use

P Henderson, X Li, D Jurafsky, T Hashimoto… - Journal of Machine …, 2023 - jmlr.org
Existing foundation models are trained on copyrighted material. Deploying these models
can pose both legal and ethical risks when data creators fail to receive appropriate …

Llemma: An open language model for mathematics

Z Azerbayev, H Schoelkopf, K Paster… - arXiv preprint arXiv …, 2023 - arxiv.org
We present Llemma, a large language model for mathematics. We continue pretraining
Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing …

SantaCoder: don't reach for the stars!

LB Allal, R Li, D Kocetkov, C Mou, C Akiki… - arXiv preprint arXiv …, 2023 - arxiv.org
The BigCode project is an open scientific collaboration working on the responsible
development of large language models for code. This tech report describes the progress of …