A comprehensive overview of large language models
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …
Challenges and applications of large language models
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …
Textbooks are all you need
We introduce phi-1, a new large language model for code, with significantly smaller size
than competing models: phi-1 is a Transformer-based model with 1.3B parameters, trained …
Scaling data-constrained language models
The current trend of scaling language models involves increasing both parameter count and
training dataset size. Extrapolating this trend suggests that training dataset size may soon be …
Focused transformer: Contrastive training for context scaling
S Tworkowski, K Staniszewski… - Advances in …, 2024 - proceedings.neurips.cc
Large language models have an exceptional capability to incorporate new information in a
contextual manner. However, the full potential of such an approach is often restrained due to …
Textbooks are all you need ii: phi-1.5 technical report
We continue the investigation into the power of smaller Transformer-based language
models as initiated by TinyStories, a 10 million parameter model that can produce …
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action
We present Unified-IO 2, a multimodal and multi-skill unified model capable of following
novel instructions. Unified-IO 2 can use text, images, audio, and/or videos as input and can …
Foundation models and fair use
Existing foundation models are trained on copyrighted material. Deploying these models
can pose both legal and ethical risks when data creators fail to receive appropriate …
Llemma: An open language model for mathematics
We present Llemma, a large language model for mathematics. We continue pretraining
Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing …
SantaCoder: don't reach for the stars!
The BigCode project is an open-scientific collaboration working on the responsible
development of large language models for code. This tech report describes the progress of …