A comprehensive overview of large language models

H Naveed, AU Khan, S Qiu, M Saqib, S Anwar… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …

Challenges and applications of large language models

J Kaddour, J Harris, M Mozes, H Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …

Textbooks are all you need

S Gunasekar, Y Zhang, J Aneja, CCT Mendes… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce phi-1, a new large language model for code, with significantly smaller size
than competing models: phi-1 is a Transformer-based model with 1.3B parameters, trained …

Scaling data-constrained language models

N Muennighoff, A Rush, B Barak… - Advances in …, 2024 - proceedings.neurips.cc
The current trend of scaling language models involves increasing both parameter count and
training dataset size. Extrapolating this trend suggests that training dataset size may soon be …

Focused transformer: Contrastive training for context scaling

S Tworkowski, K Staniszewski… - Advances in …, 2024 - proceedings.neurips.cc
Large language models have an exceptional capability to incorporate new information in a
contextual manner. However, the full potential of such an approach is often restrained due to …

Textbooks are all you need ii: phi-1.5 technical report

Y Li, S Bubeck, R Eldan, A Del Giorno… - arXiv preprint arXiv …, 2023 - arxiv.org
We continue the investigation into the power of smaller Transformer-based language
models as initiated by TinyStories -- a 10 million parameter model that can produce …

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action

J Lu, C Clark, S Lee, Z Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present Unified-IO 2, a multimodal and multi-skill unified model capable of following
novel instructions. Unified-IO 2 can use text, images, audio, and/or videos as input and can …

Foundation models and fair use

P Henderson, X Li, D Jurafsky, T Hashimoto… - Journal of Machine …, 2023 - jmlr.org
Existing foundation models are trained on copyrighted material. Deploying these models
can pose both legal and ethical risks when data creators fail to receive appropriate …

Llemma: An open language model for mathematics

Z Azerbayev, H Schoelkopf, K Paster… - arXiv preprint arXiv …, 2023 - arxiv.org
We present Llemma, a large language model for mathematics. We continue pretraining
Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing …

SantaCoder: don't reach for the stars!

LB Allal, R Li, D Kocetkov, C Mou, C Akiki… - arXiv preprint arXiv …, 2023 - arxiv.org
The BigCode project is an open scientific collaboration working on the responsible
development of large language models for code. This tech report describes the progress of …