Diffusion models: A comprehensive survey of methods and applications
Diffusion models have emerged as a powerful new family of deep generative models with
record-breaking performance in many applications, including image synthesis, video …
Large language models: a comprehensive survey of its applications, challenges, limitations, and future prospects
Within the vast expanse of computerized language processing, a revolutionary entity known
as Large Language Models (LLMs) has emerged, wielding immense power in its capacity to …
Symbolic discovery of optimization algorithms
We present a method to formulate algorithm discovery as program search, and apply it to
discover optimization algorithms for deep neural network training. We leverage efficient …
Fast inference from transformers via speculative decoding
Y Leviathan, M Kalman… - … Conference on Machine …, 2023 - proceedings.mlr.press
Inference from large autoregressive models like Transformers is slow: decoding K tokens
takes K serial runs of the model. In this work we introduce speculative decoding, an …
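The draft-then-verify loop behind this entry can be sketched compactly. Below is a hypothetical NumPy toy, not the authors' code: the two "models" are stand-in distributions over a tiny vocabulary, and the names (draft_dist, target_dist, gamma) are illustrative. It follows the general rejection-sampling recipe: a cheap draft model proposes gamma tokens, the large model scores them in one pass, and each proposal is accepted with probability min(1, p/q).

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 8  # toy vocabulary; every name in this sketch is illustrative

def draft_dist(prefix):
    """Stand-in for the small, cheap draft model."""
    logits = np.sin(np.arange(VOCAB) + len(prefix))
    e = np.exp(logits - logits.max())
    return e / e.sum()

def target_dist(prefix):
    """Stand-in for the large target model we want samples from."""
    logits = np.cos(0.7 * np.arange(VOCAB) + len(prefix))
    e = np.exp(logits - logits.max())
    return e / e.sum()

def speculative_step(prefix, gamma=4):
    # 1) Draft model proposes gamma tokens serially (cheap).
    ctx, drafts, qs = list(prefix), [], []
    for _ in range(gamma):
        q = draft_dist(ctx)
        t = int(rng.choice(VOCAB, p=q))
        drafts.append(t); qs.append(q); ctx.append(t)
    # 2) One run of the big model scores all gamma prefixes "in parallel"
    #    (emulated here with a list comprehension).
    ps = [target_dist(list(prefix) + drafts[:i]) for i in range(gamma)]
    # 3) Accept draft token t with prob min(1, p(t)/q(t)); at the first
    #    rejection, resample from the residual max(0, p - q), renormalized.
    out = list(prefix)
    for i, t in enumerate(drafts):
        if rng.random() < min(1.0, ps[i][t] / qs[i][t]):
            out.append(t)
        else:
            resid = np.maximum(ps[i] - qs[i], 0.0)
            out.append(int(rng.choice(VOCAB, p=resid / resid.sum())))
            return out
    # 4) All gamma accepted: take a bonus token from the target model.
    out.append(int(rng.choice(VOCAB, p=target_dist(out))))
    return out

print(speculative_step([1, 2, 3]))  # prefix plus 1..gamma+1 new tokens
```

Each accepted draft token saves one serial run of the large model, and the accept/resample rule keeps the output distribution identical to sampling from the target model alone.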
Structured denoising diffusion models in discrete state-spaces
Denoising diffusion probabilistic models (DDPMs) [Ho et al. 2020] have shown impressive
results on image and waveform generation in continuous state spaces. Here, we introduce …
MADLAD-400: A multilingual and document-level large audited dataset
We introduce MADLAD-400, a manually audited, general domain 3T token monolingual
dataset based on CommonCrawl, spanning 419 languages. We discuss the limitations …
Rethinking attention with performers
We introduce Performers, Transformer architectures which can estimate regular (softmax)
full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to …
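The linear-complexity claim in this entry rests on kernelizing softmax attention with random features so that matrix associativity removes the n x n attention matrix. The NumPy sketch below uses iid Gaussian positive features rather than the paper's orthogonal FAVOR+ construction, and all function and variable names are mine, not the library's; it is a rough illustration under those assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def positive_random_features(x, W):
    # phi(x) = exp(Wx - ||x||^2 / 2) / sqrt(m): positive features whose
    # inner products approximate the softmax kernel exp(q . k) in expectation.
    m = W.shape[0]
    return np.exp(x @ W.T - 0.5 * np.sum(x**2, axis=-1, keepdims=True)) / np.sqrt(m)

def performer_attention(Q, K, V, m=2048):
    d = Q.shape[-1]
    # Fold the usual 1/sqrt(d) temperature into q and k before featurizing.
    Q, K = Q / d**0.25, K / d**0.25
    W = rng.standard_normal((m, d))        # random projection shared by Q and K
    Qp = positive_random_features(Q, W)    # (n, m)
    Kp = positive_random_features(K, W)    # (n, m)
    # Associativity gives O(n) cost: (Qp Kp^T) V == Qp (Kp^T V).
    num = Qp @ (Kp.T @ V)                  # (n, d_v) without the n x n matrix
    den = Qp @ Kp.sum(axis=0)              # row-wise softmax normalizer
    return num / den[:, None]

# Sanity check against exact softmax attention on a small input.
n, d = 64, 16
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
scores = np.exp((Q @ K.T) / np.sqrt(d))
exact = (scores / scores.sum(axis=1, keepdims=True)) @ V
print(np.abs(exact - performer_attention(Q, K, V)).max())  # small for large m
```

With m in the thousands the approximation error is small on inputs like this; the paper's orthogonal feature construction reduces the estimator's variance further.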
ProtTrans: Toward understanding the language of life through self-supervised learning
Computational biology and bioinformatics provide vast data gold-mines from protein
sequences, ideal for Language Models (LMs) taken from Natural Language Processing …
Don't stop pretraining: Adapt language models to domains and tasks
Language models pretrained on text from a wide variety of sources form the foundation of
today's NLP. In light of the success of these broad-coverage models, we investigate whether …
MOReL: Model-based offline reinforcement learning
R Kidambi, A Rajeswaran… - Advances in neural …, 2020 - proceedings.neurips.cc
In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based
solely on a dataset of historical interactions with the environment. This serves as an extreme …