Diffusion models: A comprehensive survey of methods and applications
Diffusion models have emerged as a powerful new family of deep generative models with
record-breaking performance in many applications, including image synthesis, video …
Large language models: a comprehensive survey of its applications, challenges, limitations, and future prospects
Within the vast expanse of computerized language processing, a revolutionary entity known
as Large Language Models (LLMs) has emerged, wielding immense power in its capacity to …
Symbolic discovery of optimization algorithms
We present a method to formulate algorithm discovery as program search, and apply it to
discover optimization algorithms for deep neural network training. We leverage efficient …
Fast inference from transformers via speculative decoding
Y Leviathan, M Kalman… - … Conference on Machine …, 2023 - proceedings.mlr.press
Inference from large autoregressive models like Transformers is slow: decoding K tokens
takes K serial runs of the model. In this work we introduce speculative decoding, an …
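The draft-then-verify loop behind this entry can be sketched compactly. Below is a hypothetical NumPy toy, not the authors' code: the two "models" are stand-in distributions over a tiny vocabulary, and the names (draft_dist, target_dist, gamma) are illustrative. It follows the general rejection-sampling recipe: a cheap draft model proposes gamma tokens, the large model scores them in one pass, and each proposal is accepted with probability min(1, p/q).

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 8  # toy vocabulary; every name in this sketch is illustrative

def draft_dist(prefix):
    """Stand-in for the small, cheap draft model."""
    logits = np.sin(np.arange(VOCAB) + len(prefix))
    e = np.exp(logits - logits.max())
    return e / e.sum()

def target_dist(prefix):
    """Stand-in for the large target model we want samples from."""
    logits = np.cos(0.7 * np.arange(VOCAB) + len(prefix))
    e = np.exp(logits - logits.max())
    return e / e.sum()

def speculative_step(prefix, gamma=4):
    # 1) Draft model proposes gamma tokens serially (cheap).
    ctx, drafts, qs = list(prefix), [], []
    for _ in range(gamma):
        q = draft_dist(ctx)
        t = int(rng.choice(VOCAB, p=q))
        drafts.append(t); qs.append(q); ctx.append(t)
    # 2) One run of the big model scores all gamma prefixes "in parallel"
    #    (emulated here with a list comprehension).
    ps = [target_dist(list(prefix) + drafts[:i]) for i in range(gamma)]
    # 3) Accept draft token t with prob min(1, p(t)/q(t)); at the first
    #    rejection, resample from the residual max(0, p - q), renormalized.
    out = list(prefix)
    for i, t in enumerate(drafts):
        if rng.random() < min(1.0, ps[i][t] / qs[i][t]):
            out.append(t)
        else:
            resid = np.maximum(ps[i] - qs[i], 0.0)
            out.append(int(rng.choice(VOCAB, p=resid / resid.sum())))
            return out
    # 4) All gamma accepted: take a bonus token from the target model.
    out.append(int(rng.choice(VOCAB, p=target_dist(out))))
    return out

print(speculative_step([1, 2, 3]))  # prefix plus 1..gamma+1 new tokens
```

Each accepted draft token saves one serial run of the large model, and the accept/resample rule keeps the output distribution identical to sampling from the target model alone.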
Structured denoising diffusion models in discrete state-spaces
Denoising diffusion probabilistic models (DDPMs) [Ho et al. 2020] have shown impressive
results on image and waveform generation in continuous state spaces. Here, we introduce …
MADLAD-400: A multilingual and document-level large audited dataset
We introduce MADLAD-400, a manually audited, general domain 3T token monolingual
dataset based on CommonCrawl, spanning 419 languages. We discuss the limitations …
Rethinking attention with performers
We introduce Performers, Transformer architectures which can estimate regular (softmax)
full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to …
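The linear-complexity claim in this entry rests on kernelizing softmax attention with random features so that matrix associativity removes the n x n attention matrix. The NumPy sketch below uses iid Gaussian positive features rather than the paper's orthogonal FAVOR+ construction, and all function and variable names are mine, not the library's; it is a rough illustration under those assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def positive_random_features(x, W):
    # phi(x) = exp(Wx - ||x||^2 / 2) / sqrt(m): positive features whose
    # inner products approximate the softmax kernel exp(q . k) in expectation.
    m = W.shape[0]
    return np.exp(x @ W.T - 0.5 * np.sum(x**2, axis=-1, keepdims=True)) / np.sqrt(m)

def performer_attention(Q, K, V, m=2048):
    d = Q.shape[-1]
    # Fold the usual 1/sqrt(d) temperature into q and k before featurizing.
    Q, K = Q / d**0.25, K / d**0.25
    W = rng.standard_normal((m, d))        # random projection shared by Q and K
    Qp = positive_random_features(Q, W)    # (n, m)
    Kp = positive_random_features(K, W)    # (n, m)
    # Associativity gives O(n) cost: (Qp Kp^T) V == Qp (Kp^T V).
    num = Qp @ (Kp.T @ V)                  # (n, d_v) without the n x n matrix
    den = Qp @ Kp.sum(axis=0)              # row-wise softmax normalizer
    return num / den[:, None]

# Sanity check against exact softmax attention on a small input.
n, d = 64, 16
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
scores = np.exp((Q @ K.T) / np.sqrt(d))
exact = (scores / scores.sum(axis=1, keepdims=True)) @ V
print(np.abs(exact - performer_attention(Q, K, V)).max())  # small for large m
```

With m in the thousands the approximation error is small on inputs like this; the paper's orthogonal feature construction reduces the estimator's variance further.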
ProtTrans: Toward understanding the language of life through self-supervised learning
Computational biology and bioinformatics provide vast data gold-mines from protein
sequences, ideal for Language Models (LMs) taken from Natural Language Processing …
Don't stop pretraining: Adapt language models to domains and tasks
Language models pretrained on text from a wide variety of sources form the foundation of
today's NLP. In light of the success of these broad-coverage models, we investigate whether …
MOReL: Model-based offline reinforcement learning
R Kidambi, A Rajeswaran… - Advances in neural …, 2020 - proceedings.neurips.cc
In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based
solely on a dataset of historical interactions with the environment. This serves as an extreme …