Diffusion models: A comprehensive survey of methods and applications

L Yang, Z Zhang, Y Song, S Hong, R Xu, Y Zhao… - ACM Computing …, 2023 - dl.acm.org
Diffusion models have emerged as a powerful new family of deep generative models with
record-breaking performance in many applications, including image synthesis, video …

Large language models: a comprehensive survey of its applications, challenges, limitations, and future prospects

MU Hadi, Q Al Tashi, A Shah, R Qureshi… - Authorea …, 2024 - authorea.com
Within the vast expanse of computerized language processing, a revolutionary entity known
as Large Language Models (LLMs) has emerged, wielding immense power in its capacity to …

Symbolic discovery of optimization algorithms

X Chen, C Liang, D Huang, E Real… - Advances in neural …, 2024 - proceedings.neurips.cc
We present a method to formulate algorithm discovery as program search, and apply it to
discover optimization algorithms for deep neural network training. We leverage efficient …

Fast inference from transformers via speculative decoding

Y Leviathan, M Kalman… - … Conference on Machine …, 2023 - proceedings.mlr.press
Inference from large autoregressive models like Transformers is slow: decoding K tokens
takes K serial runs of the model. In this work we introduce speculative decoding, an …
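The snippet names the technique but the abstract is cut off, so here is a minimal, self-contained sketch of speculative sampling under the standard accept/reject rule (accept a drafted token with probability min(1, p_target/p_draft)). The callables draft_probs and target_probs, and the list-of-tokens prefix, are hypothetical stand-ins for a small draft model and the large target model, not part of any library or of the paper's code.

```python
import random

def _sample(probs):
    # Sample a token id from a {token: probability} dict.
    r, acc = random.random(), 0.0
    for tok, p in probs.items():
        acc += p
        if r <= acc:
            return tok
    return tok  # guard against floating-point rounding

def speculative_step(prefix, draft_probs, target_probs, k=4):
    # One decoding step: the cheap draft model proposes k tokens autoregressively,
    # the large target model scores all of them in a single pass, and each drafted
    # token is accepted with probability min(1, p_target / p_draft).
    proposed, draft_dists = [], []
    for _ in range(k):
        q = draft_probs(prefix + proposed)         # distribution from the draft model
        proposed.append(_sample(q))
        draft_dists.append(q)
    target_dists = target_probs(prefix, proposed)  # k + 1 distributions, one target pass
    out = []
    for i, tok in enumerate(proposed):
        p, q = target_dists[i].get(tok, 0.0), draft_dists[i][tok]
        if q > 0 and random.random() < min(1.0, p / q):
            out.append(tok)                        # accepted: keep the drafted token
            continue
        # Rejected: resample from the residual max(0, p_target - p_draft), then stop.
        residual = {t: max(target_dists[i].get(t, 0.0) - draft_dists[i].get(t, 0.0), 0.0)
                    for t in target_dists[i]}
        z = sum(residual.values()) or 1.0
        out.append(_sample({t: v / z for t, v in residual.items()}))
        return out
    # All k drafts accepted: take one bonus token from the target's extra distribution.
    out.append(_sample(target_dists[k]))
    return out
```

Accepted tokens cost roughly one target-model pass per k-token burst, which is where the speed-up over K serial runs comes from.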

Structured denoising diffusion models in discrete state-spaces

J Austin, DD Johnson, J Ho, D Tarlow… - Advances in …, 2021 - proceedings.neurips.cc
Denoising diffusion probabilistic models (DDPMs) [Ho et al. 2020] have shown impressive
results on image and waveform generation in continuous state spaces. Here, we introduce …

MADLAD-400: A multilingual and document-level large audited dataset

S Kudugunta, I Caswell, B Zhang… - Advances in …, 2024 - proceedings.neurips.cc
We introduce MADLAD-400, a manually audited, general domain 3T token monolingual
dataset based on CommonCrawl, spanning 419 languages. We discuss the limitations …

Rethinking attention with performers

K Choromanski, V Likhosherstov, D Dohan… - arXiv preprint arXiv …, 2020 - arxiv.org
We introduce Performers, Transformer architectures which can estimate regular (softmax)
full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to …
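As a rough illustration of the linear-attention idea in the snippet, the following NumPy sketch approximates softmax attention with positive random features so the cost scales linearly in sequence length. This is a loose sketch under stated assumptions, not the paper's implementation: the actual FAVOR+ mechanism adds orthogonal random features and other variance-reduction details omitted here, and the function names are made up for this example.

```python
import numpy as np

def positive_random_features(x, omega):
    # phi(x) = exp(omega @ x - ||x||^2 / 2) / sqrt(m): an unbiased feature map
    # for the softmax kernel exp(q . k); variance-reduction tricks omitted.
    m = omega.shape[0]
    proj = x @ omega.T                                          # (n, m)
    return np.exp(proj - 0.5 * np.sum(x ** 2, axis=-1, keepdims=True)) / np.sqrt(m)

def performer_attention(Q, K, V, m=64, seed=0):
    # Approximate softmax(Q K^T / sqrt(d)) V in O(n * m * d) time instead of O(n^2 * d).
    d = Q.shape[-1]
    omega = np.random.default_rng(seed).standard_normal((m, d))  # random projections
    q_feat = positive_random_features(Q / d ** 0.25, omega)      # (n, m)
    k_feat = positive_random_features(K / d ** 0.25, omega)      # (n, m)
    kv = k_feat.T @ V                                             # (m, d_v), computed once
    normalizer = q_feat @ k_feat.sum(axis=0)                      # (n,) row-sum estimate
    return (q_feat @ kv) / normalizer[:, None]

# Tiny usage example on random data.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    Q, K, V = (rng.standard_normal((128, 32)) for _ in range(3))
    print(performer_attention(Q, K, V, m=256).shape)              # (128, 32)
```

Because the value matrix is contracted against the key features before multiplying by the query features, the quadratic n-by-n attention matrix is never materialized.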

ProtTrans: Toward understanding the language of life through self-supervised learning

A Elnaggar, M Heinzinger, C Dallago… - IEEE transactions on …, 2021 - ieeexplore.ieee.org
Computational biology and bioinformatics provide vast data gold-mines from protein
sequences, ideal for Language Models (LMs) taken from Natural Language Processing …

Don't stop pretraining: Adapt language models to domains and tasks

S Gururangan, A Marasović, S Swayamdipta… - arXiv preprint arXiv …, 2020 - arxiv.org
Language models pretrained on text from a wide variety of sources form the foundation of
today's NLP. In light of the success of these broad-coverage models, we investigate whether …

MOReL: Model-based offline reinforcement learning

R Kidambi, A Rajeswaran… - Advances in neural …, 2020 - proceedings.neurips.cc
In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based
solely on a dataset of historical interactions with the environment. This serves as an extreme …