Diffusion-LM improves controllable text generation
Controlling the behavior of language models (LMs) without re-training is a major open
problem in natural language generation. While recent works have demonstrated successes …
Structured denoising diffusion models in discrete state-spaces
Denoising diffusion probabilistic models (DDPMs) [Ho et al. 2020] have shown impressive
results on image and waveform generation in continuous state spaces. Here, we introduce …
WaveGrad: Estimating gradients for waveform generation
This paper introduces WaveGrad, a conditional model for waveform generation which
estimates gradients of the data density. The model is built on prior work on score matching …
A survey on non-autoregressive generation for neural machine translation and beyond
Non-autoregressive (NAR) generation, which was first proposed in neural machine translation
(NMT) to speed up inference, has attracted much attention in both machine learning and …
DiffusionBERT: Improving generative masked language models with diffusion models
We present DiffusionBERT, a new generative masked language model based on discrete
diffusion models. Diffusion models and many pre-trained language models have a shared …
Redistributing low-frequency words: Making the most of monolingual data in non-autoregressive translation
Knowledge distillation (KD) is the preliminary step for training non-autoregressive
translation (NAT) models, which eases the training of NAT models at the cost of losing …
Glancing transformer for non-autoregressive neural machine translation
Recent work on non-autoregressive neural machine translation (NAT) aims to improve
efficiency through parallel decoding without sacrificing quality. However, existing NAT …
Learning to efficiently sample from diffusion probabilistic models
Denoising Diffusion Probabilistic Models (DDPMs) have emerged as a powerful family of
generative models that can yield high-fidelity samples and competitive log-likelihoods …
SmBoP: Semi-autoregressive bottom-up semantic parsing
The de facto standard decoding method for semantic parsing in recent years has been to
autoregressively decode the abstract syntax tree of the target program using a top-down …
Step-unrolled denoising autoencoders for text generation
In this paper we propose a new generative model of text, Step-unrolled Denoising
Autoencoder (SUNDAE), that does not rely on autoregressive models. Similarly to denoising …