Deep generative modelling: A comparative review of VAEs, GANs, normalizing flows, energy-based and autoregressive models

S Bond-Taylor, A Leach, Y Long… - IEEE transactions on …, 2021 - ieeexplore.ieee.org
Deep generative models are a class of techniques that train deep neural networks to model
the distribution of training samples. Research has fragmented into various interconnected …

High-resolution image synthesis with latent diffusion models

R Rombach, A Blattmann, D Lorenz… - Proceedings of the …, 2022 - openaccess.thecvf.com
By decomposing the image formation process into a sequential application of denoising
autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image …

Score-based generative modeling in latent space

A Vahdat, K Kreis, J Kautz - Advances in neural information …, 2021 - proceedings.neurips.cc
Score-based generative models (SGMs) have recently demonstrated impressive results in
terms of both sample quality and distribution coverage. However, they are usually applied …

Perception prioritized training of diffusion models

J Choi, J Lee, C Shin, S Kim, H Kim… - Proceedings of the …, 2022 - openaccess.thecvf.com
Diffusion models learn to restore noisy data, which is corrupted with different levels of noise,
by optimizing the weighted sum of the corresponding loss terms, i.e., denoising score …

Vector-quantized image modeling with improved VQGAN

J Yu, X Li, JY Koh, H Zhang, R Pang, J Qin… - arXiv preprint arXiv …, 2021 - arxiv.org
Pretraining language models with next-token prediction on massive text corpora has
delivered phenomenal zero-shot, few-shot, transfer learning and multi-tasking capabilities …

Score-based generative modeling with critically-damped Langevin diffusion

T Dockhorn, A Vahdat, K Kreis - arXiv preprint arXiv:2112.07068, 2021 - arxiv.org
Score-based generative models (SGMs) have demonstrated remarkable synthesis quality.
SGMs rely on a diffusion process that gradually perturbs the data towards a tractable …

Wavelet diffusion models are fast and scalable image generators

H Phung, Q Dao, A Tran - … of the IEEE/CVF conference on …, 2023 - openaccess.thecvf.com
Diffusion models are rising as a powerful solution for high-fidelity image generation, which
exceeds GANs in quality in many circumstances. However, their slow training and inference …

Diff-TTS: A denoising diffusion model for text-to-speech

M Jeong, H Kim, SJ Cheon, BJ Choi, NS Kim - arXiv preprint arXiv …, 2021 - arxiv.org
Although neural text-to-speech (TTS) models have attracted a lot of attention and succeeded
in generating human-like speech, there is still room for improvements to its naturalness and …

Improved transformer for high-resolution GANs

L Zhao, Z Zhang, T Chen… - Advances in Neural …, 2021 - proceedings.neurips.cc
Attention-based models, exemplified by the Transformer, can effectively model long range
dependency, but suffer from the quadratic complexity of self-attention operation, making …

Controllable and compositional generation with latent-space energy-based models

W Nie, A Vahdat… - Advances in Neural …, 2021 - proceedings.neurips.cc
Controllable generation is one of the key requirements for successful adoption of deep
generative models in real-world applications, but it still remains as a great challenge. In …