Masked diffusion transformer is a strong image synthesizer

S Gao, P Zhou, MM Cheng… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Despite its success in image synthesis, we observe that diffusion probabilistic models
(DPMs) often lack contextual reasoning ability to learn the relations among object parts in an …

Making LLaMA see and draw with SEED tokenizer

Y Ge, S Zhao, Z Zeng, Y Ge, C Li, X Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
The great success of Large Language Models (LLMs) has expanded the potential of
multimodality, contributing to the gradual evolution of General Artificial Intelligence (AGI). A …

Online clustered codebook

C Zheng, A Vedaldi - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Vector Quantisation (VQ) is experiencing a comeback in machine learning, where it is
increasingly used in representation learning. However, optimizing the codevectors in …

Show-o: One single transformer to unify multimodal understanding and generation

J Xie, W Mao, Z Bai, DJ Zhang, W Wang, KQ Lin… - arXiv preprint arXiv …, 2024 - arxiv.org
We present a unified transformer, i.e., Show-o, that unifies multimodal understanding and
generation. Unlike fully autoregressive models, Show-o unifies autoregressive and …

Extreme image compression using fine-tuned VQGANs

Q Mao, T Yang, Y Zhang, Z Wang… - 2024 Data …, 2024 - ieeexplore.ieee.org
Recent advances in generative compression methods have demonstrated remarkable
progress in enhancing the perceptual quality of compressed data, especially in scenarios …

All-in-one simulation-based inference

M Gloeckler, M Deistler, C Weilbach, F Wood… - arXiv preprint arXiv …, 2024 - arxiv.org
Amortized Bayesian inference trains neural networks to solve stochastic inference problems
using model simulations, thereby making it possible to rapidly perform Bayesian inference …

LG-VQ: Language-Guided Codebook Learning

G Liang, B Zhang, Y Wang, X Li, Y Ye, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Vector quantization (VQ) is a key technique in high-resolution and high-fidelity image
synthesis, which aims to learn a codebook to encode an image with a sequence of discrete …

MDTv2: Masked Diffusion Transformer is a Strong Image Synthesizer

S Gao, P Zhou, MM Cheng, S Yan - arXiv preprint arXiv:2303.14389, 2023 - arxiv.org
Despite its success in image synthesis, we observe that diffusion probabilistic models
(DPMs) often lack contextual reasoning ability to learn the relations among object parts in an …