Masked diffusion transformer is a strong image synthesizer
Despite its success in image synthesis, we observe that diffusion probabilistic models
(DPMs) often lack contextual reasoning ability to learn the relations among object parts in an …
Making llama see and draw with seed tokenizer
The great success of Large Language Models (LLMs) has expanded the potential of
multimodality, contributing to the gradual evolution of General Artificial Intelligence (AGI). A …
Online clustered codebook
Vector Quantisation (VQ) is experiencing a comeback in machine learning, where it is
increasingly used in representation learning. However, optimizing the codevectors in …
Show-o: One single transformer to unify multimodal understanding and generation
We present a unified transformer, i.e., Show-o, that unifies multimodal understanding and
generation. Unlike fully autoregressive models, Show-o unifies autoregressive and …
Extreme image compression using fine-tuned vqgans
Recent advances in generative compression methods have demonstrated remarkable
progress in enhancing the perceptual quality of compressed data, especially in scenarios …
All-in-one simulation-based inference
Amortized Bayesian inference trains neural networks to solve stochastic inference problems
using model simulations, thereby making it possible to rapidly perform Bayesian inference …
LG-VQ: Language-Guided Codebook Learning
Vector quantization (VQ) is a key technique in high-resolution and high-fidelity image
synthesis, which aims to learn a codebook to encode an image with a sequence of discrete …
MDTv2: Masked Diffusion Transformer is a Strong Image Synthesizer
Despite its success in image synthesis, we observe that diffusion probabilistic models
(DPMs) often lack contextual reasoning ability to learn the relations among object parts in an …