Rethinking the objectives of vector-quantized tokenizers for image synthesis

S Gao, P Zhou, MM Cheng… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Despite its success in image synthesis, we observe that diffusion probabilistic models
(DPMs) often lack contextual reasoning ability to learn the relations among object parts in an …

Cited by 74 - Related articles - All 4 versions

Making LLaMA see and draw with SEED tokenizer

Y Ge, S Zhao, Z Zeng, Y Ge, C Li, X Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

The great success of Large Language Models (LLMs) has expanded the potential of
multimodality, contributing to the gradual evolution of General Artificial Intelligence (AGI). A …

Cited by 43 - Related articles - All 3 versions

Online clustered codebook

C Zheng, A Vedaldi - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com

Vector Quantisation (VQ) is experiencing a comeback in machine learning, where it is
increasingly used in representation learning. However, optimizing the codevectors in …

Cited by 21 - Related articles - All 11 versions

Show-o: One single transformer to unify multimodal understanding and generation

J Xie, W Mao, Z Bai, DJ Zhang, W Wang, KQ Lin… - arXiv preprint arXiv …, 2024 - arxiv.org

We present a unified transformer, i.e., Show-o, that unifies multimodal understanding and
generation. Unlike fully autoregressive models, Show-o unifies autoregressive and …

Cited by 3 - Related articles - All 3 versions

Extreme image compression using fine-tuned VQGANs

Q Mao, T Yang, Y Zhang, Z Wang… - 2024 Data …, 2024 - ieeexplore.ieee.org

Recent advances in generative compression methods have demonstrated remarkable
progress in enhancing the perceptual quality of compressed data, especially in scenarios …

Cited by 6 - Related articles - All 3 versions

All-in-one simulation-based inference

M Gloeckler, M Deistler, C Weilbach, F Wood… - arXiv preprint arXiv …, 2024 - arxiv.org

Amortized Bayesian inference trains neural networks to solve stochastic inference problems
using model simulations, thereby making it possible to rapidly perform Bayesian inference …

Cited by 3 - Related articles - All 3 versions

LG-VQ: Language-Guided Codebook Learning

G Liang, B Zhang, Y Wang, X Li, Y Ye, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

Vector quantization (VQ) is a key technique in high-resolution and high-fidelity image
synthesis, which aims to learn a codebook to encode an image with a sequence of discrete …

MDTv2: Masked Diffusion Transformer is a Strong Image Synthesizer

S Gao, P Zhou, MM Cheng, S Yan - arXiv preprint arXiv:2303.14389, 2023 - arxiv.org

Despite its success in image synthesis, we observe that diffusion probabilistic models
(DPMs) often lack contextual reasoning ability to learn the relations among object parts in an …

Cited by 3 - Related articles - All 2 versions