Dual adversarial inference for text-to-image synthesis

F Zhan, Y Yu, R Wu, J Zhang, S Lu, L Liu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org

As information exists in various modalities in real world, effective interaction and fusion
among multimodal information plays a key role for the creation and perception of multimodal …

Cited by 204 Related articles All 11 versions

Adversarial text-to-image synthesis: A review

S Frolov, T Hinz, F Raue, J Hees, A Dengel - Neural Networks, 2021 - Elsevier

With the advent of generative adversarial networks, synthesizing images from text
descriptions has recently become an active research area. It is a flexible and intuitive way for …

Cited by 198 Related articles All 9 versions

Vector quantized diffusion model for text-to-image synthesis

S Gu, D Chen, J Bao, F Wen, B Zhang… - Proceedings of the …, 2022 - openaccess.thecvf.com

We present the vector quantized diffusion (VQ-Diffusion) model for text-to-image generation.
This method is based on a vector quantized variational autoencoder (VQ-VAE) whose latent …

Cited by 738 Related articles All 10 versions

De-fake: Detection and attribution of fake images generated by text-to-image generation models

Z Sha, Z Li, N Yu, Y Zhang - Proceedings of the 2023 ACM SIGSAC …, 2023 - dl.acm.org

Text-to-image generation models that generate images based on prompt descriptions have
attracted an increasing amount of attention during the past few months. Despite their …

Cited by 124 Related articles All 6 versions

Cross-modal contrastive learning for text-to-image generation

H Zhang, JY Koh, J Baldridge… - Proceedings of the …, 2021 - openaccess.thecvf.com

The output of text-to-image synthesis systems should be coherent, clear, photo-realistic
scenes with high semantic fidelity to their conditioned text descriptions. Our Cross-Modal …

Cited by 373 Related articles All 6 versions

Toward verifiable and reproducible human evaluation for text-to-image generation

M Otani, R Togashi, Y Sawai… - Proceedings of the …, 2023 - openaccess.thecvf.com

Human evaluation is critical for validating the performance of text-to-image generative
models, as this highly cognitive process requires deep comprehension of text and images …

Cited by 64 Related articles All 7 versions

Multimodal intelligence: Representation learning, information fusion, and applications

C Zhang, Z Yang, X He, L Deng - IEEE Journal of Selected …, 2020 - ieeexplore.ieee.org

Deep learning methods have revolutionized speech recognition, image recognition, and
natural language processing since 2010. Each of these tasks involves a single modality in …

Cited by 387 Related articles All 3 versions

Learning disentangled representations in the imaging domain

X Liu, P Sanchez, S Thermos, AQ O'Neil… - Medical Image …, 2022 - Elsevier

Disentangled representation learning has been proposed as an approach to learning
general representations even in the absence of, or with limited, supervision. A good general …

Cited by 90 Related articles All 6 versions

Improved vector quantized diffusion models

Z Tang, S Gu, J Bao, D Chen, F Wen - arXiv preprint arXiv:2205.16007, 2022 - arxiv.org

Vector quantized diffusion (VQ-Diffusion) is a powerful generative model for text-to-image
synthesis, but sometimes can still generate low-quality samples or weakly correlated images …

Cited by 62 Related articles All 2 versions

Neural architecture search with a lightweight transformer for text-to-image synthesis

W Li, S Wen, K Shi, Y Yang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Despite the cross-modal text-to-image synthesis task has achieved great success, most of
the latest works in this field are based on the network architectures proposed by …

Cited by 53 Related articles All 2 versions