Multimodal image synthesis and editing: A survey and taxonomy

F Zhan, Y Yu, R Wu, J Zhang, S Lu, L Liu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
As information exists in various modalities in real world, effective interaction and fusion
among multimodal information plays a key role for the creation and perception of multimodal …

Adversarial text-to-image synthesis: A review

S Frolov, T Hinz, F Raue, J Hees, A Dengel - Neural Networks, 2021 - Elsevier
With the advent of generative adversarial networks, synthesizing images from text
descriptions has recently become an active research area. It is a flexible and intuitive way for …

Vector quantized diffusion model for text-to-image synthesis

S Gu, D Chen, J Bao, F Wen, B Zhang… - Proceedings of the …, 2022 - openaccess.thecvf.com
We present the vector quantized diffusion (VQ-Diffusion) model for text-to-image generation.
This method is based on a vector quantized variational autoencoder (VQ-VAE) whose latent …

De-fake: Detection and attribution of fake images generated by text-to-image generation models

Z Sha, Z Li, N Yu, Y Zhang - Proceedings of the 2023 ACM SIGSAC …, 2023 - dl.acm.org
Text-to-image generation models that generate images based on prompt descriptions have
attracted an increasing amount of attention during the past few months. Despite their …

Cross-modal contrastive learning for text-to-image generation

H Zhang, JY Koh, J Baldridge… - Proceedings of the …, 2021 - openaccess.thecvf.com
The output of text-to-image synthesis systems should be coherent, clear, photo-realistic
scenes with high semantic fidelity to their conditioned text descriptions. Our Cross-Modal …

Toward verifiable and reproducible human evaluation for text-to-image generation

M Otani, R Togashi, Y Sawai… - Proceedings of the …, 2023 - openaccess.thecvf.com
Human evaluation is critical for validating the performance of text-to-image generative
models, as this highly cognitive process requires deep comprehension of text and images …

Multimodal intelligence: Representation learning, information fusion, and applications

C Zhang, Z Yang, X He, L Deng - IEEE Journal of Selected …, 2020 - ieeexplore.ieee.org
Deep learning methods have revolutionized speech recognition, image recognition, and
natural language processing since 2010. Each of these tasks involves a single modality in …

Learning disentangled representations in the imaging domain

X Liu, P Sanchez, S Thermos, AQ O'Neil… - Medical Image …, 2022 - Elsevier
Disentangled representation learning has been proposed as an approach to learning
general representations even in the absence of, or with limited, supervision. A good general …

Improved vector quantized diffusion models

Z Tang, S Gu, J Bao, D Chen, F Wen - arXiv preprint arXiv:2205.16007, 2022 - arxiv.org
Vector quantized diffusion (VQ-Diffusion) is a powerful generative model for text-to-image
synthesis, but sometimes can still generate low-quality samples or weakly correlated images …

Neural architecture search with a lightweight transformer for text-to-image synthesis

W Li, S Wen, K Shi, Y Yang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Despite the cross-modal text-to-image synthesis task has achieved great success, most of
the latest works in this field are based on the network architectures proposed by …