Multimodal image synthesis and editing: A survey and taxonomy

F Zhan, Y Yu, R Wu, J Zhang, S Lu, L Liu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
As information exists in various modalities in the real world, effective interaction and fusion
among multimodal information play a key role in the creation and perception of multimodal …

Deepfakes generation and detection: a short survey

Z Akhtar - Journal of Imaging, 2023 - mdpi.com
Advancements in deep learning techniques and the availability of free, large databases
have made it possible, even for non-technical people, to either manipulate or generate …

MasaCtrl: Tuning-free mutual self-attention control for consistent image synthesis and editing

M Cao, X Wang, Z Qi, Y Shan… - Proceedings of the …, 2023 - openaccess.thecvf.com
Despite the success in large-scale text-to-image generation and text-conditioned image
editing, existing methods still struggle to produce consistent generation and editing results …

ELITE: Encoding visual concepts into textual embeddings for customized text-to-image generation

Y Wei, Y Zhang, Z Ji, J Bai… - Proceedings of the …, 2023 - openaccess.thecvf.com
In addition to their unprecedented ability in imaginative creation, large text-to-image models are
expected to incorporate customized concepts into image generation. Existing works generally learn …

Paint by example: Exemplar-based image editing with diffusion models

B Yang, S Gu, B Zhang, T Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Language-guided image editing has achieved great success recently. In this paper,
we investigate exemplar-guided image editing for more precise control. We achieve this …

DreamBooth: Fine-tuning text-to-image diffusion models for subject-driven generation

N Ruiz, Y Li, V Jampani, Y Pritch… - Proceedings of the …, 2023 - openaccess.thecvf.com
Large text-to-image models have achieved a remarkable leap in the evolution of AI, enabling high-
quality and diverse synthesis of images from a given text prompt. However, these models …

Prompt-to-prompt image editing with cross-attention control

A Hertz, R Mokady, J Tenenbaum, K Aberman… - arXiv preprint arXiv …, 2022 - arxiv.org
Recent large-scale text-driven synthesis models have attracted much attention thanks to
their remarkable capabilities of generating highly diverse images that follow given text …

DiffusionCLIP: Text-guided diffusion models for robust image manipulation

G Kim, T Kwon, JC Ye - … of the IEEE/CVF conference on …, 2022 - openaccess.thecvf.com
Recently, GAN inversion methods combined with Contrastive Language-Image Pretraining
(CLIP) have enabled zero-shot image manipulation guided by text prompts. However, their …

CLIP-NeRF: Text-and-image driven manipulation of neural radiance fields

C Wang, M Chai, M He, D Chen… - Proceedings of the …, 2022 - openaccess.thecvf.com
We present CLIP-NeRF, a multi-modal 3D object manipulation method for neural radiance
fields (NeRF). By leveraging the joint language-image embedding space of the recent …

Collaborative diffusion for multi-modal face generation and editing

Z Huang, KCK Chan, Y Jiang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Diffusion models have recently arisen as a powerful generative tool. Despite the great progress,
existing diffusion models mainly focus on uni-modal control, i.e., the diffusion process is …