Multimodal image synthesis and editing: A survey and taxonomy
As information exists in various modalities in the real world, effective interaction and fusion
among multimodal information play a key role in the creation and perception of multimodal …
Deepfakes generation and detection: a short survey
Z Akhtar - Journal of Imaging, 2023 - mdpi.com
Advancements in deep learning techniques and the availability of free, large databases
have made it possible, even for non-technical people, to either manipulate or generate …
MasaCtrl: Tuning-free mutual self-attention control for consistent image synthesis and editing
Despite the success in large-scale text-to-image generation and text-conditioned image
editing, existing methods still struggle to produce consistent generation and editing results …
ELITE: Encoding visual concepts into textual embeddings for customized text-to-image generation
In addition to their unprecedented ability in imaginative creation, large text-to-image models are
expected to incorporate customized concepts into image generation. Existing works generally learn …
Paint by example: Exemplar-based image editing with diffusion models
Language-guided image editing has achieved great success recently. In this paper,
we investigate exemplar-guided image editing for more precise control. We achieve this …
DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation
Large text-to-image models achieved a remarkable leap in the evolution of AI, enabling high-
quality and diverse synthesis of images from a given text prompt. However, these models …
Prompt-to-prompt image editing with cross attention control
Recent large-scale text-driven synthesis models have attracted much attention thanks to
their remarkable capabilities of generating highly diverse images that follow given text …
DiffusionCLIP: Text-guided diffusion models for robust image manipulation
Recently, GAN inversion methods combined with Contrastive Language-Image Pretraining
(CLIP) enable zero-shot image manipulation guided by text prompts. However, their …
CLIP-NeRF: Text-and-image driven manipulation of neural radiance fields
We present CLIP-NeRF, a multi-modal 3D object manipulation method for neural radiance
fields (NeRF). By leveraging the joint language-image embedding space of the recent …
Collaborative diffusion for multi-modal face generation and editing
Diffusion models have recently emerged as a powerful generative tool. Despite the great progress,
existing diffusion models mainly focus on uni-modal control, i.e., the diffusion process is …