VideoCrafter2: Overcoming data limitations for high-quality video diffusion models

H Chen, Y Zhang, X Cun, M Xia… - Proceedings of the …, 2024 - openaccess.thecvf.com
Text-to-video generation aims to produce a video based on a given prompt. Recently,
several commercial video models have been able to generate plausible videos with minimal …

DynamiCrafter: Animating open-domain images with video diffusion priors

J Xing, M Xia, Y Zhang, H Chen, X Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Animating a still image offers an engaging visual experience. Traditional image animation
techniques mainly focus on animating natural scenes with stochastic dynamics (e.g., clouds …

SSR-Encoder: Encoding selective subject representation for subject-driven generation

Y Zhang, Y Song, J Liu, R Wang, J Yu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent advancements in subject-driven image generation have led to zero-shot generation,
yet precise selection and focus on crucial subject representations remain challenging …

ProxEdit: Improving tuning-free real image editing with proximal guidance

L Han, S Wen, Q Chen, Z Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
DDIM inversion has revealed the remarkable potential of real image editing within diffusion-
based methods. However, the accuracy of DDIM reconstruction degrades as larger classifier …

As-Plausible-As-Possible: Plausibility-Aware Mesh Deformation Using 2D Diffusion Priors

S Yoo, K Kim, VG Kim, M Sung - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
We present As-Plausible-As-Possible (APAP), a mesh deformation technique that
leverages 2D diffusion priors to preserve the plausibility of a mesh under user-controlled …

Prompt Highlighter: Interactive Control for Multi-Modal LLMs

Y Zhang, S Qian, B Peng, S Liu… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
This study targets a critical aspect of multi-modal LLMs' (LLMs & VLMs) inference: explicit
controllable text generation. Multi-modal LLMs empower multi-modality understanding with …

Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models

P Marcos-Manchón, R Alcover-Couso… - Proceedings of the …, 2024 - openaccess.thecvf.com
Diffusion models represent a new paradigm in text-to-image generation. Beyond generating
high-quality images from text prompts, models such as Stable Diffusion have been …

Accelerating diffusion models for inverse problems through shortcut sampling

G Liu, H Sun, J Li, F Yin, Y Yang - arXiv preprint arXiv:2305.16965, 2023 - arxiv.org
Diffusion models have recently demonstrated an impressive ability to address inverse
problems in an unsupervised manner. While existing methods primarily focus on modifying …

ColorizeDiffusion: Adjustable Sketch Colorization with Reference Image and Text

D Yan, L Yuan, Y Nishioka, I Fujishiro… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, diffusion models have demonstrated their effectiveness in generating extremely
high-quality images and have found wide-ranging applications, including automatic sketch …

CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities

T Wu, Y Zhang, X Wang, X Zhou, G Zheng, Z Qi… - arXiv preprint arXiv …, 2024 - arxiv.org
Customized video generation aims to generate high-quality videos guided by text prompts
and the subject's reference images. However, since it is only trained on static images, the fine …