VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
Text-to-video generation aims to produce a video based on a given prompt. Recently,
several commercial video models have been able to generate plausible videos with minimal …
DynamiCrafter: Animating Open-Domain Images with Video Diffusion Priors
Animating a still image offers an engaging visual experience. Traditional image animation
techniques mainly focus on animating natural scenes with stochastic dynamics (e.g., clouds …
SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation
Recent advancements in subject-driven image generation have led to zero-shot generation,
yet precise selection and focus on crucial subject representations remain challenging …
ProxEdit: Improving Tuning-Free Real Image Editing with Proximal Guidance
DDIM inversion has revealed the remarkable potential of real image editing within diffusion-
based methods. However, the accuracy of DDIM reconstruction degrades as larger classifier …
As-Plausible-As-Possible: Plausibility-Aware Mesh Deformation Using 2D Diffusion Priors
We present the As-Plausible-as-Possible (APAP) mesh deformation technique that
leverages 2D diffusion priors to preserve the plausibility of a mesh under user-controlled …
Prompt Highlighter: Interactive Control for Multi-Modal LLMs
This study targets a critical aspect of multi-modal LLMs' (LLMs & VLMs) inference: explicit,
controllable text generation. Multi-modal LLMs empower multi-modality understanding with …
Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models
P Marcos-Manchón, R Alcover-Couso… - Proceedings of the …, 2024 - openaccess.thecvf.com
Diffusion models represent a new paradigm in text-to-image generation. Beyond generating
high-quality images from text prompts, models such as Stable Diffusion have been …
Accelerating diffusion models for inverse problems through shortcut sampling
Diffusion models have recently demonstrated an impressive ability to address inverse
problems in an unsupervised manner. While existing methods primarily focus on modifying …
ColorizeDiffusion: Adjustable Sketch Colorization with Reference Image and Text
D Yan, L Yuan, Y Nishioka, I Fujishiro… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, diffusion models have demonstrated their effectiveness in generating extremely
high-quality images and have found wide-ranging applications, including automatic sketch …
CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities
Customized video generation aims to generate high-quality videos guided by text prompts
and subject reference images. However, since the model is only trained on static images, the fine …