Zone: Zero-shot instruction-guided local editing

Y Huang, J Huang, Y Liu, M Yan, J Lv, J Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

Denoising diffusion models have emerged as a powerful tool for various image generation
and editing tasks, facilitating the synthesis of visual content in an unconditional or input …

Cited by 61 - Related articles - All 2 versions

MM-Soc: Benchmarking multimodal large language models in social media platforms

Y Jin, M Choi, G Verma, J Wang, S Kumar - arXiv preprint arXiv …, 2024 - arxiv.org

Social media platforms are hubs for multimodal information exchange, encompassing text,
images, and videos, making it challenging for machines to comprehend the information or …

Cited by 19 - Related articles - All 2 versions

Wave: Warping DDIM inversion features for zero-shot text-to-video editing

Y Feng, S Gao, Y Bao, X Wang, S Han, J Zhang… - … on Computer Vision, 2025 - Springer

Text-driven video editing has emerged as a prominent application based on the
breakthroughs of image diffusion models. Existing state-of-the-art methods focus on zero …

Cited by 2 - Related articles - All 4 versions

HyperGAN-CLIP: A Unified Framework for Domain Adaptation, Image Synthesis and Manipulation

AB Anees, AC Baykal, MB Kizil, D Ceylan… - SIGGRAPH Asia 2024 …, 2024 - dl.acm.org

Generative Adversarial Networks (GANs), particularly StyleGAN and its variants, have
demonstrated remarkable capabilities in generating highly realistic images. Despite their …

Cited by 1 - Related articles - All 2 versions

UV-IDM: Identity-Conditioned Latent Diffusion Model for Face UV-Texture Generation

H Li, Y Feng, S Xue, X Liu, B Zeng… - Proceedings of the …, 2024 - openaccess.thecvf.com

3D face reconstruction aims at generating high-fidelity 3D face shapes and textures
from single-view or multi-view images. However, current prevailing facial texture generation …

Cited by 6 - Related articles

A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models

X Shuai, H Ding, X Ma, R Tu, YG Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org

Image editing aims to edit the given synthetic or real image to meet the specific requirements
from users. It is widely studied in recent years as a promising and challenging field of …

Cited by 13 - Related articles

Stable-hair: Real-world hair transfer via diffusion model

Y Zhang, Q Zhang, Y Song, J Liu - arXiv preprint arXiv:2407.14078, 2024 - arxiv.org

Current hair transfer methods struggle to handle diverse and intricate hairstyles, thus limiting
their applicability in real-world scenarios. In this paper, we propose a novel diffusion-based …

Cited by 3 - Related articles - All 2 versions

Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis

B Zeng, L Yang, S Li, J Liu, Z Zhang, J Tian… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent advances in diffusion models have demonstrated exceptional capabilities in image
and video generation, further improving the effectiveness of 4D synthesis. Existing 4D …

Cited by 1 - Related articles - All 4 versions

Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices

Z Ma, Y Zhang, G Jia, L Zhao, Y Ma, M Ma… - arXiv preprint arXiv …, 2024 - arxiv.org

As one of the most popular and sought-after generative models in recent years, diffusion
models have sparked the interest of many researchers and steadily shown excellent …

Taming Rectified Flow for Inversion and Editing

J Wang, J Pu, Z Qi, J Guo, Y Ma, N Huang… - arXiv preprint arXiv …, 2024 - arxiv.org

Rectified-flow-based diffusion transformers, such as FLUX and OpenSora, have
demonstrated exceptional performance in the field of image and video generation. Despite …

Cited by 1 - Related articles - All 2 versions