A survey on video diffusion models

Z Xing, Q Feng, H Chen, Q Dai, H Hu, H Xu… - ACM Computing …, 2024 - dl.acm.org
The recent wave of AI-generated content (AIGC) has witnessed substantial success in
computer vision, with the diffusion model playing a crucial role in this achievement. Due to …

Diffusion model-based video editing: A survey

W Sun, RC Tu, J Liao, D Tao - arXiv preprint arXiv:2407.07111, 2024 - arxiv.org
The rapid development of diffusion models (DMs) has significantly advanced image and
video applications, making "what you want is what you see" a reality. Among these, video …

AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks

M Ku, C Wei, W Ren, H Yang, W Chen - Transactions on Machine …, 2024 - openreview.net
In the dynamic field of digital content creation using generative models, state-of-the-art video
editing models still do not offer the level of quality and control that users desire. Previous …

Deco: Decoupled human-centered diffusion video editing with motion consistency

X Zhong, X Huang, X Yang, G Lin, Q Wu - European Conference on …, 2025 - Springer
Diffusion models usher in a new era of video editing, flexibly manipulating video contents
with text prompts. Despite the widespread application demand in editing human-centered …

AnyV2V: A plug-and-play framework for any video-to-video editing tasks

M Ku, C Wei, W Ren, H Yang, W Chen - arXiv preprint arXiv:2403.14468, 2024 - arxiv.org
Video-to-video editing involves editing a source video along with additional control (such as
text prompts, subjects, or styles) to generate a new video that aligns with the source video …

When ControlNet Meets Inexplicit Masks: A Case Study of ControlNet on its Contour-following Ability

W Xuan, Y Xu, S Zhao, C Wang, J Liu, B Du… - Proceedings of the 32nd …, 2024 - dl.acm.org
ControlNet excels at creating content that closely matches precise contours in user-provided
masks. However, when these masks contain noise, as a frequent occurrence with non …

Pruning then Reweighting: Towards Data-Efficient Training of Diffusion Models

Y Li, Y Zhang, S Liu, X Lin - arXiv preprint arXiv:2409.19128, 2024 - arxiv.org
Despite the remarkable generation capabilities of Diffusion Models (DMs), conducting
training and inference remains computationally expensive. Previous works have been …

Live2Diff: Live stream translation via uni-directional attention in video diffusion models

Z Xing, G Fox, Y Zeng, X Pan, M Elgharib… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models have shown remarkable efficacy in generating streaming data such
as text and audio, thanks to their temporally uni-directional attention mechanism, which …

Generative Video Propagation

S Liu, T Wang, JH Wang, Q Liu, Z Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large-scale video generation models have the inherent ability to realistically model natural
scenes. In this paper, we demonstrate that through a careful design of a generative video …

HyV-Summ: Social media video summarization on custom dataset using hybrid techniques

J Paul, A Roy, A Mitra, J Sil - Neurocomputing, 2025 - Elsevier
The proliferation of social networking platforms such as YouTube, Facebook, Instagram, and
X has led to an exponential growth in multimedia content, with billions of videos uploaded …