WAVE: Warping DDIM Inversion Features for Zero-Shot Text-to-Video Editing

Y Feng, S Gao, Y Bao, X Wang, S Han, J Zhang… - … on Computer Vision, 2025 - Springer
Text-driven video editing has emerged as a prominent application based on the
breakthroughs of image diffusion models. Existing state-of-the-art methods focus on zero …

LLMs Meet Multimodal Generation and Editing: A Survey

Y He, Z Liu, J Chen, Z Tian, H Liu, X Chi, R Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
With the recent advancement in large language models (LLMs), there is a growing interest in
combining LLMs with multimodal learning. Previous surveys of multimodal large language …

MotionCharacter: Identity-Preserving and Motion Controllable Human Video Generation

H Fang, D Qiu, B Mao, P Yan, H Tang - arXiv preprint arXiv:2411.18281, 2024 - arxiv.org
Recent advancements in personalized Text-to-Video (T2V) generation highlight the
importance of integrating character-specific identities and actions. However, previous T2V …

MovieCharacter: A Tuning-Free Framework for Controllable Character Video Synthesis

D Qiu, Z Chen, R Wang, M Fan, C Yu, J Huan… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in character video synthesis still depend on extensive fine-tuning or
complex 3D modeling processes, which can restrict accessibility and hinder real-time …

MotionCraft: Physics-based Zero-Shot Video Generation

LS Aira, A Montanaro, E Aiello, D Valsesia… - arXiv preprint arXiv …, 2024 - arxiv.org
Generating videos with realistic and physically plausible motion is one of the main recent
challenges in computer vision. While diffusion models are achieving compelling results in …

MotionCraft: Physics-Based Zero-Shot Video Generation

A Montanaro, LS Aira, E Aiello, D Valsesia… - The Thirty-eighth Annual … - openreview.net
Generating videos with realistic and physically plausible motion is one of the main recent
challenges in computer vision. While diffusion models are achieving compelling results in …