MM-Diffusion: Learning multi-modal diffusion models for joint audio and video generation

L Ruan, Y Ma, H Yang, H He, B Liu… - Proceedings of the …, 2023 - openaccess.thecvf.com
We propose the first joint audio-video generation framework that simultaneously delivers engaging watching
and listening experiences, towards high-quality, realistic videos. To generate …

Diverse and aligned audio-to-video generation via text-to-video model adaptation

G Yariv, I Gat, S Benaim, L Wolf, I Schwartz… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
We consider the task of generating diverse and realistic videos guided by natural audio
samples from a wide variety of semantic classes. For this task, the videos are required to be …

Audio-Visual Segmentation via Unlabeled Frame Exploitation

J Liu, Y Liu, F Zhang, C Ju… - Proceedings of the …, 2024 - openaccess.thecvf.com
Audio-visual segmentation (AVS) aims to segment the sounding objects in video frames.
Although great progress has been witnessed, we experimentally reveal that current methods …

Soundini: Sound-guided diffusion for natural video editing

SH Lee, S Kim, I Yoo, F Yang, D Cho, Y Kim… - arXiv preprint arXiv …, 2023 - arxiv.org
We propose a method for adding sound-guided visual effects to specific regions of videos
in a zero-shot setting. Animating the appearance of the visual effect is challenging …

Audio-Synchronized Visual Animation

L Zhang, S Mo, Y Zhang, P Morgado - arXiv preprint arXiv:2403.05659, 2024 - arxiv.org
Current visual generation methods can produce high-quality videos guided by text.
However, effectively controlling object dynamics remains a challenge. This work explores …

The Power of Sound (TPoS): Audio reactive video generation with Stable Diffusion

Y Jeong, W Ryoo, S Lee, D Seo… - Proceedings of the …, 2023 - openaccess.thecvf.com
In recent years, video generation has become a prominent generative tool and has drawn
significant attention. However, little consideration has been given to audio-to-video generation …

TA2V: Text-Audio Guided Video Generation

M Zhao, W Wang, T Chen, R Zhang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Recent conditional and unconditional video generation tasks have been accomplished
mainly based on generative adversarial networks (GANs), diffusion, and autoregressive …

SonicDiffusion: Audio-Driven Image Generation and Editing with Pretrained Diffusion Models

BC Biner, FM Sofian, UB Karakaş, D Ceylan… - arXiv preprint arXiv …, 2024 - arxiv.org
We are witnessing a revolution in conditional image synthesis with the recent success of
large-scale text-to-image generation methods. This success also opens up new …

CMMD: Contrastive Multi-Modal Diffusion for Video-Audio Conditional Modeling

R Yang, H Gamper, S Braun - arXiv preprint arXiv:2312.05412, 2023 - arxiv.org
We introduce a multi-modal diffusion model tailored for the bi-directional conditional
generation of video and audio. Recognizing the importance of accurate alignment between …

Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation

A Hayakawa, M Ishii, T Shibuya, Y Mitsufuji - arXiv preprint arXiv …, 2024 - arxiv.org
In this study, we aim to construct an audio-video generative model with minimal
computational cost by leveraging pre-trained single-modal generative models for audio and …