MM-Diffusion: Learning multi-modal diffusion models for joint audio and video generation

L Ruan, Y Ma, H Yang, H He, B Liu… - Proceedings of the …, 2023 - openaccess.thecvf.com
We propose the first joint audio-video generation framework that simultaneously delivers engaging watching
and listening experiences, towards high-quality, realistic videos. To generate …

Diverse and aligned audio-to-video generation via text-to-video model adaptation

G Yariv, I Gat, S Benaim, L Wolf, I Schwartz… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
We consider the task of generating diverse and realistic videos guided by natural audio
samples from a wide variety of semantic classes. For this task, the videos are required to be …

Audio-Visual Segmentation via Unlabeled Frame Exploitation

J Liu, Y Liu, F Zhang, C Ju… - Proceedings of the …, 2024 - openaccess.thecvf.com
Audio-visual segmentation (AVS) aims to segment the sounding objects in video frames.
Although great progress has been witnessed, we experimentally reveal that current methods …

Soundini: Sound-guided diffusion for natural video editing

SH Lee, S Kim, I Yoo, F Yang, D Cho, Y Kim… - arXiv preprint arXiv …, 2023 - arxiv.org
We propose a method for adding sound-guided visual effects to specific regions of videos
in a zero-shot setting. Animating the appearance of the visual effect is challenging …

Audio-Synchronized Visual Animation

L Zhang, S Mo, Y Zhang, P Morgado - arXiv preprint arXiv:2403.05659, 2024 - arxiv.org
Current visual generation methods can produce high-quality videos guided by text.
However, effectively controlling object dynamics remains a challenge. This work explores …

The Power of Sound (TPoS): Audio reactive video generation with Stable Diffusion

Y Jeong, W Ryoo, S Lee, D Seo… - Proceedings of the …, 2023 - openaccess.thecvf.com
In recent years, video generation has become a prominent generative tool and has drawn
significant attention. However, little consideration has been given to audio-to-video generation …

TA2V: Text-Audio Guided Video Generation

M Zhao, W Wang, T Chen, R Zhang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Recent conditional and unconditional video generation tasks have been accomplished
mainly based on generative adversarial networks (GANs), diffusion, and autoregressive …

SonicDiffusion: Audio-Driven Image Generation and Editing with Pretrained Diffusion Models

BC Biner, FM Sofian, UB Karakaş, D Ceylan… - arXiv preprint arXiv …, 2024 - arxiv.org
We are witnessing a revolution in conditional image synthesis with the recent success of
large-scale text-to-image generation methods. This success also opens up new …

CMMD: Contrastive Multi-Modal Diffusion for Video-Audio Conditional Modeling

R Yang, H Gamper, S Braun - arXiv preprint arXiv:2312.05412, 2023 - arxiv.org
We introduce a multi-modal diffusion model tailored for the bi-directional conditional
generation of video and audio. Recognizing the importance of accurate alignment between …

Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation

A Hayakawa, M Ishii, T Shibuya, Y Mitsufuji - arXiv preprint arXiv …, 2024 - arxiv.org
In this study, we aim to construct an audio-video generative model with minimal
computational cost by leveraging pre-trained single-modal generative models for audio and …