Learning to dub movies via hierarchical prosody models
Given a piece of text, a video clip and a reference audio, the movie dubbing (also known as
visual voice clone, V2C) task aims to generate speeches that match the speaker's emotion …
Audio-visual synchronisation in the wild
In this paper, we consider the problem of audio-visual synchronisation applied to videos
'in-the-wild' (i.e. of general classes beyond speech). As a new task, we identify and curate a test …
Towards realistic visual dubbing with heterogeneous sources
The task of few-shot visual dubbing focuses on synchronizing the lip movements with
arbitrary speech input for any talking head video. Albeit moderate improvements in current …
SIDGAN: High-Resolution Dubbed Video Generation via Shift-Invariant Learning
U Muaz, W Jang, R Tripathi, S Mani… - Proceedings of the …, 2023 - openaccess.thecvf.com
Dubbed video generation aims to accurately synchronize mouth movements of a given facial
video with driving audio while preserving identity and scene-specific visual dynamics, such …
Neural dubber: Dubbing for videos according to scripts
Dubbing is a post-production process of re-recording actors' dialogues, which is extensively
used in filmmaking and video production. It is usually performed manually by professional …
More than words: In-the-wild visually-driven prosody for text-to-speech
M Hassid, MT Ramanovich… - Proceedings of the …, 2022 - openaccess.thecvf.com
In this paper we present VDTTS, a Visually-Driven Text-to-Speech model. Motivated by
dubbing, VDTTS takes advantage of video frames as an additional input alongside text, and …
Multilingual video dubbing—a technology review and current challenges
D Bigioi, P Corcoran - Frontiers in Signal Processing, 2023 - frontiersin.org
The proliferation of multi-lingual content on today's streaming services has created a need
for automated multi-lingual dubbing tools. In this article, current state-of-the-art approaches …
Data-Driven Advancements in Lip Motion Analysis: A Review
This work reviews the dataset-driven advancements that have occurred in the area of lip
motion analysis, particularly visual lip-reading and visual lip motion authentication, in the …
Looking similar sounding different: Leveraging counterfactual cross-modal pairs for audiovisual representation learning
Audiovisual representation learning typically relies on the correspondence between sight
and sound. However there are often multiple audio tracks that can correspond with a visual …
Visual dubbing pipeline with localized lip-sync and two-pass identity transfer
Visual dubbing uses visual computing and deep learning to alter the lip and mouth
articulations of the actor to sync with the dubbed speech. It has the potential to greatly …