Large-scale multilingual audio visual dubbing

G Cong, L Li, Y Qi, ZJ Zha, Q Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com

Given a piece of text, a video clip and a reference audio, the movie dubbing (also known as
visual voice clone, V2C) task aims to generate speeches that match the speaker's emotion …

Cited by 18 - Related articles - All 8 versions

[PDF] arxiv.org

Audio-visual synchronisation in the wild

H Chen, W Xie, T Afouras, A Nagrani, A Vedaldi… - arXiv preprint arXiv …, 2021 - arxiv.org

In this paper, we consider the problem of audio-visual synchronisation applied to videos
'in-the-wild' (i.e. of general classes beyond speech). As a new task, we identify and curate a test …

Cited by 48 - Related articles - All 9 versions

[PDF] arxiv.org

Towards realistic visual dubbing with heterogeneous sources

T Xie, L Liao, C Bi, B Tang, X Yin, J Yang… - Proceedings of the 29th …, 2021 - dl.acm.org

The task of few-shot visual dubbing focuses on synchronizing the lip movements with
arbitrary speech input for any talking head video. Albeit moderate improvements in current …

Cited by 36 - Related articles - All 4 versions

[PDF] thecvf.com

SIDGAN: High-Resolution Dubbed Video Generation via Shift-Invariant Learning

U Muaz, W Jang, R Tripathi, S Mani… - Proceedings of the …, 2023 - openaccess.thecvf.com

Dubbed video generation aims to accurately synchronize mouth movements of a given facial
video with driving audio while preserving identity and scene-specific visual dynamics, such …

Cited by 5 - Related articles - All 4 versions

[PDF] neurips.cc

Neural dubber: Dubbing for videos according to scripts

C Hu, Q Tian, T Li, W Yuping… - Advances in neural …, 2021 - proceedings.neurips.cc

Dubbing is a post-production process of re-recording actors' dialogues, which is extensively
used in filmmaking and video production. It is usually performed manually by professional …

Cited by 38 - Related articles - All 7 versions

[PDF] thecvf.com

More than words: In-the-wild visually-driven prosody for text-to-speech

M Hassid, MT Ramanovich… - Proceedings of the …, 2022 - openaccess.thecvf.com

In this paper we present VDTTS, a Visually-Driven Text-to-Speech model. Motivated by
dubbing, VDTTS takes advantage of video frames as an additional input alongside text, and …

Cited by 17 - Related articles - All 6 versions

[PDF] frontiersin.org

Multilingual video dubbing—a technology review and current challenges

D Bigioi, P Corcoran - Frontiers in Signal Processing, 2023 - frontiersin.org

The proliferation of multi-lingual content on today's streaming services has created a need
for automated multi-lingual dubbing tools. In this article, current state-of-the-art approaches …

Cited by 2 - Related articles - All 2 versions

[PDF] mdpi.com

Data-Driven Advancements in Lip Motion Analysis: A Review

S Torrie, A Sumsion, DJ Lee, Z Sun - Electronics, 2023 - mdpi.com

This work reviews the dataset-driven advancements that have occurred in the area of lip
motion analysis, particularly visual lip-reading and visual lip motion authentication, in the …

Cited by 1 - Related articles - All 4 versions

[PDF] thecvf.com

Looking similar sounding different: Leveraging counterfactual cross-modal pairs for audiovisual representation learning

N Singh, CW Wu, I Orife… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

Audiovisual representation learning typically relies on the correspondence between sight
and sound. However, there are often multiple audio tracks that can correspond with a visual …

Cited by 3 - Related articles - All 3 versions

[PDF] concordia.ca

Visual dubbing pipeline with localized lip-sync and two-pass identity transfer

D Patel, H Zouaghi, S Mudur, E Paquette… - Computers & …, 2023 - Elsevier

Visual dubbing uses visual computing and deep learning to alter the lip and mouth
articulations of the actor to sync with the dubbed speech. It has the potential to greatly …

Cited by 3 - Related articles - All 6 versions