Diff2Lip: Audio Conditioned Diffusion Models for Lip-Synchronization

S Mukhopadhyay, S Suri, RT Gadde… - Proceedings of the …, 2024 - openaccess.thecvf.com
The task of lip synchronization (lip-sync) seeks to match the lips of human faces with
different audio. It has various applications in the film industry as well as for creating virtual …

SIDGAN: High-Resolution Dubbed Video Generation via Shift-Invariant Learning

U Muaz, W Jang, R Tripathi, S Mani… - Proceedings of the …, 2023 - openaccess.thecvf.com
Dubbed video generation aims to accurately synchronize mouth movements of a given facial
video with driving audio while preserving identity and scene-specific visual dynamics, such …

Uni-Dubbing: Zero-Shot Speech Synthesis from Visual Articulation

S Lei, X Cheng, M Lyu, J Hu, J Tan, R Liu… - Proceedings of the …, 2024 - aclanthology.org
In the field of speech synthesis, there is a growing emphasis on employing multimodal
speech to enhance robustness. A key challenge in this area is the scarcity of datasets that …

FakeTracer: proactively defending against face-swap DeepFakes via implanting traces in training

P Sun, H Qi, Y Li, S Lyu - arXiv preprint arXiv:2307.14593, 2023 - arxiv.org
Face-swap DeepFake is an emerging AI-based face forgery technique that can replace the
original face in a video with a generated face of the target identity while retaining consistent …

ListenFormer: Responsive Listening Head Generation with Non-autoregressive Transformers

M Liu, J Wang, X Qian, H Li - Proceedings of the 32nd ACM International …, 2024 - dl.acm.org
As one of the crucial elements in human-robot interaction, responsive listening head
generation has attracted considerable attention from researchers. It aims to generate a …

FakeTracer: Catching Face-swap DeepFakes via Implanting Traces in Training

P Sun, H Qi, Y Li, S Lyu - IEEE Transactions on Emerging …, 2024 - ieeexplore.ieee.org
Face-swap DeepFake is an emerging AI-based face forgery technique that can replace the
original face in a video with a generated face of the target identity while retaining consistent …

GaussianSpeech: Audio-Driven Gaussian Avatars

S Aneja, A Sevastopolsky, T Kirschstein, J Thies… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce GaussianSpeech, a novel approach that synthesizes high-fidelity animation
sequences of photo-realistic, personalized 3D human head avatars from spoken audio. To …

Generating dynamic lip-syncing using target audio in a multimedia environment

D Pawar, P Borde, P Yannawar - Natural Language Processing Journal, 2024 - Elsevier
The presented research focuses on the challenging task of creating lip-sync facial videos
that align with a specified target speech segment. A novel deep-learning model has been …

GSmoothFace: Generalized Smooth Talking Face Generation via Fine Grained 3D Face Guidance

H Zhang, Z Yuan, C Zheng, X Yan, B Wang, G Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Although existing speech-driven talking face generation methods achieve significant
progress, they are far from real-world application due to the avatar-specific training demand …

Dubbing for Everyone: Data-Efficient Visual Dubbing using Neural Rendering Priors

J Saunders, V Namboodiri - arXiv preprint arXiv:2401.06126, 2024 - arxiv.org
Visual dubbing is the process of generating lip motions of an actor in a video to synchronise
with given audio. Recent advances have made progress towards this goal but have not …