Diff2lip: Audio conditioned diffusion models for lip-synchronization
The task of lip synchronization (lip-sync) seeks to match the lips of human faces with
different audio. It has various applications in the film industry as well as for creating virtual …
SIDGAN: High-Resolution Dubbed Video Generation via Shift-Invariant Learning
U Muaz, W Jang, R Tripathi, S Mani… - Proceedings of the …, 2023 - openaccess.thecvf.com
Dubbed video generation aims to accurately synchronize mouth movements of a given facial
video with driving audio while preserving identity and scene-specific visual dynamics, such …
Uni-Dubbing: Zero-Shot Speech Synthesis from Visual Articulation
In the field of speech synthesis, there is a growing emphasis on employing multimodal
speech to enhance robustness. A key challenge in this area is the scarcity of datasets that …
FakeTracer: proactively defending against face-swap DeepFakes via implanting traces in training
Face-swap DeepFake is an emerging AI-based face forgery technique that can replace the
original face in a video with a generated face of the target identity while retaining consistent …
ListenFormer: Responsive Listening Head Generation with Non-autoregressive Transformers
As one of the crucial elements in human-robot interaction, responsive listening head
generation has attracted considerable attention from researchers. It aims to generate a …
FakeTracer: Catching Face-swap DeepFakes via Implanting Traces in Training
Face-swap DeepFake is an emerging AI-based face forgery technique that can replace the
original face in a video with a generated face of the target identity while retaining consistent …
GaussianSpeech: Audio-Driven Gaussian Avatars
We introduce GaussianSpeech, a novel approach that synthesizes high-fidelity animation
sequences of photo-realistic, personalized 3D human head avatars from spoken audio. To …
Generating dynamic lip-syncing using target audio in a multimedia environment
The presented research focuses on the challenging task of creating lip-sync facial videos
that align with a specified target speech segment. A novel deep-learning model has been …
GSmoothFace: Generalized Smooth Talking Face Generation via Fine Grained 3D Face Guidance
Although existing speech-driven talking face generation methods achieve significant
progress, they are far from real-world application due to the avatar-specific training demand …
Dubbing for Everyone: Data-Efficient Visual Dubbing using Neural Rendering Priors
J Saunders, V Namboodiri - arXiv preprint arXiv:2401.06126, 2024 - arxiv.org
Visual dubbing is the process of generating lip motions of an actor in a video to synchronise
with given audio. Recent advances have made progress towards this goal but have not …