Diff2Lip: Audio Conditioned Diffusion Models for Lip-Synchronization

S Mukhopadhyay, S Suri, RT Gadde… - Proceedings of the …, 2024 - openaccess.thecvf.com
The task of lip synchronization (lip-sync) seeks to match the lips of human faces with
different audio. It has various applications in the film industry as well as for creating virtual …

SIDGAN: High-Resolution Dubbed Video Generation via Shift-Invariant Learning

U Muaz, W Jang, R Tripathi, S Mani… - Proceedings of the …, 2023 - openaccess.thecvf.com
Dubbed video generation aims to accurately synchronize mouth movements of a given facial
video with driving audio while preserving identity and scene-specific visual dynamics, such …

Uni-Dubbing: Zero-Shot Speech Synthesis from Visual Articulation

S Lei, X Cheng, M Lyu, J Hu, J Tan, R Liu… - Proceedings of the …, 2024 - aclanthology.org
In the field of speech synthesis, there is a growing emphasis on employing multimodal
speech to enhance robustness. A key challenge in this area is the scarcity of datasets that …

FakeTracer: proactively defending against face-swap DeepFakes via implanting traces in training

P Sun, H Qi, Y Li, S Lyu - arXiv preprint arXiv:2307.14593, 2023 - arxiv.org
Face-swap DeepFake is an emerging AI-based face forgery technique that can replace the
original face in a video with a generated face of the target identity while retaining consistent …

ListenFormer: Responsive Listening Head Generation with Non-autoregressive Transformers

M Liu, J Wang, X Qian, H Li - Proceedings of the 32nd ACM International …, 2024 - dl.acm.org
As one of the crucial elements in human-robot interaction, responsive listening head
generation has attracted considerable attention from researchers. It aims to generate a …

FakeTracer: Catching Face-swap DeepFakes via Implanting Traces in Training

P Sun, H Qi, Y Li, S Lyu - IEEE Transactions on Emerging …, 2024 - ieeexplore.ieee.org
Face-swap DeepFake is an emerging AI-based face forgery technique that can replace the
original face in a video with a generated face of the target identity while retaining consistent …

GaussianSpeech: Audio-Driven Gaussian Avatars

S Aneja, A Sevastopolsky, T Kirschstein, J Thies… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce GaussianSpeech, a novel approach that synthesizes high-fidelity animation
sequences of photo-realistic, personalized 3D human head avatars from spoken audio. To …

Generating dynamic lip-syncing using target audio in a multimedia environment

D Pawar, P Borde, P Yannawar - Natural Language Processing Journal, 2024 - Elsevier
The presented research focuses on the challenging task of creating lip-sync facial videos
that align with a specified target speech segment. A novel deep-learning model has been …

GSmoothFace: Generalized Smooth Talking Face Generation via Fine Grained 3D Face Guidance

H Zhang, Z Yuan, C Zheng, X Yan, B Wang, G Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Although existing speech-driven talking face generation methods achieve significant
progress, they are far from real-world application due to the avatar-specific training demand …

Dubbing for Everyone: Data-Efficient Visual Dubbing using Neural Rendering Priors

J Saunders, V Namboodiri - arXiv preprint arXiv:2401.06126, 2024 - arxiv.org
Visual dubbing is the process of generating lip motions of an actor in a video to synchronise
with given audio. Recent advances have made progress towards this goal but have not …