Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward

M Masood, M Nawaz, KM Malik, A Javed, A Irtaza… - Applied …, 2023 - Springer
Easy access to audio-visual content on social media, combined with the availability of
modern tools such as TensorFlow or Keras, and open-source trained models, along with …

Deep learning for visual speech analysis: A survey

C Sheng, G Kuang, L Bai, C Hou, Y Guo… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Visual speech, referring to the visual domain of speech, has attracted increasing attention
due to its wide applications, such as public security, medical treatment, military defense, and …

CodeTalker: Speech-driven 3D facial animation with discrete motion prior

J Xing, M Xia, Y Zhang, X Cun… - Proceedings of the …, 2023 - openaccess.thecvf.com
Speech-driven 3D facial animation has been widely studied, yet there is still a gap to
achieving realism and vividness due to the highly ill-posed nature and scarcity of audio …

StyleHEAT: One-shot high-resolution editable talking face generation via pre-trained StyleGAN

F Yin, Y Zhang, X Cun, M Cao, Y Fan, X Wang… - European conference on …, 2022 - Springer
One-shot talking face generation aims at synthesizing a high-quality talking face video from
an arbitrary portrait image, driven by a video or an audio segment. In this work, we provide a …

Diffused Heads: Diffusion models beat GANs on talking-face generation

M Stypułkowski, K Vougioukas, S He… - Proceedings of the …, 2024 - openaccess.thecvf.com
Talking face generation has historically struggled to produce head movements and natural
facial expressions without guidance from additional reference videos. Recent developments …

StyleSync: High-fidelity generalized and personalized lip sync in style-based generator

J Guan, Z Zhang, H Zhou, T Hu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Despite recent advances in syncing lip movements with any audio waves, current methods
still struggle to balance generation quality and the model's generalization ability. Previous …

One-shot talking face generation from single-speaker audio-visual correlation learning

S Wang, L Li, Y Ding, X Yu - Proceedings of the AAAI Conference on …, 2022 - ojs.aaai.org
Audio-driven one-shot talking face generation methods are usually trained on video
resources of various persons. However, their created videos often suffer from unnatural mouth …

StyleTalk: One-shot talking head generation with controllable speaking styles

Y Ma, S Wang, Z Hu, C Fan, T Lv, Y Ding… - Proceedings of the …, 2023 - ojs.aaai.org
Different people speak with diverse personalized speaking styles. Although existing one-
shot talking head methods have made significant progress in lip sync, natural facial …

DFA-NeRF: Personalized talking head generation via disentangled face attributes neural rendering

S Yao, RZ Zhong, Y Yan, G Zhai, X Yang - arXiv preprint arXiv:2201.00791, 2022 - arxiv.org
While recent advances in deep neural networks have made it possible to render high-quality
images, generating photo-realistic and personalized talking heads remains challenging. With …

VideoReTalking: Audio-based lip synchronization for talking head video editing in the wild

K Cheng, X Cun, Y Zhang, M Xia, F Yin, M Zhu… - SIGGRAPH Asia 2022 …, 2022 - dl.acm.org
We present VideoReTalking, a new system to edit the faces of a real-world talking head
video according to input audio, producing a high-quality and lip-syncing output video even …