A lip sync expert is all you need for speech to lip generation in the wild

C Zhang, C Zhang, S Zheng, Y Qiao, C Li… - arXiv preprint arXiv …, 2023 - arxiv.org

As ChatGPT goes viral, generative AI (AIGC, aka AI-generated content) has made headlines
everywhere because of its ability to analyze and create text, images, and beyond. With such …

被引用次数：177 相关文章所有 4 个版本

[PDF] nature.com

AI-generated characters for supporting personalized learning and well-being

P Pataranutaporn, V Danry, J Leong… - Nature Machine …, 2021 - nature.com

Advancements in machine learning have recently enabled the hyper-realistic synthesis of
prose, images, audio and video data, in what is referred to as artificial intelligence (AI) …

被引用次数：229 相关文章所有 7 个版本

[PDF] thecvf.com

Sadtalker: Learning realistic 3d motion coefficients for stylized audio-driven single image talking face animation

W Zhang, X Cun, X Wang, Y Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Generating talking head videos through a face image and a piece of speech audio still
contains many challenges. ie, unnatural head movement, distorted expression, and identity …

被引用次数：169 相关文章所有 7 个版本

[PDF] thecvf.com

Codetalker: Speech-driven 3d facial animation with discrete motion prior

J Xing, M Xia, Y Zhang, X Cun… - Proceedings of the …, 2023 - openaccess.thecvf.com

Speech-driven 3D facial animation has been widely studied, yet there is still a gap to
achieving realism and vividness due to the highly ill-posed nature and scarcity of audio …

被引用次数：113 相关文章所有 8 个版本

[PDF] thecvf.com

Pose-controllable talking face generation by implicitly modularized audio-visual representation

H Zhou, Y Sun, W Wu, CC Loy… - Proceedings of the …, 2021 - openaccess.thecvf.com

While accurate lip synchronization has been achieved for arbitrary-subject audio-driven
talking face generation, the problem of how to efficiently drive the head pose remains …

被引用次数：355 相关文章所有 9 个版本

[PDF] arxiv.org

Multimodal image synthesis and editing: A survey and taxonomy

F Zhan, Y Yu, R Wu, J Zhang, S Lu, L Liu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org

As information exists in various modalities in real world, effective interaction and fusion
among multimodal information plays a key role for the creation and perception of multimodal …

被引用次数：202 相关文章所有 11 个版本

[PDF] thecvf.com

Expressive talking head generation with granular audio-visual control

B Liang, Y Pan, Z Guo, H Zhou… - Proceedings of the …, 2022 - openaccess.thecvf.com

Generating expressive talking heads is essential for creating virtual humans. However,
existing one-or few-shot methods focus on lip-sync and head motion, ignoring the emotional …

被引用次数：123 相关文章所有 4 个版本

[PDF] thecvf.com

Flow-guided one-shot talking face generation with a high-resolution audio-visual dataset

Z Zhang, L Li, Y Ding, C Fan - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com

One-shot talking face generation should synthesize high visual quality facial videos with
reasonable animations of expression and head pose, and just utilize arbitrary driving audio …

被引用次数：258 相关文章所有 5 个版本

[PDF] arxiv.org

Styleheat: One-shot high-resolution editable talking face generation via pre-trained stylegan

F Yin, Y Zhang, X Cun, M Cao, Y Fan, X Wang… - European conference on …, 2022 - Springer

One-shot talking face generation aims at synthesizing a high-quality talking face video from
an arbitrary portrait image, driven by a video or an audio segment. In this work, we provide a …

被引用次数：148 相关文章所有 6 个版本

[PDF] arxiv.org

Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward

M Masood, M Nawaz, KM Malik, A Javed, A Irtaza… - Applied …, 2023 - Springer

Easy access to audio-visual content on social media, combined with the availability of
modern tools such as Tensorflow or Keras, and open-source trained models, along with …

被引用次数：324 相关文章所有 11 个版本