Expressive talking head generation with granular audio-visual control

F Zhan, Y Yu, R Wu, J Zhang, S Lu, L Liu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org

As information exists in various modalities in real world, effective interaction and fusion
among multimodal information plays a key role for the creation and perception of multimodal …

Cited by 204 · Related articles · All 11 versions

[PDF] arxiv.org

Deep learning for visual speech analysis: A survey

C Sheng, G Kuang, L Bai, C Hou, Y Guo… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

Visual speech, referring to the visual domain of speech, has attracted increasing attention
due to its wide applications, such as public security, medical treatment, military defense, and …

Cited by 36 · Related articles · All 9 versions

[PDF] thecvf.com

CodeTalker: Speech-driven 3D facial animation with discrete motion prior

J Xing, M Xia, Y Zhang, X Cun… - Proceedings of the …, 2023 - openaccess.thecvf.com

Speech-driven 3D facial animation has been widely studied, yet there is still a gap to
achieving realism and vividness due to the highly ill-posed nature and scarcity of audio …

Cited by 127 · Related articles · All 8 versions

[PDF] thecvf.com

Seeing what you said: Talking face generation guided by a lip reading expert

J Wang, X Qian, M Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Talking face generation, also known as speech-to-lip generation, reconstructs facial motions
concerning lips given coherent speech input. The previous studies revealed the importance …

Cited by 70 · Related articles · All 6 versions

[PDF] arxiv.org

CelebV-HQ: A large-scale video facial attributes dataset

H Zhu, W Wu, W Zhu, L Jiang, S Tang, L Zhang… - European conference on …, 2022 - Springer

Large-scale datasets have played indispensable roles in the recent success of face
generation/editing and significantly facilitated the advances of emerging research fields …

Cited by 95 · Related articles · All 6 versions

[PDF] thecvf.com

Progressive disentangled representation learning for fine-grained controllable talking head synthesis

D Wang, Y Deng, Z Yin, HY Shum… - Proceedings of the …, 2023 - openaccess.thecvf.com

We present a novel one-shot talking head synthesis method that achieves disentangled and
fine-grained control over lip motion, eye gaze&blink, head pose, and emotional expression …

Cited by 56 · Related articles · All 5 versions

[PDF] arxiv.org

Semantic-aware implicit neural audio-driven video portrait generation

X Liu, Y Xu, Q Wu, H Zhou, W Wu, B Zhou - European conference on …, 2022 - Springer

Animating high-fidelity video portrait with speech audio is crucial for virtual reality and digital
entertainment. While most previous studies rely on accurate explicit structural information …

Cited by 128 · Related articles · All 5 versions

[PDF] thecvf.com

StyleSync: High-fidelity generalized and personalized lip sync in style-based generator

J Guan, Z Zhang, H Zhou, T Hu… - Proceedings of the …, 2023 - openaccess.thecvf.com

Despite recent advances in syncing lip movements with any audio waves, current methods
still struggle to balance generation quality and the model's generalization ability. Previous …

Cited by 48 · Related articles · All 5 versions

[PDF] aaai.org

StyleTalk: One-shot talking head generation with controllable speaking styles

Y Ma, S Wang, Z Hu, C Fan, T Lv, Y Ding… - Proceedings of the …, 2023 - ojs.aaai.org

Different people speak with diverse personalized speaking styles. Although existing one-
shot talking head methods have made significant progress in lip sync, natural facial …

Cited by 64 · Related articles · All 5 versions

[PDF] thecvf.com

EMMN: Emotional motion memory network for audio-driven emotional talking face generation

S Tan, B Ji, Y Pan - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com

Synthesizing expression is essential to create realistic talking faces. Previous works
consider expressions and mouth shapes as a whole and predict them solely from audio …

Cited by 24 · Related articles · All 4 versions