Multimodal image synthesis and editing: A survey and taxonomy
As information exists in various modalities in the real world, effective interaction and fusion
among multimodal information plays a key role in the creation and perception of multimodal …
Deep learning for visual speech analysis: A survey
Visual speech, referring to the visual domain of speech, has attracted increasing attention
due to its wide applications, such as public security, medical treatment, military defense, and …
CodeTalker: Speech-driven 3D facial animation with discrete motion prior
Speech-driven 3D facial animation has been widely studied, yet there is still a gap to
achieving realism and vividness due to the highly ill-posed nature and scarcity of audio …
Seeing what you said: Talking face generation guided by a lip reading expert
Talking face generation, also known as speech-to-lip generation, reconstructs facial motions
concerning lips given coherent speech input. The previous studies revealed the importance …
CelebV-HQ: A large-scale video facial attributes dataset
Large-scale datasets have played indispensable roles in the recent success of face
generation/editing and significantly facilitated the advances of emerging research fields …
Progressive disentangled representation learning for fine-grained controllable talking head synthesis
We present a novel one-shot talking head synthesis method that achieves disentangled and
fine-grained control over lip motion, eye gaze & blink, head pose, and emotional expression …
Semantic-aware implicit neural audio-driven video portrait generation
Animating high-fidelity video portraits with speech audio is crucial for virtual reality and digital
entertainment. While most previous studies rely on accurate explicit structural information …
StyleSync: High-fidelity generalized and personalized lip sync in style-based generator
Despite recent advances in syncing lip movements with any audio waves, current methods
still struggle to balance generation quality and the model's generalization ability. Previous …
StyleTalk: One-shot talking head generation with controllable speaking styles
Different people speak with diverse personalized speaking styles. Although existing one-shot
talking head methods have made significant progress in lip sync, natural facial …
EMMN: Emotional motion memory network for audio-driven emotional talking face generation
Synthesizing expression is essential to create realistic talking faces. Previous works
consider expressions and mouth shapes as a whole and predict them solely from audio …