Deep learning for visual speech analysis: A survey

C Sheng, G Kuang, L Bai, C Hou, Y Guo… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Visual speech, referring to the visual domain of speech, has attracted increasing attention
due to its wide applications, such as public security, medical treatment, military defense, and …

EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model Under Weak Conditions

L Tian, Q Wang, B Zhang, L Bo - European Conference on Computer …, 2025 - Springer
In this work, we tackle the challenge of enhancing the realism and expressiveness in talking
head video generation by focusing on the dynamic and nuanced relationship between audio …

Photomaker: Customizing realistic human photos via stacked id embedding

Z Li, M Cao, X Wang, Z Qi… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent advances in text-to-image generation have made remarkable progress in
synthesizing realistic human photos conditioned on given text prompts. However, existing …

Deepfakes as a threat to a speaker and facial recognition: An overview of tools and attack vectors

A Firc, K Malinka, P Hanáček - Heliyon, 2023 - cell.com
Deepfakes present an emerging threat in cyberspace. Recent developments in machine
learning make deepfakes highly believable, and very difficult to differentiate between what is …

Livelyspeaker: Towards semantic-aware co-speech gesture generation

Y Zhi, X Cun, X Chen, X Shen, W Guo… - Proceedings of the …, 2023 - openaccess.thecvf.com
Gestures are non-verbal but important behaviors accompanying people's speech. While
previous methods are able to generate speech rhythm-synchronized gestures, the semantic …

Mofa-video: Controllable image animation via generative motion field adaptions in frozen image-to-video diffusion model

M Niu, X Cun, X Wang, Y Zhang, Y Shan… - European Conference on …, 2025 - Springer
We present MOFA-Video, an advanced controllable image animation method that generates
video from the given image using various additional controllable signals (such as human …

Portraitbooth: A versatile portrait model for fast identity-preserved personalization

X Peng, J Zhu, B Jiang, Y Tai, D Luo… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent advancements in personalized image generation using diffusion models have been
noteworthy. However, existing methods suffer from inefficiencies due to the requirement for …

Follow-your-emoji: Fine-controllable and expressive freestyle portrait animation

Y Ma, H Liu, H Wang, H Pan, Y He, J Yuan… - SIGGRAPH Asia 2024 …, 2024 - dl.acm.org
We present Follow-Your-Emoji, a diffusion-based framework for portrait animation, which
animates a reference portrait with target landmark sequences. The main challenge of portrait …

Dreamtalk: When expressive talking head generation meets diffusion probabilistic models

Y Ma, S Zhang, J Wang, X Wang, Y Zhang… - arXiv e …, 2023 - ui.adsabs.harvard.edu
Diffusion models have shown remarkable success in a variety of downstream generative
tasks, yet remain under-explored in the important and challenging expressive talking head …

Diffposetalk: Speech-driven stylistic 3d facial animation and head pose generation via diffusion models

Z Sun, T Lv, S Ye, M Lin, J Sheng, YH Wen… - ACM Transactions on …, 2024 - dl.acm.org
The generation of stylistic 3D facial animations driven by speech presents a significant
challenge as it requires learning a many-to-many mapping between speech, style, and the …