Deep learning for visual speech analysis: A survey

C Sheng, G Kuang, L Bai, C Hou, Y Guo… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Visual speech, referring to the visual domain of speech, has attracted increasing attention
due to its wide applications, such as public security, medical treatment, military defense, and …

EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model Under Weak Conditions

L Tian, Q Wang, B Zhang, L Bo - European Conference on Computer …, 2025 - Springer
In this work, we tackle the challenge of enhancing the realism and expressiveness in talking
head video generation by focusing on the dynamic and nuanced relationship between audio …

Photomaker: Customizing realistic human photos via stacked id embedding

Z Li, M Cao, X Wang, Z Qi… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent advances in text-to-image generation have made remarkable progress in
synthesizing realistic human photos conditioned on given text prompts. However, existing …

Deepfakes as a threat to a speaker and facial recognition: An overview of tools and attack vectors

A Firc, K Malinka, P Hanáček - Heliyon, 2023 - cell.com
Deepfakes present an emerging threat in cyberspace. Recent developments in machine
learning make deepfakes highly believable, and very difficult to differentiate between what is …

Livelyspeaker: Towards semantic-aware co-speech gesture generation

Y Zhi, X Cun, X Chen, X Shen, W Guo… - Proceedings of the …, 2023 - openaccess.thecvf.com
Gestures are non-verbal but important behaviors accompanying people's speech. While
previous methods are able to generate speech rhythm-synchronized gestures, the semantic …

Mofa-video: Controllable image animation via generative motion field adaptions in frozen image-to-video diffusion model

M Niu, X Cun, X Wang, Y Zhang, Y Shan… - European Conference on …, 2025 - Springer
We present MOFA-Video, an advanced controllable image animation method that generates
video from the given image using various additional controllable signals (such as human …

Portraitbooth: A versatile portrait model for fast identity-preserved personalization

X Peng, J Zhu, B Jiang, Y Tai, D Luo… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent advancements in personalized image generation using diffusion models have been
noteworthy. However, existing methods suffer from inefficiencies due to the requirement for …

Follow-your-emoji: Fine-controllable and expressive freestyle portrait animation

Y Ma, H Liu, H Wang, H Pan, Y He, J Yuan… - SIGGRAPH Asia 2024 …, 2024 - dl.acm.org
We present Follow-Your-Emoji, a diffusion-based framework for portrait animation, which
animates a reference portrait with target landmark sequences. The main challenge of portrait …

Dreamtalk: When expressive talking head generation meets diffusion probabilistic models

Y Ma, S Zhang, J Wang, X Wang, Y Zhang… - arXiv e …, 2023 - ui.adsabs.harvard.edu
Diffusion models have shown remarkable success in a variety of downstream generative
tasks, yet remain under-explored in the important and challenging expressive talking head …

Diffposetalk: Speech-driven stylistic 3d facial animation and head pose generation via diffusion models

Z Sun, T Lv, S Ye, M Lin, J Sheng, YH Wen… - ACM Transactions on …, 2024 - dl.acm.org
The generation of stylistic 3D facial animations driven by speech presents a significant
challenge as it requires learning a many-to-many mapping between speech, style, and the …