A complete survey on generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 all you need?
As ChatGPT goes viral, generative AI (AIGC, aka AI-generated content) has made headlines
everywhere because of its ability to analyze and create text, images, and beyond. With such …
AI-generated characters for supporting personalized learning and well-being
Advancements in machine learning have recently enabled the hyper-realistic synthesis of
prose, images, audio and video data, in what is referred to as artificial intelligence (AI) …
SadTalker: Learning realistic 3D motion coefficients for stylized audio-driven single image talking face animation
Generating talking head videos through a face image and a piece of speech audio still
contains many challenges, i.e., unnatural head movement, distorted expression, and identity …
CodeTalker: Speech-driven 3D facial animation with discrete motion prior
Speech-driven 3D facial animation has been widely studied, yet there is still a gap to
achieving realism and vividness due to the highly ill-posed nature and scarcity of audio …
Pose-controllable talking face generation by implicitly modularized audio-visual representation
While accurate lip synchronization has been achieved for arbitrary-subject audio-driven
talking face generation, the problem of how to efficiently drive the head pose remains …
Multimodal image synthesis and editing: A survey and taxonomy
As information exists in various modalities in the real world, effective interaction and fusion
among multimodal information play a key role for the creation and perception of multimodal …
Expressive talking head generation with granular audio-visual control
Generating expressive talking heads is essential for creating virtual humans. However,
existing one- or few-shot methods focus on lip-sync and head motion, ignoring the emotional …
Flow-guided one-shot talking face generation with a high-resolution audio-visual dataset
One-shot talking face generation should synthesize high visual quality facial videos with
reasonable animations of expression and head pose, and just utilize arbitrary driving audio …
StyleHEAT: One-shot high-resolution editable talking face generation via pre-trained StyleGAN
One-shot talking face generation aims at synthesizing a high-quality talking face video from
an arbitrary portrait image, driven by a video or an audio segment. In this work, we provide a …
Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward
Easy access to audio-visual content on social media, combined with the availability of
modern tools such as TensorFlow or Keras, and open-source trained models, along with …