A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

A complete survey on generative ai (aigc): Is chatgpt from gpt-4 to gpt-5 all you need?

C Zhang, C Zhang, S Zheng, Y Qiao, C Li… - arXiv preprint arXiv …, 2023 - arxiv.org
As ChatGPT goes viral, generative AI (AIGC, aka AI-generated content) has made headlines
everywhere because of its ability to analyze and create text, images, and beyond. With such …

Sadtalker: Learning realistic 3d motion coefficients for stylized audio-driven single image talking face animation

W Zhang, X Cun, X Wang, Y Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Generating talking head videos through a face image and a piece of speech audio still
contains many challenges. ie, unnatural head movement, distorted expression, and identity …

Pose-controllable talking face generation by implicitly modularized audio-visual representation

H Zhou, Y Sun, W Wu, CC Loy… - Proceedings of the …, 2021 - openaccess.thecvf.com
While accurate lip synchronization has been achieved for arbitrary-subject audio-driven
talking face generation, the problem of how to efficiently drive the head pose remains …

Multimodal image synthesis and editing: A survey and taxonomy

F Zhan, Y Yu, R Wu, J Zhang, S Lu, L Liu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
As information exists in various modalities in real world, effective interaction and fusion
among multimodal information plays a key role for the creation and perception of multimodal …

Ad-nerf: Audio driven neural radiance fields for talking head synthesis

Y Guo, K Chen, S Liang, YJ Liu… - Proceedings of the …, 2021 - openaccess.thecvf.com
Generating high-fidelity talking head video by fitting with the input audio sequence is a
challenging problem that receives considerable attentions recently. In this paper, we …

A lip sync expert is all you need for speech to lip generation in the wild

KR Prajwal, R Mukhopadhyay, VP Namboodiri… - Proceedings of the 28th …, 2020 - dl.acm.org
In this work, we investigate the problem of lip-syncing a talking face video of an arbitrary
identity to match a target speech segment. Current works excel at producing accurate lip …

One-shot free-view neural talking-head synthesis for video conferencing

TC Wang, A Mallya, MY Liu - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
We propose a neural talking-head video synthesis model and demonstrate its application to
video conferencing. Our model learns to synthesize a talking-head video using a source …

Expressive talking head generation with granular audio-visual control

B Liang, Y Pan, Z Guo, H Zhou… - Proceedings of the …, 2022 - openaccess.thecvf.com
Generating expressive talking heads is essential for creating virtual humans. However,
existing one-or few-shot methods focus on lip-sync and head motion, ignoring the emotional …

Flow-guided one-shot talking face generation with a high-resolution audio-visual dataset

Z Zhang, L Li, Y Ding, C Fan - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
One-shot talking face generation should synthesize high visual quality facial videos with
reasonable animations of expression and head pose, and just utilize arbitrary driving audio …