A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

A survey of convolutional neural networks: analysis, applications, and prospects

Z Li, F Liu, W Yang, S Peng… - IEEE transactions on …, 2021 - ieeexplore.ieee.org
A convolutional neural network (CNN) is one of the most significant networks in the deep
learning field. Since CNN made impressive achievements in many areas, including but not …

AdaFace: Quality adaptive margin for face recognition

M Kim, AK Jain, X Liu - … of the IEEE/CVF conference on …, 2022 - openaccess.thecvf.com
Recognition in low quality face datasets is challenging because facial attributes are
obscured and degraded. Advances in margin-based loss functions have resulted in …

SadTalker: Learning realistic 3D motion coefficients for stylized audio-driven single image talking face animation

W Zhang, X Cun, X Wang, Y Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Generating talking head videos through a face image and a piece of speech audio still
contains many challenges, i.e., unnatural head movement, distorted expression, and identity …

Mitigating neural network overconfidence with logit normalization

H Wei, R Xie, H Cheng, L Feng… - … conference on machine …, 2022 - proceedings.mlr.press
Detecting out-of-distribution inputs is critical for the safe deployment of machine learning
models in the real world. However, neural networks are known to suffer from the …

Encoder-based domain tuning for fast personalization of text-to-image models

R Gal, M Arar, Y Atzmon, AH Bermano… - ACM Transactions on …, 2023 - dl.acm.org
Text-to-image personalization aims to teach a pre-trained diffusion model to reason about
novel, user provided concepts, embedding them into new scenes guided by natural …

Efficient geometry-aware 3D generative adversarial networks

ER Chan, CZ Lin, MA Chan… - Proceedings of the …, 2022 - openaccess.thecvf.com
Unsupervised generation of high-quality multi-view-consistent images and 3D shapes using
only collections of single-view 2D photographs has been a long-standing challenge …

WavLM: Large-scale self-supervised pre-training for full stack speech processing

S Chen, C Wang, Z Chen, Y Wu, S Liu… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
Self-supervised learning (SSL) achieves great success in speech recognition, while limited
exploration has been attempted for other speech processing tasks. As speech signal …

DiffusionCLIP: Text-guided diffusion models for robust image manipulation

G Kim, T Kwon, JC Ye - … of the IEEE/CVF conference on …, 2022 - openaccess.thecvf.com
Recently, GAN inversion methods combined with Contrastive Language-Image Pretraining
(CLIP) enable zero-shot image manipulation guided by text prompts. However, their …

Diffusion autoencoders: Toward a meaningful and decodable representation

K Preechakul, N Chatthee… - Proceedings of the …, 2022 - openaccess.thecvf.com
Diffusion probabilistic models (DPMs) have achieved remarkable quality in image
generation that rivals GANs'. But unlike GANs, DPMs use a set of latent variables that lack …