A review of deep learning techniques for speech processing
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …
A survey of convolutional neural networks: analysis, applications, and prospects
A convolutional neural network (CNN) is one of the most significant networks in the deep
learning field. Since CNN made impressive achievements in many areas, including but not …
AdaFace: Quality adaptive margin for face recognition
Recognition in low quality face datasets is challenging because facial attributes are
obscured and degraded. Advances in margin-based loss functions have resulted in …
SadTalker: Learning realistic 3D motion coefficients for stylized audio-driven single image talking face animation
Generating talking head videos through a face image and a piece of speech audio still
contains many challenges, i.e., unnatural head movement, distorted expression, and identity …
Mitigating neural network overconfidence with logit normalization
Detecting out-of-distribution inputs is critical for the safe deployment of machine learning
models in the real world. However, neural networks are known to suffer from the …
Encoder-based domain tuning for fast personalization of text-to-image models
Text-to-image personalization aims to teach a pre-trained diffusion model to reason about
novel, user provided concepts, embedding them into new scenes guided by natural …
Efficient geometry-aware 3D generative adversarial networks
Unsupervised generation of high-quality multi-view-consistent images and 3D shapes using
only collections of single-view 2D photographs has been a long-standing challenge …
WavLM: Large-scale self-supervised pre-training for full stack speech processing
Self-supervised learning (SSL) achieves great success in speech recognition, while limited
exploration has been attempted for other speech processing tasks. As speech signal …
DiffusionCLIP: Text-guided diffusion models for robust image manipulation
Recently, GAN inversion methods combined with Contrastive Language-Image Pretraining
(CLIP) enable zero-shot image manipulation guided by text prompts. However, their …
Diffusion autoencoders: Toward a meaningful and decodable representation
K Preechakul, N Chatthee… - Proceedings of the …, 2022 - openaccess.thecvf.com
Diffusion probabilistic models (DPMs) have achieved remarkable quality in image
generation that rivals GANs'. But unlike GANs, DPMs use a set of latent variables that lack …