A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

Recent advances of few-shot learning methods and applications

JY Wang, KX Liu, YC Zhang, B Leng, JH Lu - Science China Technological …, 2023 - Springer
The rapid development of deep learning provides great convenience for production and life.
However, the massive labels required for training models limits further development. Few …

Improving automatic speech recognition performance for low-resource languages with self-supervised models

J Zhao, WQ Zhang - IEEE Journal of Selected Topics in Signal …, 2022 - ieeexplore.ieee.org
Speech self-supervised learning has attracted much attention due to its promising
performance in multiple downstream tasks, and has become a new growth engine for …

MixSpeech: Data augmentation for low-resource automatic speech recognition

L Meng, J Xu, X Tan, J Wang, T Qin… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
In this paper, we propose MixSpeech, a simple yet effective data augmentation method
based on mixup for automatic speech recognition (ASR). MixSpeech trains an ASR model …

Low-resource expressive text-to-speech using data augmentation

G Huybrechts, T Merritt, G Comini… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
While recent neural text-to-speech (TTS) systems perform remarkably well, they typically
require a substantial amount of recordings from the target speaker reading in the desired …

Text-to-speech for low-resource agglutinative language with morphology-aware language model pre-training

R Liu, Y Hu, H Zuo, Z Luo, L Wang… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Text-to-Speech (TTS) aims to convert the input text to a human-like voice. With the
development of deep learning, encoder-decoder based TTS models perform superior …

Unsupervised text-to-speech synthesis by unsupervised automatic speech recognition

J Ni, L Wang, H Gao, K Qian, Y Zhang, S Chang… - arXiv preprint arXiv …, 2022 - arxiv.org
An unsupervised text-to-speech synthesis (TTS) system learns to generate speech
waveforms corresponding to any written sentence in a language by observing: 1) a …

Language-agnostic meta-learning for low-resource text-to-speech with articulatory features

F Lux, NT Vu - arXiv preprint arXiv:2203.03191, 2022 - arxiv.org
While neural text-to-speech systems perform remarkably well in high-resource scenarios,
they cannot be applied to the majority of the over 6,000 spoken languages in the world due …

Many-to-many spoken language translation via unified speech and text representation learning with unit-to-unit translation

M Kim, J Choi, D Kim, YM Ro - arXiv preprint arXiv:2308.01831, 2023 - arxiv.org
In this paper, we propose a method to learn unified representations of multilingual speech
and text with a single model, especially focusing on the purpose of speech synthesis. We …