A review of deep learning techniques for speech processing
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …
learning. The use of multiple processing layers has enabled the creation of models capable …
A survey on neural speech synthesis
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …
speech given text, is a hot research topic in speech, language, and machine learning …
Recent advances of few-shot learning methods and applications
JY Wang, KX Liu, YC Zhang, B Leng, JH Lu - Science China Technological …, 2023 - Springer
The rapid development of deep learning provides great convenience for production and life.
However, the massive labels required for training models limits further development. Few …
However, the massive labels required for training models limits further development. Few …
Improving automatic speech recognition performance for low-resource languages with self-supervised models
Speech self-supervised learning has attracted much attention due to its promising
performance in multiple downstream tasks, and has become a new growth engine for …
performance in multiple downstream tasks, and has become a new growth engine for …
Mixspeech: Data augmentation for low-resource automatic speech recognition
In this paper, we propose MixSpeech, a simple yet effective data augmentation method
based on mixup for automatic speech recognition (ASR). MixSpeech trains an ASR model …
based on mixup for automatic speech recognition (ASR). MixSpeech trains an ASR model …
Low-resource expressive text-to-speech using data augmentation
While recent neural text-to-speech (TTS) systems perform remarkably well, they typically
require a substantial amount of recordings from the target speaker reading in the desired …
require a substantial amount of recordings from the target speaker reading in the desired …
Text-to-speech for low-resource agglutinative language with morphology-aware language model pre-training
Text-to-Speech (TTS) aims to convert the input text to a human-like voice. With the
development of deep learning, encoder-decoder based TTS models perform superior …
development of deep learning, encoder-decoder based TTS models perform superior …
Unsupervised text-to-speech synthesis by unsupervised automatic speech recognition
An unsupervised text-to-speech synthesis (TTS) system learns to generate speech
waveforms corresponding to any written sentence in a language by observing: 1) a …
waveforms corresponding to any written sentence in a language by observing: 1) a …
Language-agnostic meta-learning for low-resource text-to-speech with articulatory features
While neural text-to-speech systems perform remarkably well in high-resource scenarios,
they cannot be applied to the majority of the over 6,000 spoken languages in the world due …
they cannot be applied to the majority of the over 6,000 spoken languages in the world due …
Many-to-many spoken language translation via unified speech and text representation learning with unit-to-unit translation
In this paper, we propose a method to learn unified representations of multilingual speech
and text with a single model, especially focusing on the purpose of speech synthesis. We …
and text with a single model, especially focusing on the purpose of speech synthesis. We …