A review of deep learning techniques for speech processing
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …
Machine learning for synthetic data generation: a review
Machine learning heavily relies on data, but real-world applications often encounter various
data-related issues. These include data of poor quality, insufficient data points leading to …
Parallel learning: Overview and perspective for computational learning across Syn2Real and Sim2Real
Q Miao, Y Lv, M Huang, X Wang… - IEEE/CAA Journal of …, 2023 - ieeexplore.ieee.org
The virtual-to-real paradigm, i.e., training models on virtual data and then applying them to
solve real-world problems, has attracted more and more attention from various domains by …
On the importance and applicability of pre-training for federated learning
Pre-training is prevalent in nowadays deep learning to improve the learned model's
performance. However, in the literature on federated learning (FL), neural networks are …
SynthVSR: Scaling up visual speech recognition with synthetic supervision
Recently reported state-of-the-art results in visual speech recognition (VSR) often rely on
increasingly large amounts of video data, while the publicly available transcribed video …
Computer-assisted pronunciation training—Speech synthesis is almost all you need
The research community has long studied computer-assisted pronunciation training (CAPT)
methods in non-native speech. Researchers focused on studying various model …
Data augmentation for ASR using TTS via a discrete representation
While end-to-end automatic speech recognition (ASR) has achieved high performance, it
requires a huge amount of paired speech and transcription data for training. Recently, data …
A semi-supervised complementary joint training approach for low-resource speech recognition
Both unpaired speech and text have shown to be beneficial for low-resource automatic
speech recognition (ASR), which, however were either separately used for pre-training, self …
Fast and accurate factorized neural transducer for text adaption of end-to-end speech recognition models
R Zhao, J Xue, P Parthasarathy… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Neural transducer is now the most popular end-to-end model for speech recognition, due to
its naturally streaming ability. However, it is challenging to adapt it with text-only data …
Generating data with text-to-speech and large-language models for conversational speech recognition
Currently, a common approach in many speech processing tasks is to leverage large scale
pre-trained models by fine-tuning them on in-domain data for a particular application. Yet …