SynthASR: Unlocking synthetic data for speech recognition

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Cited by 200 Related articles All 6 versions

[PDF] arxiv.org

Machine learning for synthetic data generation: a review

Y Lu, M Shen, H Wang, X Wang, C van Rechem… - arXiv preprint arXiv …, 2023 - arxiv.org

Machine learning heavily relies on data, but real-world applications often encounter various
data-related issues. These include data of poor quality, insufficient data points leading to …

Cited by 153 Related articles All 2 versions

[HTML] ieee-jas.net

Parallel learning: Overview and perspective for computational learning across Syn2Real and Sim2Real

Q Miao, Y Lv, M Huang, X Wang… - IEEE/CAA Journal of …, 2023 - ieeexplore.ieee.org

The virtual-to-real paradigm, i.e., training models on virtual data and then applying them to
solve real-world problems, has attracted more and more attention from various domains by …

Cited by 87 Related articles All 3 versions

[PDF] arxiv.org

On the importance and applicability of pre-training for federated learning

HY Chen, CH Tu, Z Li, HW Shen, WL Chao - arXiv preprint arXiv …, 2022 - arxiv.org

Pre-training is prevalent in nowadays deep learning to improve the learned model's
performance. However, in the literature on federated learning (FL), neural networks are …

Cited by 76 Related articles All 3 versions

[PDF] thecvf.com

SynthVSR: Scaling up visual speech recognition with synthetic supervision

X Liu, E Lakomkin, K Vougioukas… - Proceedings of the …, 2023 - openaccess.thecvf.com

Recently reported state-of-the-art results in visual speech recognition (VSR) often rely on
increasingly large amounts of video data, while the publicly available transcribed video …

Cited by 20 Related articles All 7 versions

[HTML] sciencedirect.com

Computer-assisted pronunciation training—Speech synthesis is almost all you need

D Korzekwa, J Lorenzo-Trueba, T Drugman… - Speech …, 2022 - Elsevier

The research community has long studied computer-assisted pronunciation training (CAPT)
methods in non-native speech. Researchers focused on studying various model …

Cited by 38 Related articles All 6 versions

[PDF] kyoto-u.ac.jp

Data augmentation for ASR using TTS via a discrete representation

S Ueno, M Mimura, S Sakai… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org

While end-to-end automatic speech recognition (ASR) has achieved high performance, it
requires a huge amount of paired speech and transcription data for training. Recently, data …

Cited by 26 Related articles All 3 versions

[PDF] google.com

A semi-supervised complementary joint training approach for low-resource speech recognition

YQ Du, J Zhang, X Fang, MH Wu… - IEEE/ACM Transactions …, 2023 - ieeexplore.ieee.org

Both unpaired speech and text have shown to be beneficial for low-resource automatic
speech recognition (ASR), which, however were either separately used for pre-training, self …

Cited by 4 Related articles All 2 versions

[PDF] arxiv.org

Fast and accurate factorized neural transducer for text adaption of end-to-end speech recognition models

R Zhao, J Xue, P Parthasarathy… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Neural transducer is now the most popular end-to-end model for speech recognition, due to
its naturally streaming ability. However, it is challenging to adapt it with text-only data …

Cited by 13 Related articles All 3 versions

[PDF] arxiv.org

Generating data with text-to-speech and large-language models for conversational speech recognition

S Cornell, J Darefsky, Z Duan, S Watanabe - arXiv preprint arXiv …, 2024 - arxiv.org

Currently, a common approach in many speech processing tasks is to leverage large scale
pre-trained models by fine-tuning them on in-domain data for a particular application. Yet …

Cited by 2 Related articles All 5 versions