- Academic resource search

A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Cited by 190 · Related articles · All 6 versions

[PDF] nowpublishers.com

[PDF] Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com

Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Cited by 406 · Related articles · All 7 versions

[PDF] dtu.dk

Self-supervised speech representation learning: A review

A Mohamed, H Lee, L Borgholt… - IEEE Journal of …, 2022 - ieeexplore.ieee.org

Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …

Cited by 384 · Related articles · All 10 versions

[PDF] neurips.cc

USB: A unified semi-supervised learning benchmark for classification

Y Wang, H Chen, Y Fan, W Sun… - Advances in …, 2022 - proceedings.neurips.cc

Semi-supervised learning (SSL) improves model generalization by leveraging massive
unlabeled data to augment limited labeled samples. However, currently, popular SSL …

Cited by 128 · Related articles · All 9 versions

[PDF] ieee.org

End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org

In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …

Cited by 146 · Related articles · All 6 versions

[PDF] arxiv.org

Self-training and pre-training are complementary for speech recognition

Q Xu, A Baevski, T Likhomanenko… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

Self-training and unsupervised pre-training have emerged as effective approaches to
improve speech recognition systems using unlabeled data. However, it is not clear whether …

Cited by 192 · Related articles · All 7 versions

[PDF] arxiv.org

Self-training for end-to-end speech recognition

J Kahn, A Lee, A Hannun - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org

We revisit self-training in the context of end-to-end speech recognition. We demonstrate that
training with pseudo-labels can substantially improve the accuracy of a baseline model. Key …

Cited by 262 · Related articles · All 5 versions

[PDF] arxiv.org

ESPnet-TTS: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit

T Hayashi, R Yamamoto, K Inoue… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org

This paper introduces a new end-to-end text-to-speech (E2E-TTS) toolkit named ESPnet-
TTS, which is an extension of the open-source speech processing toolkit ESPnet. The toolkit …

Cited by 242 · Related articles · All 7 versions

[PDF] arxiv.org

Audio ALBERT: A lite BERT for self-supervised learning of audio representation

PH Chi, PH Chung, TH Wu, CC Hsieh… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org

Self-supervised speech models are powerful speech representation extractors for
downstream applications. Recently, larger models have been utilized in acoustic model …

Cited by 188 · Related articles · All 6 versions

[PDF] arxiv.org

Deep contextualized acoustic representations for semi-supervised speech recognition

S Ling, Y Liu, J Salazar… - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org

We propose a novel approach to semi-supervised automatic speech recognition (ASR). We
first exploit a large amount of unlabeled audio data via representation learning, where we …

Cited by 168 · Related articles · All 8 versions