Learning word-level confidence for subword end-to-end ASR

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com

Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Cited by 326 · Related articles · All 7 versions

[PDF] arxiv.org

Large-scale ASR domain adaptation using self- and semi-supervised learning

D Hwang, A Misra, Z Huo, N Siddhartha… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

Self- and semi-supervised learning methods have been actively investigated to reduce
labeled training data or enhance model performance. However, these approaches mostly …

Cited by 45 · Related articles · All 6 versions

[PDF] arxiv.org

Pseudo label is better than human label

D Hwang, KC Sim, Z Huo, T Strohman - arXiv preprint arXiv:2203.12668, 2022 - arxiv.org

State-of-the-art automatic speech recognition (ASR) systems are trained with tens of
thousands of hours of labeled speech data. Human transcription is expensive and time …

Cited by 30 · Related articles · All 7 versions

Improving the latency and quality of cascaded encoders

TN Sainath, Y He, A Narayanan… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

In this paper, we explore reducing computational latency of the 2-pass cascaded encoder
model [1]. Specifically, we experiment with reducing the size of the causal 1st-pass and …

Cited by 26 · Related articles

[PDF] arxiv.org

On addressing practical challenges for RNN-Transducer

R Zhao, J Xue, J Li, W Wei, L He… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org

In this paper, several works are proposed to address practical challenges for deploying
RNN Transducer (RNN-T) based speech recognition systems. These challenges are …

Cited by 27 · Related articles · All 5 versions

[PDF] arxiv.org

ASR and emotional speech: A word-level investigation of the mutual impact of speech and emotion recognition

Y Li, Z Zhao, O Klejch, P Bell, C Lai - arXiv preprint arXiv:2305.16065, 2023 - arxiv.org

In Speech Emotion Recognition (SER), textual data is often used alongside audio signals to
address their inherent variability. However, the reliance on human annotated text in most …

Cited by 8 · Related articles · All 7 versions

[PDF] arxiv.org

ASR rescoring and confidence estimation with ELECTRA

H Futami, H Inaguma, M Mimura… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org

In automatic speech recognition (ASR) rescoring, the hypothesis with the fewest errors
should be selected from the n-best list using a language model (LM). However, LMs are …

Cited by 20 · Related articles · All 6 versions

ETEH: Unified attention-based end-to-end ASR and KWS architecture

G Cheng, H Miao, R Yang, K Deng… - IEEE/ACM Transactions …, 2022 - ieeexplore.ieee.org

Even though attention-based end-to-end (E2E) automatic speech recognition (ASR) models
have been yielding state-of-the-art recognition accuracy, they still fall behind many of the …

Cited by 18 · Related articles · All 2 versions

[PDF] arxiv.org

Residual energy-based models for end-to-end speech recognition

Q Li, Y Zhang, B Li, L Cao, PC Woodland - arXiv preprint arXiv …, 2021 - arxiv.org

End-to-end models with auto-regressive decoders have shown impressive results for
automatic speech recognition (ASR). These models formulate the sequence-level probability …

Cited by 14 · Related articles · All 8 versions

[PDF] arxiv.org

Multi-task learning for end-to-end ASR word and utterance confidence with deletion prediction

D Qiu, Y He, Q Li, Y Zhang, L Cao, I McGraw - arXiv preprint arXiv …, 2021 - arxiv.org

Confidence scores are very useful for downstream applications of automatic speech
recognition (ASR) systems. Recent works have proposed using neural networks to learn …

Cited by 13 · Related articles · All 7 versions