Very deep self-attention networks for end-to-end speech recognition

NQ Pham, TS Nguyen, J Niehues, M Müller… - arXiv preprint arXiv …, 2019 - arxiv.org
Recently, end-to-end sequence-to-sequence models for speech recognition have gained
significant interest in the research community. While previous architecture choices revolve …

Minimum latency training strategies for streaming sequence-to-sequence ASR

H Inaguma, Y Gaur, L Lu, J Li… - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org
Recently, a few novel streaming attention-based sequence-to-sequence (S2S) models have
been proposed to perform online speech recognition with linear-time decoding complexity …

An investigation of phone-based subword units for end-to-end speech recognition

W Wang, G Wang, A Bhatnagar, Y Zhou… - arXiv preprint arXiv …, 2020 - arxiv.org
Phones and their context-dependent variants have been the standard modeling units for
conventional speech recognition systems, while characters and subwords have …

Guiding CTC posterior spike timings for improved posterior fusion and knowledge distillation

G Kurata, K Audhkhasi - arXiv preprint arXiv:1904.08311, 2019 - arxiv.org
Conventional automatic speech recognition (ASR) systems trained from frame-level
alignments can easily leverage posterior fusion to improve ASR accuracy and build a better …

Minimum Bayes risk training of RNN-transducer for end-to-end speech recognition

C Weng, C Yu, J Cui, C Zhang, D Yu - arXiv preprint arXiv:1911.12487, 2019 - arxiv.org
In this work, we propose minimum Bayes risk (MBR) training of RNN-Transducer (RNN-T) for
end-to-end speech recognition. Specifically, initialized with an RNN-T trained model, MBR …

Acoustically grounded word embeddings for improved acoustics-to-word speech recognition

S Settle, K Audhkhasi, K Livescu… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
Direct acoustics-to-word (A2W) systems for end-to-end automatic speech recognition are
simpler to train, and more efficient to decode with, than sub-word systems. However, A2W …

Forget a Bit to Learn Better: Soft Forgetting for CTC-Based Automatic Speech Recognition

K Audhkhasi, G Saon, Z Tüske, B Kingsbury… - Interspeech, 2019 - academia.edu
Prior work has shown that connectionist temporal classification (CTC)-based automatic
speech recognition systems perform well when using bidirectional long short-term memory …

Improved multi-stage training of online attention-based encoder-decoder models

A Garg, D Gowda, A Kumar, K Kim… - 2019 IEEE Automatic …, 2019 - ieeexplore.ieee.org
In this paper, we propose a refined multi-stage multi-task training strategy to improve the
performance of online attention-based encoder-decoder (AED) models. A three-stage …

Advancing multi-accented LSTM-CTC speech recognition using a domain specific student-teacher learning paradigm

S Ghorbani, AE Bulut… - 2018 IEEE Spoken …, 2018 - ieeexplore.ieee.org
Non-native speech causes automatic speech recognition systems to degrade in
performance. Past strategies to address this challenge have considered model adaptation …

Distilling attention weights for CTC-based ASR systems

T Moriya, H Sato, T Tanaka, T Ashihara… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
We present a novel training approach for connectionist temporal classification (CTC)-based
automatic speech recognition (ASR) systems. CTC models are promising for building both a …