FastEmit: Low-latency streaming ASR with sequence-level emission regularization
Streaming automatic speech recognition (ASR) aims to emit each hypothesized word as
quickly and accurately as possible. However, emitting fast without degrading quality, as …
Dual-mode ASR: Unify and improve streaming ASR with full-context modeling
Streaming automatic speech recognition (ASR) aims to emit each hypothesized word as
quickly and accurately as possible, while full-context ASR waits for the completion of a full …
Transformer based deliberation for two-pass speech recognition
Interactive speech recognition systems must generate words quickly while also producing
accurate results. Two-pass models excel at these requirements by employing a first-pass …
Research status and prospects of Transformer in speech recognition tasks.
张晓旭, 马志强, 刘志强, 朱方圆… - Journal of Frontiers of …, 2021 - search.ebscohost.com
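Transformer, as a new deep learning framework, has attracted growing attention from
researchers and has become a current research hotspot. The self-attention mechanism in the Transformer model is inspired by the way humans attend only to important things …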
Learning word-level confidence for subword end-to-end ASR
We study the problem of word-level confidence estimation in subword-based end-to-end
(E2E) models for automatic speech recognition (ASR). Although prior works have proposed …
TAPIR: Learning adaptive revision for incremental natural language understanding with a two-pass model
Language is by its very nature incremental in how it is produced and processed. This
property can be exploited by NLP systems to produce fast responses, which has been …
ASR rescoring and confidence estimation with ELECTRA
In automatic speech recognition (ASR) rescoring, the hypothesis with the fewest errors
should be selected from the n-best list using a language model (LM). However, LMs are …
Cross-attention conformer for context modeling in speech enhancement for ASR
A Narayanan, CC Chiu, T O'Malley… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
This work introduces cross-attention conformer, an attention-based architecture for context
modeling in speech enhancement. Given that the context information can often be …
Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding
End-to-end (E2E) spoken language understanding (SLU) systems that generate a semantic
parse from speech have become more promising recently. This approach uses a single …
Scaling up deliberation for multilingual ASR
Multilingual end-to-end automatic speech recognition models are attractive due to their
simplicity in training and deployment. Recent work on large-scale training of such models …