FastEmit: Low-latency streaming ASR with sequence-level emission regularization

J Yu, CC Chiu, B Li, S Chang… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Streaming automatic speech recognition (ASR) aims to emit each hypothesized word as
quickly and accurately as possible. However, emitting fast without degrading quality, as …

Dual-mode ASR: Unify and improve streaming ASR with full-context modeling

J Yu, W Han, A Gulati, CC Chiu, B Li… - International …, 2021 - openreview.net
Streaming automatic speech recognition (ASR) aims to emit each hypothesized word as
quickly and accurately as possible, while full-context ASR waits for the completion of a full …

Transformer based deliberation for two-pass speech recognition

K Hu, R Pang, TN Sainath… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org
Interactive speech recognition systems must generate words quickly while also producing
accurate results. Two-pass models excel at these requirements by employing a first-pass …
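As a rough, generic sketch of the two-pass idea this abstract alludes to (not the transformer deliberation decoder the paper proposes), the Python below assumes hypothetical first_pass_nbest and second_pass_score callables and simply re-ranks first-pass hypotheses with a weighted score combination.

# Minimal two-pass decoding sketch (illustrative only; the cited paper uses a
# transformer deliberation decoder that attends to both audio encodings and
# first-pass hypotheses). `first_pass_nbest` and `second_pass_score` are
# hypothetical stand-ins.
from typing import Callable, List, Tuple

def two_pass_decode(
    audio_features,
    first_pass_nbest: Callable[[object, int], List[Tuple[str, float]]],
    second_pass_score: Callable[[object, str], float],
    n_best: int = 8,
    first_pass_weight: float = 0.5,
) -> str:
    """Return the hypothesis maximizing a weighted sum of the streaming
    first-pass score and the full-context second-pass score."""
    hypotheses = first_pass_nbest(audio_features, n_best)  # [(text, log_prob), ...]
    rescored = [
        (text, first_pass_weight * fp_score + second_pass_score(audio_features, text))
        for text, fp_score in hypotheses
    ]
    return max(rescored, key=lambda pair: pair[1])[0]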

Research status and prospects of the Transformer in speech recognition tasks

张晓旭, 马志强, 刘志强, 朱方圆… - Journal of Frontiers of …, 2021 - search.ebscohost.com
As a new deep learning algorithmic framework, the Transformer has attracted growing attention from researchers and has become a current research hotspot. The self-attention mechanism in the Transformer model is inspired by the way humans attend only to important things …

Learning word-level confidence for subword end-to-end ASR

D Qiu, Q Li, Y He, Y Zhang, B Li, L Cao… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
We study the problem of word-level confidence estimation in subword-based end-to-end
(E2E) models for automatic speech recognition (ASR). Although prior works have proposed …
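For context, a simple non-learned baseline for this problem aggregates per-subword confidences into word-level confidences, e.g. by min-pooling over the subwords of each word; the Python sketch below assumes SentencePiece-style tokens where "▁" marks a word start, and is not the learned estimator the paper proposes.

# Baseline word-level confidence from subword confidences (illustrative only;
# the cited paper instead learns word-level confidence end to end).
# Assumes SentencePiece-style tokens where "▁" marks the start of a new word.
from typing import List, Tuple

def word_confidences(
    subword_tokens: List[str],
    subword_confidences: List[float],
) -> List[Tuple[str, float]]:
    words: List[Tuple[str, float]] = []
    for token, conf in zip(subword_tokens, subword_confidences):
        if token.startswith("▁") or not words:
            words.append((token.lstrip("▁"), conf))        # start a new word
        else:
            text, prev = words[-1]
            words[-1] = (text + token, min(prev, conf))    # min-pool within a word
    return words

# Example: ▁good / ▁morn / ing with confidences 0.9 / 0.8 / 0.6
# -> [("good", 0.9), ("morning", 0.6)]
print(word_confidences(["▁good", "▁morn", "ing"], [0.9, 0.8, 0.6]))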

TAPIR: Learning adaptive revision for incremental natural language understanding with a two-pass model

P Kahardipraja, B Madureira, D Schlangen - arXiv preprint arXiv …, 2023 - arxiv.org
Language is by its very nature incremental in how it is produced and processed. This
property can be exploited by NLP systems to produce fast responses, which has been …

ASR rescoring and confidence estimation with ELECTRA

H Futami, H Inaguma, M Mimura… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
In automatic speech recognition (ASR) rescoring, the hypothesis with the fewest errors
should be selected from the n-best list using a language model (LM). However, LMs are …
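The n-best rescoring recipe the snippet refers to can be written generically as below; lm_log_prob is a hypothetical stand-in for any sentence-level scorer (the paper replaces the usual causal LM with an ELECTRA-based model), and the interpolation weight is an assumed tunable.

# Generic n-best rescoring sketch (illustrative; not the paper's ELECTRA scorer).
from typing import Callable, List, Tuple

def rescore_nbest(
    nbest: List[Tuple[str, float]],          # (hypothesis, ASR log-probability)
    lm_log_prob: Callable[[str], float],     # sentence-level LM log-probability
    lm_weight: float = 0.3,                  # interpolation weight, tuned on dev data
) -> str:
    """Pick the hypothesis maximizing ASR score + lm_weight * LM score."""
    return max(
        nbest,
        key=lambda pair: pair[1] + lm_weight * lm_log_prob(pair[0]),
    )[0]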

Cross-attention conformer for context modeling in speech enhancement for ASR

A Narayanan, CC Chiu, T O'Malley… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
This work introduces cross-attention conformer, an attention-based architecture for context
modeling in speech enhancement. Given that the context information can often be …
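A minimal cross-attention block of the kind the abstract describes (queries from the speech stream, keys and values from the context stream) might look like the PyTorch sketch below; the dimensions are arbitrary assumptions, and the conformer feed-forward and convolution modules that the paper interleaves with attention are omitted.

# Minimal cross-attention block (illustrative only; not the full cross-attention
# conformer architecture). Dimensions are arbitrary assumptions.
import torch
import torch.nn as nn

class CrossAttentionBlock(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, speech: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # Queries come from the (noisy) speech features; keys and values come
        # from the context features (e.g. a separate noise-context signal).
        attended, _ = self.attn(query=speech, key=context, value=context)
        return self.norm(speech + attended)  # residual connection + layer norm

# Example shapes: batch=2, 100 speech frames, 50 context frames, 256 dims.
out = CrossAttentionBlock()(torch.randn(2, 100, 256), torch.randn(2, 50, 256))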

Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding

S Kim, A Shrivastava, D Le, J Lin, O Kalinli… - arXiv preprint arXiv …, 2023 - arxiv.org
End-to-end (E2E) spoken language understanding (SLU) systems that generate a semantic
parse from speech have become more promising recently. This approach uses a single …

Scaling up deliberation for multilingual ASR

K Hu, B Li, TN Sainath - 2022 IEEE Spoken Language …, 2023 - ieeexplore.ieee.org
Multilingual end-to-end automatic speech recognition models are attractive due to their
simplicity in training and deployment. Recent work on large-scale training of such models …