FastEmit: Low-latency streaming ASR with sequence-level emission regularization

J Yu, CC Chiu, B Li, S Chang… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Streaming automatic speech recognition (ASR) aims to emit each hypothesized word as
quickly and accurately as possible. However, emitting fast without degrading quality, as …

Dual-mode ASR: Unify and improve streaming ASR with full-context modeling

J Yu, W Han, A Gulati, CC Chiu, B Li… - International …, 2021 - openreview.net
Streaming automatic speech recognition (ASR) aims to emit each hypothesized word as
quickly and accurately as possible, while full-context ASR waits for the completion of a full …

Transformer based deliberation for two-pass speech recognition

K Hu, R Pang, TN Sainath… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org
Interactive speech recognition systems must generate words quickly while also producing
accurate results. Two-pass models excel at these requirements by employing a first-pass …
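As a rough, generic sketch of the two-pass idea this abstract alludes to (not the transformer deliberation decoder the paper proposes), the Python below assumes hypothetical first_pass_nbest and second_pass_score callables and simply re-ranks first-pass hypotheses with a weighted score combination.

# Minimal two-pass decoding sketch (illustrative only; the cited paper uses a
# transformer deliberation decoder that attends to both audio encodings and
# first-pass hypotheses). `first_pass_nbest` and `second_pass_score` are
# hypothetical stand-ins.
from typing import Callable, List, Tuple

def two_pass_decode(
    audio_features,
    first_pass_nbest: Callable[[object, int], List[Tuple[str, float]]],
    second_pass_score: Callable[[object, str], float],
    n_best: int = 8,
    first_pass_weight: float = 0.5,
) -> str:
    """Return the hypothesis maximizing a weighted sum of the streaming
    first-pass score and the full-context second-pass score."""
    hypotheses = first_pass_nbest(audio_features, n_best)  # [(text, log_prob), ...]
    rescored = [
        (text, first_pass_weight * fp_score + second_pass_score(audio_features, text))
        for text, fp_score in hypotheses
    ]
    return max(rescored, key=lambda pair: pair[1])[0]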

Research status and prospects of the Transformer in speech recognition tasks

张晓旭, 马志强, 刘志强, 朱方圆… - Journal of Frontiers of …, 2021 - search.ebscohost.com
As a new deep learning algorithmic framework, the Transformer has attracted growing attention from researchers and has become a current research hotspot. The self-attention mechanism in the Transformer model is inspired by the way humans attend only to important things …

Learning word-level confidence for subword end-to-end ASR

D Qiu, Q Li, Y He, Y Zhang, B Li, L Cao… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
We study the problem of word-level confidence estimation in subword-based end-to-end
(E2E) models for automatic speech recognition (ASR). Although prior works have proposed …
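For context, a simple non-learned baseline for this problem aggregates per-subword confidences into word-level confidences, e.g. by min-pooling over the subwords of each word; the Python sketch below assumes SentencePiece-style tokens where "▁" marks a word start, and is not the learned estimator the paper proposes.

# Baseline word-level confidence from subword confidences (illustrative only;
# the cited paper instead learns word-level confidence end to end).
# Assumes SentencePiece-style tokens where "▁" marks the start of a new word.
from typing import List, Tuple

def word_confidences(
    subword_tokens: List[str],
    subword_confidences: List[float],
) -> List[Tuple[str, float]]:
    words: List[Tuple[str, float]] = []
    for token, conf in zip(subword_tokens, subword_confidences):
        if token.startswith("▁") or not words:
            words.append((token.lstrip("▁"), conf))        # start a new word
        else:
            text, prev = words[-1]
            words[-1] = (text + token, min(prev, conf))    # min-pool within a word
    return words

# Example: ▁good / ▁morn / ing with confidences 0.9 / 0.8 / 0.6
# -> [("good", 0.9), ("morning", 0.6)]
print(word_confidences(["▁good", "▁morn", "ing"], [0.9, 0.8, 0.6]))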

TAPIR: Learning adaptive revision for incremental natural language understanding with a two-pass model

P Kahardipraja, B Madureira, D Schlangen - arXiv preprint arXiv …, 2023 - arxiv.org
Language is by its very nature incremental in how it is produced and processed. This
property can be exploited by NLP systems to produce fast responses, which has been …

ASR rescoring and confidence estimation with ELECTRA

H Futami, H Inaguma, M Mimura… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
In automatic speech recognition (ASR) rescoring, the hypothesis with the fewest errors
should be selected from the n-best list using a language model (LM). However, LMs are …
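The n-best rescoring recipe the snippet refers to can be written generically as below; lm_log_prob is a hypothetical stand-in for any sentence-level scorer (the paper replaces the usual causal LM with an ELECTRA-based model), and the interpolation weight is an assumed tunable.

# Generic n-best rescoring sketch (illustrative; not the paper's ELECTRA scorer).
from typing import Callable, List, Tuple

def rescore_nbest(
    nbest: List[Tuple[str, float]],          # (hypothesis, ASR log-probability)
    lm_log_prob: Callable[[str], float],     # sentence-level LM log-probability
    lm_weight: float = 0.3,                  # interpolation weight, tuned on dev data
) -> str:
    """Pick the hypothesis maximizing ASR score + lm_weight * LM score."""
    return max(
        nbest,
        key=lambda pair: pair[1] + lm_weight * lm_log_prob(pair[0]),
    )[0]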

Cross-attention conformer for context modeling in speech enhancement for ASR

A Narayanan, CC Chiu, T O'Malley… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
This work introduces cross-attention conformer, an attention-based architecture for context
modeling in speech enhancement. Given that the context information can often be …
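A minimal cross-attention block of the kind the abstract describes (queries from the speech stream, keys and values from the context stream) might look like the PyTorch sketch below; the dimensions are arbitrary assumptions, and the conformer feed-forward and convolution modules that the paper interleaves with attention are omitted.

# Minimal cross-attention block (illustrative only; not the full cross-attention
# conformer architecture). Dimensions are arbitrary assumptions.
import torch
import torch.nn as nn

class CrossAttentionBlock(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, speech: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # Queries come from the (noisy) speech features; keys and values come
        # from the context features (e.g. a separate noise-context signal).
        attended, _ = self.attn(query=speech, key=context, value=context)
        return self.norm(speech + attended)  # residual connection + layer norm

# Example shapes: batch=2, 100 speech frames, 50 context frames, 256 dims.
out = CrossAttentionBlock()(torch.randn(2, 100, 256), torch.randn(2, 50, 256))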

Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding

S Kim, A Shrivastava, D Le, J Lin, O Kalinli… - arXiv preprint arXiv …, 2023 - arxiv.org
End-to-end (E2E) spoken language understanding (SLU) systems that generate a semantic
parse from speech have become more promising recently. This approach uses a single …

Scaling up deliberation for multilingual ASR

K Hu, B Li, TN Sainath - 2022 IEEE Spoken Language …, 2023 - ieeexplore.ieee.org
Multilingual end-to-end automatic speech recognition models are attractive due to their
simplicity in training and deployment. Recent work on large-scale training of such models …