End-to-End ASR with Adaptive Span Self-Attention.

S Kim, A Gholami, A Shaw, N Lee… - Advances in …, 2022 - proceedings.neurips.cc

The recently proposed Conformer model has become the de facto backbone model for
various downstream speech tasks based on its hybrid attention-convolution architecture that …

Cited by 75 Related articles All 7 versions

Understanding the role of self attention for efficient speech recognition

K Shim, J Choi, W Sung - International Conference on Learning …, 2022 - openreview.net

Self-attention (SA) is a critical component of Transformer neural networks that have
succeeded in automatic speech recognition (ASR). In this paper, we analyze the role of SA …

Cited by 38 Related articles All 2 versions

Attention as a guide for simultaneous speech translation

S Papi, M Negri, M Turchi - arXiv preprint arXiv:2212.07850, 2022 - arxiv.org

The study of the attention mechanism has sparked interest in many fields, such as language
modeling and machine translation. Although its patterns have been exploited to perform …

Cited by 12 Related articles All 6 versions

Direct speech translation for automatic subtitling

S Papi, M Gaido, A Karakanta, M Cettolo… - Transactions of the …, 2023 - direct.mit.edu

Automatic subtitling is the task of automatically translating the speech of audiovisual content
into short pieces of timed text, i.e., subtitles and their corresponding timestamps. The …

Cited by 5 Related articles All 11 versions

Lookahead when it matters: Adaptive non-causal transformers for streaming neural transducers

G Strimel, Y Xie, BJ King, M Radfar… - International …, 2023 - proceedings.mlr.press

Streaming speech recognition architectures are employed for low-latency, real-time
applications. Such architectures are often characterized by their causality. Causal …

Cited by 3 Related articles All 7 versions

Deep sparse conformer for speech recognition

X Wu - arXiv preprint arXiv:2209.00260, 2022 - arxiv.org

Conformer has achieved impressive results in Automatic Speech Recognition (ASR) by
leveraging transformer's capturing of content-based global interactions and convolutional …

Cited by 2 Related articles All 5 versions

Attention Enhanced Citrinet for Speech Recognition

X Wu - arXiv preprint arXiv:2209.00261, 2022 - arxiv.org

Citrinet is an end-to-end convolutional Connectionist Temporal Classification (CTC) based
automatic speech recognition (ASR) model. To capture local and global contextual …

Cited by 1 Related articles All 5 versions

Conversation-oriented ASR with multi-look-ahead CBS architecture

H Zhao, S Fujie, T Ogawa, J Sakuma… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

During conversations, humans are capable of inferring the intention of the speaker at any
point of the speech to prepare the following action promptly. Such ability is also the key for …

Cited by 1 Related articles All 5 versions

An investigation of enhancing CTC model for triggered attention-based streaming ASR

H Zhao, Y Higuchi, T Ogawa… - 2021 Asia-Pacific Signal …, 2021 - ieeexplore.ieee.org

In the present paper, an attempt is made to combine Mask-CTC and the triggered attention
mechanism to construct a streaming end-to-end automatic speech recognition (ASR) system …

Cited by 3 Related articles All 7 versions

The evaluation of a code-switched Sepedi-English automatic speech recognition system

A Phaladi, T Modipa - arXiv preprint arXiv:2403.07947, 2024 - arxiv.org

Speech technology is a field that encompasses various techniques and tools used to enable
machines to interact with speech, such as automatic speech recognition (ASR), spoken …