4D ASR: Joint modeling of CTC, attention, transducer, and mask-predict decoders

[PDF] Time-synchronous one-pass beam search for parallel online and offline transducers with dynamic block training

Y Sudo, M Shakeel, Y Peng… - Proc. INTERSPEECH …, 2023 - researchgate.net

End-to-end automatic speech recognition (ASR) has become an increasingly popular area
of research, with two main models being online and offline ASR. Online models aim to …

Cited by 6 · All 6 versions

[PDF] arxiv.org

Contextualized Automatic Speech Recognition With Attention-Based Bias Phrase Boosted Beam Search

Y Sudo, M Shakeel, Y Fukumoto… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

End-to-end (E2E) automatic speech recognition (ASR) methods exhibit remarkable
performance. However, since the performance of such methods is intrinsically linked to the …

Cited by 3 · All 4 versions

[PDF] arxiv.org

Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss

M Shakeel, Y Sudo, Y Peng, S Watanabe - arXiv preprint arXiv …, 2024 - arxiv.org

Contextualized end-to-end automatic speech recognition has been an active research area,
with recent efforts focusing on the implicit learning of contextual phrases based on the final …

All 2 versions

[PDF] arxiv.org

4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders

Y Sudo, M Shakeel, Y Fukumoto, B Yan, J Shi… - arXiv preprint arXiv …, 2024 - arxiv.org

End-to-end automatic speech recognition (E2E-ASR) can be classified into several network
architectures, such as connectionist temporal classification (CTC), recurrent neural network …

All 2 versions

[PDF] arxiv.org

Listening to Multi-talker Conversations: Modular and End-to-end Perspectives

D Raj - arXiv preprint arXiv:2402.08932, 2024 - arxiv.org

Since the first speech recognition systems were built more than 30 years ago, improvement
in voice technology has enabled applications such as smart assistants and automated …

Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask

T Wang, X Xie, Z Li, S Hu, Z Jing, J Deng, M Cui… - arXiv preprint arXiv …, 2024 - arxiv.org

This paper proposes a novel non-autoregressive (NAR) block-based Attention Mask
Decoder (AMD) that flexibly balances performance-efficiency trade-offs for Conformer ASR …

[PDF] arxiv.org

Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation

M Shakeel, Y Sudo, Y Peng, S Watanabe - arXiv preprint arXiv …, 2024 - arxiv.org

End-to-end (E2E) automatic speech recognition (ASR) can operate in two modes: streaming
and non-streaming, each with its pros and cons. Streaming ASR processes the speech …

All 2 versions

[PDF] researchgate.net

Online adaptation of Fourier series-based acoustic transfer function model and its application to sound source localization and separation

Y Sudo, M Takigahira, H Tsuru, K Nakadai… - Advanced …, 2024 - Taylor & Francis

In this paper, we propose an online adaptation method for Fourier series-based acoustic
transfer function (FS-ATF) models for robot audition systems using microphone array signal …

Contextualized Automatic Speech Recognition with Dynamic Vocabulary

Y Sudo, Y Fukumoto, M Shakeel, Y Peng… - arXiv preprint arXiv …, 2024 - arxiv.org

Deep biasing (DB) improves the performance of end-to-end automatic speech recognition
(E2E-ASR) for rare words or contextual phrases using a bias list. However, most existing …

All 2 versions

Improving Noise Robustness of Automatic Speech Recognition Based on a Parallel Adapter Model with Near-Identity Initialization

T Osaki, Y Sudo, K Itoyama, K Nishida… - … Conference on Industrial …, 2024 - Springer

This paper proposes the parallel adapter model (PAM) to improve the noise-robustness of
automatic speech recognition (ASR) systems with a small amount of retraining. The …

All 2 versions