HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition

Lessons learned in transcribing 5000 h of air traffic control communications for robust automatic speech understanding

J Zuluaga-Gomez, I Nigmatulina, A Prasad, P Motlicek… - Aerospace, 2023 - mdpi.com

Voice communication between air traffic controllers (ATCos) and pilots is critical for ensuring
safe and efficient air traffic control (ATC). The handling of these voice communications …

Cited by 6 · Related articles · All 15 versions

Train Long and Test Long: Leveraging Full Document Contexts in Speech Processing

W Chen, T Kano, A Ogawa, M Delcroix… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

The quadratic memory complexity of self-attention has generally restricted
Transformer-based models to utterance-based speech processing, preventing models from leveraging …

Cited by 1 · Related articles

HyperMixer: An MLP-based Low Cost Alternative to Transformers

F Mai, A Pannatier, F Fehr, H Chen, F Marelli… - arXiv preprint arXiv …, 2022 - arxiv.org

Transformer-based architectures are the model of choice for natural language
understanding, but they come at a significant cost, as they have quadratic complexity in the …

Cited by 5 · Related articles · All 11 versions

Open-Source Conversational AI with SpeechBrain 1.0

M Ravanelli, T Parcollet, A Moumen… - arXiv preprint arXiv …, 2024 - arxiv.org

SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused
particularly on speech processing tasks such as speech recognition, speech enhancement …

EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization

J Wang, Z Liang, X Zhang, N Cheng, J Xiao - arXiv preprint arXiv …, 2024 - arxiv.org

In recent years, Transformer networks have shown remarkable performance in speech
recognition tasks. However, their deployment poses challenges due to high computational …

XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models

S Kumar, S Madikeri, J Zuluaga-Gomez… - arXiv preprint arXiv …, 2024 - arxiv.org

Self-supervised pretrained models exhibit competitive performance in automatic speech
recognition on finetuning, even with limited in-domain supervised data for training. However …

Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations

S Yadav, ZH Tan - arXiv preprint arXiv:2406.02178, 2024 - arxiv.org

Despite its widespread adoption as the prominent neural architecture, the Transformer has
spurred several independent lines of work to address its limitations. One such approach is …

Cited by 2 · Related articles · All 2 versions

End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation

J Zuluaga-Gomez, Z Huang, X Niu, R Paturi… - arXiv preprint arXiv …, 2023 - arxiv.org

Conventional speech-to-text translation (ST) systems are trained on single-speaker
utterances, and they may not generalize to real-life scenarios where the audio contains …

Cited by 2 · Related articles · All 7 versions

End-to-end single-channel speaker-turn aware conversational speech translation

JPZ Gomez, Z Huang, X Niu, R Paturi, S Srinivasan… - 2023 - amazon.science

Conventional speech-to-text translation (ST) systems are trained on single-speaker
utterances, and they may not generalize to real-life scenarios where the audio contains …