Lessons learned in transcribing 5000 h of air traffic control communications for robust automatic speech understanding
J Zuluaga-Gomez, I Nigmatulina, A Prasad, P Motlicek… - Aerospace, 2023 - mdpi.com
Voice communication between air traffic controllers (ATCos) and pilots is critical for ensuring
safe and efficient air traffic control (ATC). The handling of these voice communications …
safe and efficient air traffic control (ATC). The handling of these voice communications …
Train Long and Test Long: Leveraging Full Document Contexts in Speech Processing
The quadratic memory complexity of self-attention has generally restricted Transformer-
based models to utterance-based speech processing, preventing models from leveraging …
based models to utterance-based speech processing, preventing models from leveraging …
Hypermixer: An mlp-based low cost alternative to transformers
Transformer-based architectures are the model of choice for natural language
understanding, but they come at a significant cost, as they have quadratic complexity in the …
understanding, but they come at a significant cost, as they have quadratic complexity in the …
Open-Source Conversational AI with SpeechBrain 1.0
SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused
particularly on speech processing tasks such as speech recognition, speech enhancement …
particularly on speech processing tasks such as speech recognition, speech enhancement …
EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization
In recent years, Transformer networks have shown remarkable performance in speech
recognition tasks. However, their deployment poses challenges due to high computational …
recognition tasks. However, their deployment poses challenges due to high computational …
XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models
S Kumar, S Madikeri, J Zuluaga-Gomez… - arXiv preprint arXiv …, 2024 - arxiv.org
Self-supervised pretrained models exhibit competitive performance in automatic speech
recognition on finetuning, even with limited in-domain supervised data for training. However …
recognition on finetuning, even with limited in-domain supervised data for training. However …
Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations
Despite its widespread adoption as the prominent neural architecture, the Transformer has
spurred several independent lines of work to address its limitations. One such approach is …
spurred several independent lines of work to address its limitations. One such approach is …
End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation
Conventional speech-to-text translation (ST) systems are trained on single-speaker
utterances, and they may not generalize to real-life scenarios where the audio contains …
utterances, and they may not generalize to real-life scenarios where the audio contains …
[HTML][HTML] End-to-end single-channel speaker-turn aware conversational speech translation
JPZ Gomez, Z Huang, X Niu, R Paturi, S Srinivasan… - 2023 - amazon.science
Conventional speech-to-text translation (ST) systems are trained on single-speaker
utterances, and they may not generalize to real-life scenarios where the audio contains …
utterances, and they may not generalize to real-life scenarios where the audio contains …