[HTML][HTML] Spoken instruction understanding in air traffic control: Challenge, technique, and application

Y Lin - Aerospace, 2021 - mdpi.com
In air traffic control (ATC), speech communication with radio transmission is the primary way
to exchange information between the controller and aircrew. A wealth of contextual …

A comparative study on transformer vs rnn in speech applications

S Karita, N Chen, T Hayashi, T Hori… - 2019 IEEE automatic …, 2019 - ieeexplore.ieee.org
Sequence-to-sequence models have been widely used in end-to-end speech processing,
for example, automatic speech recognition (ASR), speech translation (ST), and text-to …

Attention, please! A survey of neural attention models in deep learning

A de Santana Correia, EL Colombini - Artificial Intelligence Review, 2022 - Springer
In humans, Attention is a core property of all perceptual and cognitive operations. Given our
limited ability to process competing sources, attention mechanisms select, modulate, and …

Neural speech synthesis with transformer network

N Li, S Liu, Y Liu, S Zhao, M Liu - … of the AAAI conference on artificial …, 2019 - ojs.aaai.org
Although end-to-end neural text-to-speech (TTS) methods (such as Tacotron2) are proposed
and achieve state-of-theart performance, they still suffer from two problems: 1) low efficiency …

Emformer: Efficient memory transformer based acoustic model for low latency streaming speech recognition

Y Shi, Y Wang, C Wu, CF Yeh, J Chan… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
This paper proposes an efficient memory transformer Emformer for low latency streaming
speech recognition. In Emformer, the long-range history context is distilled into an …

Transformer-based acoustic modeling for hybrid speech recognition

Y Wang, A Mohamed, D Le, C Liu… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
We propose and evaluate transformer-based acoustic models (AMs) for hybrid speech
recognition. Several modeling choices are discussed in this work, including various …

[PDF][PDF] Improving transformer-based end-to-end speech recognition with connectionist temporal classification and language model integration

T Nakatani - proc. INTERSPEECH, 2019 - isca-archive.org
The state-of-the-art neural network architecture named Transformer has been used
successfully for many sequence-tosequence transformation tasks. The advantage of this …

Root mean square layer normalization

B Zhang, R Sennrich - Advances in Neural Information …, 2019 - proceedings.neurips.cc
Layer normalization (LayerNorm) has been successfully applied to various deep neural
networks to help stabilize training and boost model convergence because of its capability in …

Transformers in speech processing: A survey

S Latif, A Zaidi, H Cuayahuitl, F Shamshad… - arXiv preprint arXiv …, 2023 - arxiv.org
The remarkable success of transformers in the field of natural language processing has
sparked the interest of the speech-processing community, leading to an exploration of their …

[HTML][HTML] Thank you for attention: a survey on attention-based artificial neural networks for automatic speech recognition

P Karmakar, SW Teng, G Lu - Intelligent Systems with Applications, 2024 - Elsevier
Attention is a very popular and effective mechanism in artificial neural network-based
sequence-to-sequence models. In this survey paper, a comprehensive review of the different …