A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Emformer: Efficient memory transformer based acoustic model for low latency streaming speech recognition

Y Shi, Y Wang, C Wu, CF Yeh, J Chan… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
This paper proposes an efficient memory transformer Emformer for low latency streaming
speech recognition. In Emformer, the long-range history context is distilled into an …

Thank you for attention: a survey on attention-based artificial neural networks for automatic speech recognition

P Karmakar, SW Teng, G Lu - Intelligent Systems with Applications, 2024 - Elsevier
Attention is a very popular and effective mechanism in artificial neural network-based
sequence-to-sequence models. In this survey paper, a comprehensive review of the different …

A study of transformer-based end-to-end speech recognition system for Kazakh language

M Orken, O Dina, A Keylan, T Tolganay, O Mohamed - Scientific reports, 2022 - nature.com
Today, the Transformer model, which allows parallelization and has its own built-in attention mechanism, has been widely used in the field of speech recognition. The great advantage of …

Understanding the role of self attention for efficient speech recognition

K Shim, J Choi, W Sung - International Conference on Learning …, 2022 - openreview.net
Self-attention (SA) is a critical component of Transformer neural networks that have
succeeded in automatic speech recognition (ASR). In this paper, we analyze the role of SA …
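Since several of the entries above center on self-attention in ASR models, the standard scaled dot-product formulation (Vaswani et al., 2017) that these works build on is reproduced below for reference; it is the general definition, not an excerpt from any of the cited papers.

```latex
% Standard scaled dot-product self-attention; d_k is the key dimension.
% Q, K, V are the query, key, and value matrices projected from the input frames.
\[
\mathrm{Attention}(Q, K, V) \;=\; \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
\]
```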

Improving end-to-end contextual speech recognition with fine-grained contextual knowledge selection

M Han, L Dong, Z Liang, M Cai, S Zhou… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Nowadays, most methods for end-to-end contextual speech recognition bias the recognition
process towards contextual knowledge. Since all-neural contextual biasing methods rely on …

Streaming transformer-based acoustic models using self-attention with augmented memory

C Wu, Y Wang, Y Shi, CF Yeh, F Zhang - arXiv preprint arXiv:2005.08042, 2020 - arxiv.org
Transformer-based acoustic modeling has achieved great success for both hybrid and
sequence-to-sequence speech recognition. However, it requires access to the full …
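The snippet above is cut off before the method details, but the general pattern of block-wise streaming attention with a memory bank can be sketched as follows. This is a minimal, generic illustration with a hypothetical segment length and a simple mean-pooled summary per segment, not the authors' exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def streaming_attention(frames, seg_len=4, d=8, seed=0):
    """Toy block-wise self-attention: each segment attends to itself plus a
    memory bank holding one summary vector per previous segment (illustrative only)."""
    rng = np.random.default_rng(seed)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
    memory, outputs = [], []
    for start in range(0, len(frames), seg_len):
        seg = frames[start:start + seg_len]                  # current segment, shape (s, d)
        ctx = np.vstack(memory + [seg]) if memory else seg   # memory bank + current segment
        q, k, v = seg @ Wq, ctx @ Wk, ctx @ Wv
        outputs.append(softmax(q @ k.T / np.sqrt(d)) @ v)    # attend over memory + segment
        memory.append(seg.mean(axis=0, keepdims=True))       # summarize segment into memory
    return np.vstack(outputs)

out = streaming_attention(np.random.default_rng(1).standard_normal((12, 8)))
print(out.shape)  # (12, 8): latency is bounded by the segment, not the full utterance
```

The point of the sketch is only that attention is restricted to the current segment plus a compact memory, so the model never needs the full utterance before emitting output.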

Privacy-preserving speech emotion recognition through semi-supervised federated learning

V Tsouvalas, T Ozcelebi… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
Speech Emotion Recognition (SER) refers to the recognition of human emotions from
natural speech. If done accurately, it can offer a number of benefits in building human …
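The entry above concerns SER trained with federated learning. As a point of reference, a generic federated averaging (FedAvg) aggregation step is sketched below; the function name, toy parameters, and client sizes are illustrative assumptions and do not reflect the paper's semi-supervised procedure.

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """Weighted average of client model parameters (generic FedAvg step)."""
    total = sum(client_sizes)
    return [
        sum(w[i] * (n / total) for w, n in zip(client_weights, client_sizes))
        for i in range(len(client_weights[0]))
    ]

# Toy example: three clients, each holding two parameter arrays trained locally
# on their own speech, which never leaves the device; only weights are shared.
rng = np.random.default_rng(0)
clients = [[rng.standard_normal((4, 4)), rng.standard_normal(4)] for _ in range(3)]
sizes = [100, 250, 50]   # number of local speech samples per client (hypothetical)
global_params = fed_avg(clients, sizes)
print([p.shape for p in global_params])  # [(4, 4), (4,)]
```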

Research status and prospects of the Transformer in speech recognition tasks.

张晓旭, 马志强, 刘志强, 朱方圆… - Journal of Frontiers of …, 2021 - search.ebscohost.com
As a new deep learning algorithm framework, the Transformer has attracted the attention of more and more researchers and has become a current research hotspot. The self-attention mechanism in the Transformer model is inspired by the way humans attend only to important things …

Tiny transformers for environmental sound classification at the edge

D Elliott, CE Otero, S Wyatt, E Martino - arXiv preprint arXiv:2103.12157, 2021 - arxiv.org
With the growth of the Internet of Things and the rise of Big Data, data processing and
machine learning applications are being moved to cheap and low size, weight, and power …