Spoken instruction understanding in air traffic control: Challenge, technique, and application

Y Lin - Aerospace, 2021 - mdpi.com
In air traffic control (ATC), speech communication with radio transmission is the primary way
to exchange information between the controller and aircrew. A wealth of contextual …

wav2vec: Unsupervised pre-training for speech recognition

S Schneider, A Baevski, R Collobert, M Auli - arXiv preprint arXiv …, 2019 - arxiv.org
We explore unsupervised pre-training for speech recognition by learning representations of
raw audio. wav2vec is trained on large amounts of unlabeled audio data and the resulting …

MER 2023: Multi-label learning, modality robustness, and semi-supervised learning

Z Lian, H Sun, L Sun, K Chen, M Xu, K Wang… - Proceedings of the 31st …, 2023 - dl.acm.org
The first Multimodal Emotion Recognition Challenge (MER 2023) was successfully held at
ACM Multimedia. The challenge focuses on system robustness and consists of three distinct …

SMIN: Semi-supervised multi-modal interaction network for conversational emotion recognition

Z Lian, B Liu, J Tao - IEEE Transactions on Affective Computing, 2022 - ieeexplore.ieee.org
Conversational emotion recognition is a crucial research topic in human-computer
interactions. Due to the heavy annotation cost and inevitable label ambiguity, collecting …

Multimodal cross-and self-attention network for speech emotion recognition

L Sun, B Liu, J Tao, Z Lian - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
Speech Emotion Recognition (SER) requires a thorough understanding of both the linguistic
content of an utterance (i.e., textual information) and how the speaker utters it (i.e., acoustic …

Improving transformer-based speech recognition using unsupervised pre-training

D Jiang, X Lei, W Li, N Luo, Y Hu, W Zou… - arXiv preprint arXiv …, 2019 - arxiv.org
Speech recognition technologies are gaining enormous popularity in various industrial
applications. However, building a good speech recognition system usually requires large …

Contrastive unsupervised learning for speech emotion recognition

M Li, B Yang, J Levy, A Stolcke… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Speech emotion recognition (SER) is a key technology to enable more natural human-
machine communication. However, SER has long suffered from a lack of public large-scale …

Improving speech recognition models with small samples for air traffic control systems

Y Lin, Q Li, B Yang, Z Yan, H Tan, Z Chen - Neurocomputing, 2021 - Elsevier
In the domain of air traffic control (ATC) systems, efforts to train a practical automatic speech
recognition (ASR) model always face the problem of small training samples since the …

Speech-XLNet: Unsupervised acoustic model pretraining for self-attention networks

X Song, G Wang, Z Wu, Y Huang, D Su, D Yu… - arXiv preprint arXiv …, 2019 - arxiv.org
Self-attention network (SAN) can benefit significantly from the bi-directional representation
learning through unsupervised pretraining paradigms such as BERT and XLNet. In this …

TDFNet: Transformer-based deep-scale fusion network for multimodal emotion recognition

Z Zhao, Y Wang, G Shen, Y Xu… - IEEE/ACM Transactions …, 2023 - ieeexplore.ieee.org
As deep learning technology research continues to progress, artificial intelligence
technology is gradually empowering various fields. To achieve a more natural human …