Spoken instruction understanding in air traffic control: Challenge, technique, and application
Y Lin - Aerospace, 2021 - mdpi.com
In air traffic control (ATC), speech communication with radio transmission is the primary way
to exchange information between the controller and aircrew. A wealth of contextual …
wav2vec: Unsupervised pre-training for speech recognition
We explore unsupervised pre-training for speech recognition by learning representations of
raw audio. wav2vec is trained on large amounts of unlabeled audio data and the resulting …
MER 2023: Multi-label learning, modality robustness, and semi-supervised learning
The first Multimodal Emotion Recognition Challenge (MER 2023) was successfully held at
ACM Multimedia. The challenge focuses on system robustness and consists of three distinct …
SMIN: Semi-supervised multi-modal interaction network for conversational emotion recognition
Conversational emotion recognition is a crucial research topic in human-computer
interactions. Due to the heavy annotation cost and inevitable label ambiguity, collecting …
Multimodal cross- and self-attention network for speech emotion recognition
Speech Emotion Recognition (SER) requires a thorough understanding of both the linguistic
content of an utterance (i.e., textual information) and how the speaker utters it (i.e., acoustic …
Improving transformer-based speech recognition using unsupervised pre-training
Speech recognition technologies are gaining enormous popularity in various industrial
applications. However, building a good speech recognition system usually requires large …
Contrastive unsupervised learning for speech emotion recognition
Speech emotion recognition (SER) is a key technology to enable more natural human-
machine communication. However, SER has long suffered from a lack of public large-scale …
Improving speech recognition models with small samples for air traffic control systems
In the domain of air traffic control (ATC) systems, efforts to train a practical automatic speech
recognition (ASR) model always face the problem of small training samples since the …
Speech-XLNet: Unsupervised acoustic model pretraining for self-attention networks
Self-attention network (SAN) can benefit significantly from the bi-directional representation
learning through unsupervised pretraining paradigms such as BERT and XLNet. In this …
TDFNet: Transformer-based deep-scale fusion network for multimodal emotion recognition
Z Zhao, Y Wang, G Shen, Y Xu… - IEEE/ACM Transactions …, 2023 - ieeexplore.ieee.org
As deep learning technology research continues to progress, artificial intelligence
technology is gradually empowering various fields. To achieve a more natural human …