[HTML] Lessons learned in transcribing 5000 h of air traffic control communications for robust automatic speech understanding
J Zuluaga-Gomez, I Nigmatulina, A Prasad, P Motlicek… - Aerospace, 2023 - mdpi.com
Voice communication between air traffic controllers (ATCos) and pilots is critical for ensuring
safe and efficient air traffic control (ATC). The handling of these voice communications …
[HTML] A robust approach to multimodal deepfake detection
The widespread use of deep learning techniques for creating realistic synthetic media,
commonly known as deepfakes, poses a significant threat to individuals, organizations, and …
[HTML] A virtual simulation-pilot agent for training of air traffic controllers
In this paper we propose a novel virtual simulation-pilot engine for speeding up air traffic
controller (ATCo) training by integrating different state-of-the-art artificial intelligence (AI) …
[HTML] Improving hybrid CTC/attention architecture for agglutinative language speech recognition
Z Ren, N Yolwas, W Slamu, R Cao, H Wang - Sensors, 2022 - mdpi.com
Unlike the traditional model, the end-to-end (E2E) ASR model does not require speech
information such as a pronunciation dictionary, and its system is built through a single neural …
[HTML] Development of supervised speaker diarization system based on the pyannote audio processing library
V Khoma, Y Khoma, V Brydinskyi, A Konovalov - Sensors, 2023 - mdpi.com
Diarization is an important task when working with audio data, as it provides a
solution to the problem of dividing one analyzed call recording into …
[HTML] Attention-Based Fusion of Ultrashort Voice Utterances and Depth Videos for Multimodal Person Identification
Multimodal deep learning, in the context of biometrics, encounters significant challenges
due to the dependence on long speech utterances and RGB images, which are often …
[HTML] Characterization of deep learning-based speech-enhancement techniques in online audio processing applications
C Rascon - Sensors, 2023 - mdpi.com
Deep learning-based speech-enhancement techniques have recently been an area of
growing interest, since their impressive performance can potentially benefit a wide variety of …
[HTML] Multimodal sentiment analysis in realistic environments based on cross-modal hierarchical fusion network
J Huang, P Lu, S Sun, F Wang - Electronics, 2023 - mdpi.com
In the real world, multimodal sentiment analysis (MSA) enables the capture and analysis of
sentiments by fusing multimodal information, thereby enhancing the understanding of real …
[HTML] Self attention networks in speaker recognition
Recently, there has been a significant surge of interest in Self-Attention Networks (SANs)
based on the Transformer architecture. This can be attributed to their notable ability for …
[HTML] An assessment of in-the-wild datasets for multimodal emotion recognition
A Aguilera, D Mellado, F Rojas - Sensors, 2023 - mdpi.com
Multimodal emotion recognition implies the use of different resources and techniques for
identifying and recognizing human emotions. A variety of data sources such as faces …