Lessons learned in transcribing 5000 h of air traffic control communications for robust automatic speech understanding

J Zuluaga-Gomez, I Nigmatulina, A Prasad, P Motlicek… - Aerospace, 2023 - mdpi.com
Voice communication between air traffic controllers (ATCos) and pilots is critical for ensuring
safe and efficient air traffic control (ATC). The handling of these voice communications …

A robust approach to multimodal deepfake detection

D Salvi, H Liu, S Mandelli, P Bestagini, W Zhou… - Journal of …, 2023 - mdpi.com
The widespread use of deep learning techniques for creating realistic synthetic media,
commonly known as deepfakes, poses a significant threat to individuals, organizations, and …

A virtual simulation-pilot agent for training of air traffic controllers

J Zuluaga-Gomez, A Prasad, I Nigmatulina, P Motlicek… - Aerospace, 2023 - mdpi.com
In this paper we propose a novel virtual simulation-pilot engine for speeding up air traffic
controller (ATCo) training by integrating different state-of-the-art artificial intelligence (AI) …

Improving hybrid CTC/attention architecture for agglutinative language speech recognition

Z Ren, N Yolwas, W Slamu, R Cao, H Wang - Sensors, 2022 - mdpi.com
Unlike the traditional model, the end-to-end (E2E) ASR model does not require speech
information such as a pronunciation dictionary, and its system is built through a single neural …

Development of supervised speaker diarization system based on the pyannote audio processing library

V Khoma, Y Khoma, V Brydinskyi, A Konovalov - Sensors, 2023 - mdpi.com
Diarization is an important task when work with audiodata is executed, as it provides a
solution to the problem related to the need of dividing one analyzed call recording into …

Attention-Based Fusion of Ultrashort Voice Utterances and Depth Videos for Multimodal Person Identification

A Moufidi, D Rousseau, P Rasti - Sensors, 2023 - mdpi.com
Multimodal deep learning, in the context of biometrics, encounters significant challenges
due to the dependence on long speech utterances and RGB images, which are often …

Characterization of deep learning-based speech-enhancement techniques in online audio processing applications

C Rascon - Sensors, 2023 - mdpi.com
Deep learning-based speech-enhancement techniques have recently been an area of
growing interest, since their impressive performance can potentially benefit a wide variety of …

Multimodal sentiment analysis in realistic environments based on cross-modal hierarchical fusion network

J Huang, P Lu, S Sun, F Wang - Electronics, 2023 - mdpi.com
In the real world, multimodal sentiment analysis (MSA) enables the capture and analysis of
sentiments by fusing multimodal information, thereby enhancing the understanding of real …

Self attention networks in speaker recognition

P Safari, M India, J Hernando - Applied Sciences, 2023 - mdpi.com
Recently, there has been a significant surge of interest in Self-Attention Networks (SANs)
based on the Transformer architecture. This can be attributed to their notable ability for …

An assessment of in-the-wild datasets for multimodal emotion recognition

A Aguilera, D Mellado, F Rojas - Sensors, 2023 - mdpi.com
Multimodal emotion recognition implies the use of different resources and techniques for
identifying and recognizing human emotions. A variety of data sources such as faces …