MBTFNET: Multi-Band Temporal-Frequency Neural Network for Singing Voice Enhancement

W Xu, Z Chen, Z Tan, S Lv, R Han… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
A typical neural speech enhancement (SE) approach mainly handles speech and noise
mixtures, which is not optimal for singing voice enhancement scenarios where singing is …

Source Separation of Piano Concertos Using Musically Motivated Augmentation Techniques

Y Özer, M Müller - IEEE/ACM Transactions on Audio, Speech …, 2024 - ieeexplore.ieee.org
In this work, we address the novel and rarely considered source separation task of
decomposing piano concerto recordings into separate piano and orchestral tracks. Being a …

Semi-supervised time domain target speaker extraction with attention

Z Wang, R Giri, S Venkataramani, U Isik… - arXiv preprint arXiv …, 2022 - arxiv.org
In this work, we propose Exformer, a time-domain architecture for target speaker extraction. It
consists of a pre-trained speaker embedder network and a separator network based on …

Augmenting pre-trained language models with audio feature embedding for argumentation mining in political debates

R Mestre, SE Middleton, M Ryan… - Findings of the …, 2023 - aclanthology.org
The integration of multimodality in natural language processing (NLP) tasks seeks to exploit
the complementary information contained in two or more modalities, such as text, audio and …

Unsupervised Deep Unfolded Representation Learning for Singing Voice Separation

W Yuan, S Wang, J Wang, M Unoki… - IEEE/ACM Transactions …, 2023 - ieeexplore.ieee.org
Learning effective vocal representations from a waveform mixture is a crucial but
challenging task for deep neural network (DNN)-based singing voice separation (SVS) …

Audio deepfakes: feature extraction and model evaluation for detection

RK Bhukya, A Raj, DN Raja - 2024 5th International …, 2024 - ieeexplore.ieee.org
Cutting-edge AI-driven tools are currently employed for replicating human voices, leading to
the emergence of audio deepfakes. Initially designed to enhance experiences like audio …

Air Traffic Controller Fatigue Detection by Applying a Dual-Stream Convolutional Neural Network to the Fusion of Radiotelephony and Facial Data

L Xu, S Ma, Z Shen, Y Nan - Aerospace, 2024 - mdpi.com
The role of air traffic controllers is to direct and manage highly dynamic flights. Their work
requires both efficiency and accuracy. Previous studies have shown that fatigue in air traffic …

A study of audio mixing methods for piano transcription in violin-piano ensembles

H Kim, J Park, T Kwon, D Jeong… - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
While piano music transcription models have shown high performance for solo piano
recordings, their performance degrades when applied to ensemble recordings. This study …

Unsupervised Single-Channel Singing Voice Separation with Weighted Robust Principal Component Analysis Based on Gammatone Auditory Filterbank and Vocal …

F Li, Y Hu, L Wang - Sensors, 2023 - mdpi.com
Singing-voice separation is a separation task that involves a singing voice and musical
accompaniment. In this paper, we propose a novel, unsupervised methodology for extracting …

An Improved Optimal Transport Kernel Embedding Method with Gating Mechanism for Singing Voice Separation and Speaker Identification

W Yuan, Y Bian, S Wang, M Unoki… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Singing voice separation (SVS) and speaker identification (SI) are two classic problems in
speech signal processing. Deep neural networks (DNNs) solve these two problems by …