UR channel-robust synthetic speech detection system for ASVspoof 2021

X Chen, Y Zhang, G Zhu, Z Duan - arXiv preprint arXiv:2107.12018, 2021 - arxiv.org
In this paper, we present the UR-AIR system submission to the logical access (LA) and the
speech deepfake (DF) tracks of the ASVspoof 2021 Challenge. The LA and DF tasks focus …

A study on data augmentation in voice anti-spoofing

A Cohen, I Rimon, E Aflalo, HH Permuter - Speech Communication, 2022 - Elsevier
In this paper we perform an in-depth study of how data augmentation techniques improve
synthetic or spoofed audio detection. Specifically, we propose methods to deal with channel …

Domain generalization via aggregation and separation for audio deepfake detection

Y Xie, H Cheng, Y Wang, L Ye - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
In this paper, we propose an Aggregation and Separation Domain Generalization (ASDG)
method for Audio DeepFake Detection (ADD). Fake speech generated from different …

An empirical study on channel effects for synthetic voice spoofing countermeasure systems

Y Zhang, G Zhu, F Jiang, Z Duan - arXiv preprint arXiv:2104.01320, 2021 - arxiv.org
Spoofing countermeasure (CM) systems are critical in speaker verification; they aim to
discern spoofing attacks from bona fide speech trials. In practice, however, acoustic …

Device-robust acoustic scene classification via impulse response augmentation

T Morocutti, F Schmid, K Koutini… - 2023 31st European …, 2023 - ieeexplore.ieee.org
The ability to generalize to a wide range of recording devices is a crucial performance factor
for audio classification models. The characteristics of different types of microphones …

Investigations on end-to-end audiovisual fusion

M Wand, J Schmidhuber, NT Vu - 2018 IEEE International …, 2018 - ieeexplore.ieee.org
Audiovisual speech recognition (AVSR) is a method to alleviate the adverse effect of noise
in the acoustic signal. Leveraging recent developments in deep neural network-based …

DENT-DDSP: Data-efficient noisy speech generator using differentiable digital signal processors for explicit distortion modelling and noise-robust speech recognition

Z Guo, C Chen, ES Chng - arXiv preprint arXiv:2208.00987, 2022 - arxiv.org
The performances of automatic speech recognition (ASR) systems degrade drastically under
noisy conditions. Explicit distortion modelling (EDM), as a feature compensation step, is able …

A fused speech enhancement framework for robust speaker verification

Y Wu, T Li, J Zhao, Q Wang, J Xu - IEEE Signal Processing …, 2023 - ieeexplore.ieee.org
Robust speaker verification (RSV) under noisy conditions is still a challenging task.
Recently, some task-specific speech enhancement (SE) approaches are proposed and …

Investigations on audiovisual emotion recognition in noisy conditions

M Neumann, NT Vu - 2021 IEEE Spoken Language …, 2021 - ieeexplore.ieee.org
In this paper we explore audiovisual emotion recognition under noisy acoustic conditions
with a focus on speech features. We attempt to answer the following research questions: (i) …

Audio codec simulation based data augmentation for telephony speech recognition

TL Vu, Z Zeng, H Xu, ES Chng - 2019 Asia-Pacific Signal and …, 2019 - ieeexplore.ieee.org
Real telephony speech recognition task is challenging due to 1) diversified channel
distortions and 2) limited access to the real data because of the data privacy consideration …