A large-scale open-source acoustic simulator for speaker recognition

X Chen, Y Zhang, G Zhu, Z Duan - arXiv preprint arXiv:2107.12018, 2021 - arxiv.org

In this paper, we present the UR-AIR system submission to the logical access (LA) and the
speech deepfake (DF) tracks of the ASVspoof 2021 Challenge. The LA and DF tasks focus …

Cited by 50 · Related articles · All 10 versions

A study on data augmentation in voice anti-spoofing

A Cohen, I Rimon, E Aflalo, HH Permuter - Speech Communication, 2022 - Elsevier

In this paper we perform an in-depth study of how data augmentation techniques improve
synthetic or spoofed audio detection. Specifically, we propose methods to deal with channel …

Cited by 38 · Related articles · All 6 versions

Domain generalization via aggregation and separation for audio deepfake detection

Y Xie, H Cheng, Y Wang, L Ye - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

In this paper, we propose an Aggregation and Separation Domain Generalization (ASDG)
method for Audio DeepFake Detection (ADD). Fake speech generated from different …

Cited by 14 · Related articles · All 2 versions

An empirical study on channel effects for synthetic voice spoofing countermeasure systems

Y Zhang, G Zhu, F Jiang, Z Duan - arXiv preprint arXiv:2104.01320, 2021 - arxiv.org

Spoofing countermeasure (CM) systems are critical in speaker verification; they aim to
discern spoofing attacks from bona fide speech trials. In practice, however, acoustic …

Cited by 32 · Related articles · All 15 versions

Device-robust acoustic scene classification via impulse response augmentation

T Morocutti, F Schmid, K Koutini… - 2023 31st European …, 2023 - ieeexplore.ieee.org

The ability to generalize to a wide range of recording devices is a crucial performance factor
for audio classification models. The characteristics of different types of microphones …

Cited by 18 · Related articles · All 5 versions

Investigations on end-to-end audiovisual fusion

M Wand, J Schmidhuber, NT Vu - 2018 IEEE International …, 2018 - ieeexplore.ieee.org

Audiovisual speech recognition (AVSR) is a method to alleviate the adverse effect of noise
in the acoustic signal. Leveraging recent developments in deep neural network-based …

Cited by 42 · Related articles · All 5 versions

DENT-DDSP: Data-efficient noisy speech generator using differentiable digital signal processors for explicit distortion modelling and noise-robust speech recognition

Z Guo, C Chen, ES Chng - arXiv preprint arXiv:2208.00987, 2022 - arxiv.org

The performances of automatic speech recognition (ASR) systems degrade drastically under
noisy conditions. Explicit distortion modelling (EDM), as a feature compensation step, is able …

Cited by 6 · Related articles · All 6 versions

A fused speech enhancement framework for robust speaker verification

Y Wu, T Li, J Zhao, Q Wang, J Xu - IEEE Signal Processing …, 2023 - ieeexplore.ieee.org

Robust speaker verification (RSV) under noisy conditions is still a challenging task.
Recently, some task-specific speech enhancement (SE) approaches are proposed and …

Cited by 1 · Related articles · All 2 versions

Investigations on audiovisual emotion recognition in noisy conditions

M Neumann, NT Vu - 2021 IEEE Spoken Language …, 2021 - ieeexplore.ieee.org

In this paper we explore audiovisual emotion recognition under noisy acoustic conditions
with a focus on speech features. We attempt to answer the following research questions: (i) …

Cited by 10 · Related articles · All 3 versions

Audio codec simulation based data augmentation for telephony speech recognition

TL Vu, Z Zeng, H Xu, ES Chng - 2019 Asia-Pacific Signal and …, 2019 - ieeexplore.ieee.org

Real telephony speech recognition task is challenging due to 1) diversified channel
distortions and 2) limited access to the real data because of the data privacy consideration …

Cited by 13 · Related articles · All 2 versions