A large-scale open-source acoustic simulator for speaker recognition

M Ferras, S Madikeri, P Motlicek, S Dey… - IEEE Signal …, 2016 - ieeexplore.ieee.org
IEEE Signal Processing Letters, 2016
State-of-the-art speaker-recognition systems suffer significant performance loss under degraded speech conditions and under acoustic mismatch between the enrolment and test phases. Past international evaluation campaigns, such as the NIST speaker recognition evaluation (SRE), have partly addressed these challenges in some evaluation conditions. This work aims at further assessing and compensating for the effect of a wide variety of speech-degradation processes on speaker-recognition performance. We present an open-source simulator that generates degraded telephone, VoIP, and interview-speech recordings using a comprehensive list of narrow-band, wide-band, and audio codecs, together with a database of over 60 h of environmental noise recordings and over 100 impulse responses collected from publicly available data. We provide speaker-verification results obtained with an i-vector-based system using either a clean or a degraded PLDA back-end on a subset of NIST SRE data corrupted by the proposed simulator. While error rates increase considerably under degraded speech conditions, large relative reductions in equal error rate (EER) were observed when using a PLDA model trained with a large number of degraded sessions per speaker.
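The abstract describes degrading speech with environmental noise and impulse responses. As a rough illustration of that general idea only, the sketch below reverberates a signal with an impulse response and then mixes in noise at a chosen signal-to-noise ratio. It is not the authors' simulator: the function name `degrade`, its parameters, and the choice to measure SNR against the reverberant signal are assumptions made for this example, and the codec stage is omitted entirely.

```python
import numpy as np


def degrade(speech, noise, impulse_response, snr_db):
    """Hypothetical helper: reverberate `speech`, then add `noise` at `snr_db`.

    All arguments except `snr_db` are 1-D float arrays sampled at the same rate.
    """
    # Convolve with the impulse response and trim to the original length.
    reverberant = np.convolve(speech, impulse_response)[: len(speech)]

    # Tile or trim the noise so that it covers the whole utterance.
    reps = int(np.ceil(len(reverberant) / len(noise)))
    noise = np.tile(noise, reps)[: len(reverberant)]

    # Scale the noise so the speech-to-noise power ratio equals snr_db.
    # (Assumption: SNR is defined relative to the reverberant speech.)
    speech_power = np.mean(reverberant ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))

    return reverberant + gain * noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speech = rng.standard_normal(16000)   # stand-in for one second of speech
    noise = rng.standard_normal(4000)     # shorter noise clip, tiled as needed
    ir = np.array([1.0, 0.5, 0.25])       # toy impulse response
    degraded = degrade(speech, noise, ir, snr_db=10.0)
    print(degraded.shape)
```

A real pipeline would read recorded noise and measured impulse responses from disk and would typically pass the result through a speech or audio codec afterwards; those steps are left out here because the abstract does not specify them.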