Speech processing for robust speaker recognition: Analysis and advancements for whispered speech

JHL Hansen, C Zhang, X Fan - Forensic Speaker Recognition: Law …, 2012 - Springer
Forensic Speaker Recognition: Law Enforcement and Counter-Terrorism, 2012, Springer
Abstract
In the field of voice forensics, the ability to perform effective speaker recognition from input audio streams is an important task. However, in many situations, individuals may prefer to lower their risk of being heard in public settings by whispering during communications. It is in precisely these conditions that speaker recognition should remain effective. Limited formal research has been performed in this domain to date. Whisper is an alternative speech production mode used by subjects in public conversation to protect content privacy or identity. Due to the profound differences between whisper and neutral speech in terms of spectral structure, the performance of speaker identification systems trained with neutral speech degrades significantly. In this chapter, studies that address acoustic analysis of whisper will be reviewed. Next, an effective data collection procedure for both spontaneous and read whisper speech will be introduced. An algorithm for whisper speech detection, which is a crucial front-end for whisper speech processing algorithms, will be presented. Finally, a seamless neutral/whisper mismatched closed-set speaker recognition system will be introduced. In the evaluation, a traditional MFCC-GMM system is employed as the baseline speaker ID system. An analysis of both speaker and phoneme variability in speaker ID performance using neutral-trained GMMs is provided, which forms the basis for the final combined whisper-based speaker ID system that is presented. Experimental results are also provided, followed by directions for future work.