Overview of speaker modeling and its applications: From the lens of deep speaker representation learning
Speaker individuality information is among the most critical elements within speech signals.
By thoroughly and accurately modeling this information, it can be utilized in various …
DiarizationLM: Speaker diarization post-processing with large language models
In this paper, we introduce DiarizationLM, a framework to leverage large language models
(LLM) to post-process the outputs from a speaker diarization system. Various goals can be …
Efficient Cascaded Streaming ASR System via Frame Rate Reduction
In this paper, we explore various frame rate reduction schemes on the two-pass cascaded
encoder model to improve its efficiency without sacrificing the transcription quality. We …
Personalized speech enhancement combining band-split RNN and speaker attentive module
X Le, L Chen, C He, Y Guo, C Chen… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Target speaker information can be utilized in speech enhancement (SE) models to more
effectively extract the desired speech. Previous works introduce the speaker embedding into …
SVVAD: Personal voice activity detection for speaker verification
Z Kang, J Wang, J Peng, J Xiao - arXiv preprint arXiv:2305.19581, 2023 - arxiv.org
Voice activity detection (VAD) improves the performance of speaker verification (SV) by
preserving speech segments and attenuating the effects of non-speech. However, this …
Conditional conformer: Improving speaker modulation for single and multi-user speech enhancement
T O'Malley, S Ding, A Narayanan… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Recently, Feature-wise Linear Modulation (FiLM) has been shown to outperform other
approaches to incorporate speaker embedding into speech separation and VoiceFilter …
Version control of speaker recognition systems
This paper discusses one of the most challenging practical engineering problems in speaker
recognition systems—the version control of models and user profiles. A typical speaker …
Enabling Hands-Free Voice Assistant Activation on Earphones
We present the design and implementation of EarVoice, a lightweight mobile service that
enables hands-free voice assistant activation on commodity earphones. EarVoice comprises …
RadioVAD: mmWave-Based Noise and Interference-Resilient Voice Activity Detection
Voice interfaces have become one of the most ubiquitous human-computer interaction
methods in recent years. Voice Activity Detection (VAD) is typically the first building block of …
Investigation of Speaker Representation for Target-Speaker Speech Processing
Target-speaker speech processing (TS) tasks, such as target-speaker automatic speech
recognition (TS-ASR), target speech extraction (TSE), and personal voice activity detection …