Overview of speaker modeling and its applications: From the lens of deep speaker representation learning

S Wang, Z Chen, KA Lee, Y Qian… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Speaker individuality information is among the most critical elements within speech signals.
By thoroughly and accurately modeling this information, it can be utilized in various …

DiarizationLM: Speaker diarization post-processing with large language models

Q Wang, Y Huang, G Zhao, E Clark, W Xia… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we introduce DiarizationLM, a framework to leverage large language models
(LLM) to post-process the outputs from a speaker diarization system. Various goals can be …

Efficient Cascaded Streaming ASR System via Frame Rate Reduction

X Cai, D Qiu, S Ding, D Hwang… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
In this paper, we explore various frame rate reduction schemes on the two-pass cascaded
encoder model to improve its efficiency without sacrificing the transcription quality. We …

Personalized speech enhancement combining band-split RNN and speaker attentive module

X Le, L Chen, C He, Y Guo, C Chen… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Target speaker information can be utilized in speech enhancement (SE) models to more
effectively extract the desired speech. Previous works introduce the speaker embedding into …

SVVAD: Personal voice activity detection for speaker verification

Z Kang, J Wang, J Peng, J Xiao - arXiv preprint arXiv:2305.19581, 2023 - arxiv.org
Voice activity detection (VAD) improves the performance of speaker verification (SV) by
preserving speech segments and attenuating the effects of non-speech. However, this …

Conditional conformer: Improving speaker modulation for single and multi-user speech enhancement

T O'Malley, S Ding, A Narayanan… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Recently, Feature-wise Linear Modulation (FiLM) has been shown to outperform other
approaches to incorporate speaker embedding into speech separation and VoiceFilter …

Version control of speaker recognition systems

Q Wang, IL Moreno - Journal of Systems and Software, 2024 - Elsevier
This paper discusses one of the most challenging practical engineering problems in speaker
recognition systems—the version control of models and user profiles. A typical speaker …

Enabling Hands-Free Voice Assistant Activation on Earphones

T Chen, Y Yang, C Qiu, X Fan, X Guo… - Proceedings of the 22nd …, 2024 - dl.acm.org
We present the design and implementation of EarVoice, a lightweight mobile service that
enables hands-free voice assistant activation on commodity earphones. EarVoice comprises …

RadioVAD: mmWave-Based Noise and Interference-Resilient Voice Activity Detection

MZ Ozturk, C Wu, B Wang, M Wu… - IEEE Internet of Things …, 2024 - ieeexplore.ieee.org
Voice interfaces have become one of the most ubiquitous human-computer interaction
methods in recent years. Voice Activity Detection (VAD) is typically the first building block of …

Investigation of Speaker Representation for Target-Speaker Speech Processing

T Ashihara, T Moriya, S Horiguchi, J Peng… - arXiv preprint arXiv …, 2024 - arxiv.org
Target-speaker speech processing (TS) tasks, such as target-speaker automatic speech
recognition (TS-ASR), target speech extraction (TSE), and personal voice activity detection …