Personal VAD 2.0: Optimizing personal voice activity detection for on-device speech recognition

Overview of speaker modeling and its applications: From the lens of deep speaker representation learning

S Wang, Z Chen, KA Lee, Y Qian… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org

Speaker individuality information is among the most critical elements within speech signals.
By thoroughly and accurately modeling this information, it can be utilized in various …

Cited by 3 · Related articles · All 3 versions

[PDF] arxiv.org

DiarizationLM: Speaker diarization post-processing with large language models

Q Wang, Y Huang, G Zhao, E Clark, W Xia… - arXiv preprint arXiv …, 2024 - arxiv.org

In this paper, we introduce DiarizationLM, a framework to leverage large language models
(LLM) to post-process the outputs from a speaker diarization system. Various goals can be …

Cited by 15 · Related articles · All 3 versions

[PDF] bruguier.com

Efficient Cascaded Streaming ASR System via Frame Rate Reduction

X Cai, D Qiu, S Ding, D Hwang… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

In this paper, we explore various frame rate reduction schemes on the two-pass cascaded
encoder model to improve its efficiency without sacrificing the transcription quality. We …

Cited by 2 · Related articles · All 2 versions

[PDF] arxiv.org

Personalized speech enhancement combining band-split RNN and speaker attentive module

X Le, L Chen, C He, Y Guo, C Chen… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Target speaker information can be utilized in speech enhancement (SE) models to more
effectively extract the desired speech. Previous works introduce the speaker embedding into …

Cited by 5 · Related articles · All 3 versions

[PDF] arxiv.org

SVVAD: Personal voice activity detection for speaker verification

Z Kang, J Wang, J Peng, J Xiao - arXiv preprint arXiv:2305.19581, 2023 - arxiv.org

Voice activity detection (VAD) improves the performance of speaker verification (SV) by
preserving speech segments and attenuating the effects of non-speech. However, this …

Cited by 2 · Related articles · All 5 versions

Conditional conformer: Improving speaker modulation for single and multi-user speech enhancement

T O'Malley, S Ding, A Narayanan… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Recently, Feature-wise Linear Modulation (FiLM) has been shown to outperform other
approaches to incorporate speaker embedding into speech separation and VoiceFilter …

Cited by 3 · Related articles

[PDF] arxiv.org

Version control of speaker recognition systems

Q Wang, IL Moreno - Journal of Systems and Software, 2024 - Elsevier

This paper discusses one of the most challenging practical engineering problems in speaker
recognition systems—the version control of models and user profiles. A typical speaker …

Cited by 12 · Related articles · All 4 versions

[PDF] acm.org

Enabling Hands-Free Voice Assistant Activation on Earphones

T Chen, Y Yang, C Qiu, X Fan, X Guo… - Proceedings of the 22nd …, 2024 - dl.acm.org

We present the design and implementation of EarVoice, a lightweight mobile service that
enables hands-free voice assistant activation on commodity earphones. EarVoice comprises …

Cited by 2 · Related articles

RadioVAD: mmWave-Based Noise and Interference-Resilient Voice Activity Detection

MZ Ozturk, C Wu, B Wang, M Wu… - IEEE Internet of Things …, 2024 - ieeexplore.ieee.org

Voice interfaces have become one of the most ubiquitous human-computer interaction
methods in recent years. Voice Activity Detection (VAD) is typically the first building block of …

Cited by 2 · Related articles

[PDF] arxiv.org

Investigation of Speaker Representation for Target-Speaker Speech Processing

T Ashihara, T Moriya, S Horiguchi, J Peng… - arXiv preprint arXiv …, 2024 - arxiv.org

Target-speaker speech processing (TS) tasks, such as target-speaker automatic speech
recognition (TS-ASR), target speech extraction (TSE), and personal voice activity detection …