USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models
We introduce a multilingual speaker change detection model (USM-SCD) that can
simultaneously detect speaker turns and perform ASR for 96 languages. This model is …
From Modular to End-to-End Speaker Diarization
F Landini - arXiv preprint arXiv:2407.08752, 2024 - arxiv.org
Speaker diarization is usually referred to as the task that determines "who spoke when" in a
recording. Until a few years ago, all competitive approaches were modular. Systems based …
GIST-AiTeR speaker diarization system for voxceleb speaker recognition challenge (VoxSRC) 2023
This report describes the submission system by the GIST-AiTeR team for the VoxCeleb
Speaker Recognition Challenge 2023 (VoxSRC-23) Track 4. Our submission system …
Multi-task wav2vec2 serving as a pronunciation training system for children
Computer-assisted learning tools (CAPT) are increasingly reliant on AI tools. Recent studies
demonstrated how neural systems pre-trained in a self-supervised fashion, such as …
SCDNet: Self-supervised Learning Feature-based Speaker Change Detection
Speaker Change Detection (SCD) is to identify boundaries among speakers in a
conversation. Motivated by the success of fine-tuning wav2vec 2.0 models for the SCD task …
Comparison of wav2vec 2.0 models on three speech processing tasks
The current state-of-the-art for various speech processing problems is a sequence-to-
sequence model based on a self-attention mechanism known as transformer. The widely …
Split-Attention CNN and Self-Attention with RoPE and GCN for Voice Activity Detection
YW Tan, XF Ding - IEEE Access, 2024 - ieeexplore.ieee.org
In recent years, attention-based voice activity detection systems have become popular,
attributed to their ability to encapsulate a diverse array of contextual information. The …
Speaker Change Detection with Weighted-sum Knowledge Distillation based on Self-supervised Pre-trained Models
H Su, Y Kong, L Fan, P Gao, Y Wang… - Proc. Interspeech …, 2024 - isca-archive.org
Abstract Speaker Change Detection (SCD) is an essential problem in speech processing
and has various applications in many fields. The self-supervised models have shown …
Multitask Speech Recognition and Speaker Change Detection for Unknown Number of Speakers
Traditionally, automatic speech recognition (ASR) and speaker change detection (SCD)
systems have been independently trained to generate comprehensive transcripts …
Audio-Visual Approach For Multimodal Concurrent Speaker Detection
Concurrent Speaker Detection (CSD), the task of identifying the presence and overlap of
active speakers in an audio signal, is crucial for many audio tasks such as meeting …