USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models

G Zhao, Y Wang, J Pelecanos, Y Zhang… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
We introduce a multilingual speaker change detection model (USM-SCD) that can
simultaneously detect speaker turns and perform ASR for 96 languages. This model is …

From Modular to End-to-End Speaker Diarization

F Landini - arXiv preprint arXiv:2407.08752, 2024 - arxiv.org
Speaker diarization is usually referred to as the task that determines "who spoke when" in a
recording. Until a few years ago, all competitive approaches were modular. Systems based …

GIST-AiTeR speaker diarization system for VoxCeleb speaker recognition challenge (VoxSRC) 2023

D Park, JW Kim, KR Kim, DH Lee, HK Kim - arXiv preprint arXiv …, 2023 - arxiv.org
This report describes the submission system by the GIST-AiTeR team for the VoxCeleb
Speaker Recognition Challenge 2023 (VoxSRC-23) Track 4. Our submission system …

Multi-task wav2vec2 serving as a pronunciation training system for children

Y Getman, R Al-Ghezi, T Grosz… - 9th Workshop on Speech …, 2023 - research.aalto.fi
Computer-assisted learning tools (CAPT) are increasingly reliant on AI tools. Recent studies
demonstrated how neural systems pre-trained in a self-supervised fashion, such as …

SCDNet: Self-supervised Learning Feature-based Speaker Change Detection

Y Li, X Wang, L Zhang, L Xie - arXiv preprint arXiv:2406.08393, 2024 - arxiv.org
Speaker Change Detection (SCD) is to identify boundaries among speakers in a
conversation. Motivated by the success of fine-tuning wav2vec 2.0 models for the SCD task …

Comparison of wav2vec 2.0 models on three speech processing tasks

M Kunešová, Z Zajíc, L Šmídl, M Karafiát - International Journal of Speech …, 2024 - Springer
The current state-of-the-art for various speech processing problems is a sequence-to-
sequence model based on a self-attention mechanism known as transformer. The widely …

Split-Attention CNN and Self-Attention with RoPE and GCN for Voice Activity Detection

YW Tan, XF Ding - IEEE Access, 2024 - ieeexplore.ieee.org
In recent years, attention-based voice activity detection systems have become popular,
attributed to their ability to encapsulate a diverse array of contextual information. The …

Speaker Change Detection with Weighted-sum Knowledge Distillation based on Self-supervised Pre-trained Models

H Su, Y Kong, L Fan, P Gao, Y Wang… - Proc. Interspeech …, 2024 - isca-archive.org
Speaker Change Detection (SCD) is an essential problem in speech processing
and has various applications in many fields. The self-supervised models have shown …

Multitask Speech Recognition and Speaker Change Detection for Unknown Number of Speakers

S Kumar, S Madikeri, I Nigmatulina… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Traditionally, automatic speech recognition (ASR) and speaker change detection (SCD)
systems have been independently trained to generate comprehensive transcripts …

Audio-Visual Approach For Multimodal Concurrent Speaker Detection

A Eliav, S Gannot - arXiv preprint arXiv:2407.01774, 2024 - arxiv.org
Concurrent Speaker Detection (CSD), the task of identifying the presence and overlap of
active speakers in an audio signal, is crucial for many audio tasks such as meeting …