Multitask detection of speaker changes, overlapping speech and voice activity using wav2vec 2.0

G Zhao, Y Wang, J Pelecanos, Y Zhang… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

We introduce a multilingual speaker change detection model (USM-SCD) that can
simultaneously detect speaker turns and perform ASR for 96 languages. This model is …

Cited by 2 · All 5 versions

[PDF] arxiv.org

From Modular to End-to-End Speaker Diarization

F Landini - arXiv preprint arXiv:2407.08752, 2024 - arxiv.org

Speaker diarization is usually referred to as the task that determines "who spoke when" in a
recording. Until a few years ago, all competitive approaches were modular. Systems based …

Cited by 1 · All 4 versions

[PDF] arxiv.org

GIST-AiTeR speaker diarization system for voxceleb speaker recognition challenge (VoxSRC) 2023

D Park, JW Kim, KR Kim, DH Lee, HK Kim - arXiv preprint arXiv …, 2023 - arxiv.org

This report describes the submission system by the GIST-AiTeR team for the VoxCeleb
Speaker Recognition Challenge 2023 (VoxSRC-23) Track 4. Our submission system …

Cited by 2 · All 2 versions

[PDF] aalto.fi

[PDF] Multi-task wav2vec2 serving as a pronunciation training system for children

Y Getman, R Al-Ghezi, T Grosz… - 9th Workshop on Speech …, 2023 - research.aalto.fi

Computer-assisted learning tools (CAPT) are increasingly reliant on AI tools. Recent studies
demonstrated how neural systems pre-trained in a self-supervised fashion, such as …

Cited by 4 · All 6 versions

[PDF] arxiv.org

SCDNet: Self-supervised Learning Feature-based Speaker Change Detection

Y Li, X Wang, L Zhang, L Xie - arXiv preprint arXiv:2406.08393, 2024 - arxiv.org

Speaker Change Detection (SCD) is to identify boundaries among speakers in a
conversation. Motivated by the success of fine-tuning wav2vec 2.0 models for the SCD task …

All 2 versions
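As a minimal illustration of the SCD task as the snippet defines it (identifying boundaries among speakers in a conversation), the sketch below derives reference change points from per-frame speaker labels. It is not SCDNet's method or any paper's evaluation protocol; the frame-level label format, the use of `None` for non-speech, and the function name `speaker_change_points` are illustrative assumptions.

```python
from typing import List, Optional


def speaker_change_points(labels: List[Optional[str]]) -> List[int]:
    """Return frame indices where the speaker differs from the previous speaking frame.

    Hypothetical input format (an assumption): one label per frame,
    with None marking non-speech frames.
    """
    changes: List[int] = []
    prev: Optional[str] = None  # last speaker seen in a speech frame
    for i, spk in enumerate(labels):
        if spk is None:
            # Non-speech frames are skipped, so a pause between two
            # different speakers still yields one change point.
            continue
        if prev is not None and spk != prev:
            changes.append(i)
        prev = spk
    return changes


if __name__ == "__main__":
    frames = ["A", "A", None, "B", "B", "A"]
    print(speaker_change_points(frames))
```

For the example frames, the change points fall at indices 3 (A to B, across the pause) and 5 (B to A). A real SCD evaluation would additionally allow a tolerance window around each reference boundary.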

[PDF] springer.com

Comparison of wav2vec 2.0 models on three speech processing tasks

M Kunešová, Z Zajíc, L Šmídl, M Karafiát - International Journal of Speech …, 2024 - Springer

The current state-of-the-art for various speech processing problems is a sequence-to-
sequence model based on a self-attention mechanism known as transformer. The widely …

Split-Attention CNN and Self-Attention with RoPE and GCN for Voice Activity Detection

YW Tan, XF Ding - IEEE Access, 2024 - ieeexplore.ieee.org

In recent years, attention-based voice activity detection systems have become popular,
attributed to their ability to encapsulate a diverse array of contextual information. The …

All 2 versions

[PDF] isca-archive.org

[PDF] Speaker Change Detection with Weighted-sum Knowledge Distillation based on Self-supervised Pre-trained Models

H Su, Y Kong, L Fan, P Gao, Y Wang… - Proc. Interspeech …, 2024 - isca-archive.org

Speaker Change Detection (SCD) is an essential problem in speech processing
and has various applications in many fields. The self-supervised models have shown …

All 2 versions

Multitask Speech Recognition and Speaker Change Detection for Unknown Number of Speakers

S Kumar, S Madikeri, I Nigmatulina… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

Traditionally, automatic speech recognition (ASR) and speaker change detection (SCD)
systems have been independently trained to generate comprehensive transcripts …

Cited by 1

[PDF] arxiv.org

Audio-Visual Approach For Multimodal Concurrent Speaker Detection

A Eliav, S Gannot - arXiv preprint arXiv:2407.01774, 2024 - arxiv.org

Concurrent Speaker Detection (CSD), the task of identifying the presence and overlap of
active speakers in an audio signal, is crucial for many audio tasks such as meeting …

All 2 versions
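To make the Concurrent Speaker Detection task in the Eliav and Gannot snippet concrete (detecting the presence and overlap of active speakers), the sketch below turns a per-speaker activity matrix into per-frame classes. The three-way labeling (0 = no speech, 1 = a single speaker, 2 = two or more overlapping speakers), the binary matrix layout, and the name `concurrent_speaker_classes` are assumptions for illustration, not the paper's audio-visual model.

```python
from typing import List


def concurrent_speaker_classes(activity: List[List[int]]) -> List[int]:
    """Map per-speaker binary activity to per-frame concurrency classes.

    Hypothetical layout (an assumption): activity[s][t] is 1 if speaker s
    is active in frame t, else 0; all rows have the same length.
    Returns 0 for no speech, 1 for one speaker, 2 for overlap (2+ speakers).
    """
    if not activity:
        return []
    n_frames = len(activity[0])
    classes: List[int] = []
    for t in range(n_frames):
        n_active = sum(row[t] for row in activity)  # speakers active in frame t
        classes.append(min(n_active, 2))  # collapse 2+ speakers into "overlap"
    return classes


if __name__ == "__main__":
    # Two speakers over four frames: A alone, A+B overlap, B alone, silence.
    print(concurrent_speaker_classes([[1, 1, 0, 0], [0, 1, 1, 0]]))
```

Collapsing every count above one into a single overlap class matches the presence/overlap framing in the snippet; a system that needs the exact speaker count would keep `n_active` unclipped instead.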