Ensemble deep learning in speech signal tasks: a review
Abstract Machine learning methods are extensively used for processing and analysing
speech signals by virtue of their performance gains over multiple domains. Deep learning …
WavLM: Large-scale self-supervised pre-training for full stack speech processing
Self-supervised learning (SSL) achieves great success in speech recognition, while limited
exploration has been attempted for other speech processing tasks. As speech signal …
Ego4D: Around the world in 3,000 hours of egocentric video
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It
offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household …
A survey of sound source localization with deep learning methods
This article is a survey of deep learning methods for single and multiple sound source
localization, with a focus on sound source localization in indoor environments, where …
The SpeakIn system for VoxCeleb Speaker Recognition Challenge 2021
M Zhao, Y Ma, M Liu, M Xu - arXiv preprint arXiv:2109.01989, 2021 - arxiv.org
This report describes our submission to the track 1 and track 2 of the VoxCeleb Speaker
Recognition Challenge 2021 (VoxSRC 2021). Both track 1 and track 2 share the same …
ECAPA-TDNN embeddings for speaker diarization
Learning robust speaker embeddings is a crucial step in speaker diarization. Deep neural
networks can accurately capture speaker discriminative characteristics and popular deep …
M2MeT: The ICASSP 2022 multi-channel multi-party meeting transcription challenge
Recent development of speech signal processing, such as speech recognition, speaker
diarization, etc., has inspired numerous applications of speech technologies. The meeting …
An experimental review of speaker diarization methods with application to two-speaker conversational telephone speech recordings
We performed an experimental review of current diarization systems for the conversational
telephone speech (CTS) domain. In detail, we considered a total of eight different algorithms …
pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe
H Bredin - 24th INTERSPEECH Conference (INTERSPEECH …, 2023 - hal.science
pyannote.audio is an open-source toolkit written in Python for speaker diarization. Version
2.1 introduces a major overhaul of pyannote.audio default speaker diarization pipeline …
Turn-to-diarize: Online speaker diarization constrained by transformer transducer speaker turn detection
In this paper, we present a novel speaker diarization system for streaming on-device
applications. In this system, we use a transformer transducer to detect the speaker turns …