A review of deep learning techniques for speech processing
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …
learning. The use of multiple processing layers has enabled the creation of models capable …
Speaker recognition based on deep learning: An overview
Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is lack of …
learning has dramatically revolutionized speaker recognition. However, there is lack of …
A review of speaker diarization: Recent advances with deep learning
Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …
End-to-end speaker segmentation for overlap-aware resegmentation
Speaker segmentation consists in partitioning a conversation between one or more
speakers into speaker turns. Usually addressed as the late combination of three sub-tasks …
speakers into speaker turns. Usually addressed as the late combination of three sub-tasks …
Target-speaker voice activity detection: a novel approach for multi-speaker diarization in a dinner party scenario
I Medennikov, M Korenevsky, T Prisyach… - arXiv preprint arXiv …, 2020 - arxiv.org
Speaker diarization for real-life scenarios is an extremely challenging problem. Widely used
clustering-based diarization approaches perform rather poorly in such conditions, mainly …
clustering-based diarization approaches perform rather poorly in such conditions, mainly …
End-to-end speaker diarization for an unknown number of speakers with encoder-decoder based attractors
End-to-end speaker diarization for an unknown number of speakers is addressed in this
paper. Recently proposed end-to-end speaker diarization outperformed conventional …
paper. Recently proposed end-to-end speaker diarization outperformed conventional …
Far-field automatic speech recognition
The machine recognition of speech spoken at a distance from the microphones, known as
far-field automatic speech recognition (ASR), has received a significant increase in attention …
far-field automatic speech recognition (ASR), has received a significant increase in attention …
The chime-7 dasr challenge: Distant meeting transcription with multiple devices in diverse scenarios
The CHiME challenges have played a significant role in the development and evaluation of
robust automatic speech recognition (ASR) systems. We introduce the CHiME-7 distant ASR …
robust automatic speech recognition (ASR) systems. We introduce the CHiME-7 distant ASR …
M2MeT: The ICASSP 2022 multi-channel multi-party meeting transcription challenge
Recent development of speech signal processing, such as speech recognition, speaker
diarization, etc., has inspired numerous applications of speech technologies. The meeting …
diarization, etc., has inspired numerous applications of speech technologies. The meeting …
Powerset multi-class cross entropy loss for neural speaker diarization
Since its introduction in 2019, the whole end-to-end neural diarization (EEND) line of work
has been addressing speaker diarization as a frame-wise multi-label classification problem …
has been addressing speaker diarization as a frame-wise multi-label classification problem …