Progressive disentangled representation learning for fine-grained controllable talking head synthesis
We present a novel one-shot talking head synthesis method that achieves disentangled and
fine-grained control over lip motion, eye gaze&blink, head pose, and emotional expression …
Speech and speaker recognition using raw waveform modeling for adult and children's speech: A comprehensive review
Conventionally, the extraction of hand-crafted acoustic features has been separated from the
task of establishing robust machine-learning models in speech processing. The manual …
The singing voice conversion challenge 2023
We present the latest iteration of the voice conversion challenge (VCC) series, a bi-annual
scientific event aiming to compare and understand different voice conversion (VC) systems …
pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe
H Bredin - 24th INTERSPEECH Conference (INTERSPEECH …, 2023 - hal.science
pyannote.audio is an open-source toolkit written in Python for speaker diarization. Version
2.1 introduces a major overhaul of pyannote.audio default speaker diarization pipeline …
Self-supervised learning with cluster-aware-dino for high-performance robust speaker verification
The automatic speaker verification task has achieved great success using deep learning
approaches with a large-scale, manually annotated dataset. However, collecting a …
Pushing the limits of self-supervised speaker verification using regularized distillation framework
Training robust speaker verification systems without speaker labels has long been a
challenging task. Previous studies observed a large performance gap between self …
Improved deepfake detection using whisper features
With a recent influx of voice generation methods, the threat introduced by audio DeepFake
(DF) is ever-increasing. Several different detection methods have been presented as a …
Amphion: An open-source audio, music and speech generation toolkit
Amphion is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support
reproducible research and help junior researchers and engineers get started in the field of …
A comprehensive study on self-supervised distillation for speaker representation learning
In real application scenarios, it is often challenging to obtain a large amount of labeled data
for speaker representation learning due to speaker privacy concerns. Self-supervised …
Summary of the DISPLACE challenge 2023-DIarization of SPeaker and LAnguage in Conversational Environments
In multi-lingual societies, where multiple languages are spoken in a small geographic
vicinity, informal conversations often involve a mix of languages. Existing speech technologies …