Progressive disentangled representation learning for fine-grained controllable talking head synthesis

D Wang, Y Deng, Z Yin, HY Shum… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present a novel one-shot talking head synthesis method that achieves disentangled and
fine-grained control over lip motion, eye gaze & blink, head pose, and emotional expression …

Speech and speaker recognition using raw waveform modeling for adult and children's speech: A comprehensive review

K Radha, M Bansal, RB Pachori - Engineering Applications of Artificial …, 2024 - Elsevier
Conventionally, the extraction of hand-crafted acoustic features has been separated from the
task of establishing robust machine-learning models in speech processing. The manual …

The singing voice conversion challenge 2023

WC Huang, LP Violeta, S Liu, J Shi… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
We present the latest iteration of the voice conversion challenge (VCC) series, a bi-annual
scientific event aiming to compare and understand different voice conversion (VC) systems …

pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe

H Bredin - 24th INTERSPEECH Conference (INTERSPEECH …, 2023 - hal.science
pyannote.audio is an open-source toolkit written in Python for speaker diarization. Version
2.1 introduces a major overhaul of pyannote.audio's default speaker diarization pipeline …

Self-supervised learning with cluster-aware-dino for high-performance robust speaker verification

B Han, Z Chen, Y Qian - IEEE/ACM Transactions on Audio …, 2023 - ieeexplore.ieee.org
The automatic speaker verification task has achieved great success using deep learning
approaches with a large-scale, manually annotated dataset. However, collecting a …

Pushing the limits of self-supervised speaker verification using regularized distillation framework

Y Chen, S Zheng, H Wang, L Cheng… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Training robust speaker verification systems without speaker labels has long been a
challenging task. Previous studies observed a large performance gap between self …

Improved deepfake detection using whisper features

P Kawa, M Plata, M Czuba, P Szymański… - arXiv preprint arXiv …, 2023 - arxiv.org
With a recent influx of voice generation methods, the threat introduced by audio DeepFake
(DF) is ever-increasing. Several different detection methods have been presented as a …

Amphion: An open-source audio, music and speech generation toolkit

X Zhang, L Xue, Y Wang, Y Gu, X Chen, Z Fang… - arXiv preprint arXiv …, 2023 - arxiv.org
Amphion is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support
reproducible research and help junior researchers and engineers get started in the field of …

A comprehensive study on self-supervised distillation for speaker representation learning

Z Chen, Y Qian, B Han, Y Qian… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
In real application scenarios, it is often challenging to obtain a large amount of labeled data
for speaker representation learning due to speaker privacy concerns. Self-supervised …

Summary of the DISPLACE challenge 2023 - DIarization of SPeaker and LAnguage in Conversational Environments

S Baghel, S Ramoji, S Jain, PR Chowdhuri… - Speech …, 2024 - Elsevier
In multi-lingual societies, where multiple languages are spoken in a small geographic
vicinity, informal conversations often involve a mix of languages. Existing speech technologies …