Progressive disentangled representation learning for fine-grained controllable talking head synthesis

D Wang, Y Deng, Z Yin, HY Shum… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present a novel one-shot talking head synthesis method that achieves disentangled and
fine-grained control over lip motion, eye gaze & blink, head pose, and emotional expression …

Speech and speaker recognition using raw waveform modeling for adult and children's speech: A comprehensive review

K Radha, M Bansal, RB Pachori - Engineering Applications of Artificial …, 2024 - Elsevier
Conventionally, the extraction of hand-crafted acoustic features has been separated from the
task of establishing robust machine-learning models in speech processing. The manual …

The singing voice conversion challenge 2023

WC Huang, LP Violeta, S Liu, J Shi… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
We present the latest iteration of the voice conversion challenge (VCC) series, a bi-annual
scientific event aiming to compare and understand different voice conversion (VC) systems …

pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe

H Bredin - 24th INTERSPEECH Conference (INTERSPEECH …, 2023 - hal.science
pyannote.audio is an open-source toolkit written in Python for speaker diarization. Version
2.1 introduces a major overhaul of pyannote.audio's default speaker diarization pipeline …

Self-supervised learning with cluster-aware-dino for high-performance robust speaker verification

B Han, Z Chen, Y Qian - IEEE/ACM Transactions on Audio …, 2023 - ieeexplore.ieee.org
The automatic speaker verification task has achieved great success using deep learning
approaches with a large-scale, manually annotated dataset. However, collecting a …

Pushing the limits of self-supervised speaker verification using regularized distillation framework

Y Chen, S Zheng, H Wang, L Cheng… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Training robust speaker verification systems without speaker labels has long been a
challenging task. Previous studies observed a large performance gap between self …

Improved deepfake detection using whisper features

P Kawa, M Plata, M Czuba, P Szymański… - arXiv preprint arXiv …, 2023 - arxiv.org
With a recent influx of voice generation methods, the threat introduced by audio DeepFake
(DF) is ever-increasing. Several different detection methods have been presented as a …

Amphion: An open-source audio, music and speech generation toolkit

X Zhang, L Xue, Y Wang, Y Gu, X Chen, Z Fang… - arXiv preprint arXiv …, 2023 - arxiv.org
Amphion is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support
reproducible research and help junior researchers and engineers get started in the field of …

A comprehensive study on self-supervised distillation for speaker representation learning

Z Chen, Y Qian, B Han, Y Qian… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
In real application scenarios, it is often challenging to obtain a large amount of labeled data
for speaker representation learning due to speaker privacy concerns. Self-supervised …

Summary of the DISPLACE challenge 2023 - DIarization of SPeaker and LAnguage in Conversational Environments

S Baghel, S Ramoji, S Jain, PR Chowdhuri… - Speech …, 2024 - Elsevier
In multi-lingual societies, where multiple languages are spoken in a small geographic
vicinity, informal conversations often involve a mix of languages. Existing speech technologies …