MFA-Conformer: Multi-scale feature aggregation conformer for automatic speaker verification
In this paper, we present Multi-scale Feature Aggregation Conformer (MFA-Conformer), an
easy-to-implement, simple but effective backbone for automatic speaker verification based …
Overview of speaker modeling and its applications: From the lens of deep speaker representation learning
Speaker individuality information is among the most critical elements within speech signals.
By thoroughly and accurately modeling this information, it can be utilized in various …
The Vicomtech audio deepfake detection system based on wav2vec2 for the 2022 ADD challenge
JM Martín-Doñas, A Álvarez - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
This paper describes our submitted systems to the 2022 ADD challenge within the tracks 1
and 2. Our approach is based on the combination of a pre-trained wav2vec2 feature …
FreeVC: Towards high-quality text-free one-shot voice conversion
J Li, W Tu, L Xiao - ICASSP 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org
Voice conversion (VC) can be achieved by first extracting source content information and
target speaker information, and then reconstructing waveform with these information …
Computational language modeling and the promise of in silico experimentation
Language neuroscience currently relies on two major experimental paradigms:
controlled experiments using carefully hand-designed stimuli, and natural stimulus …
ZMM-TTS: Zero-shot multilingual and multispeaker speech synthesis conditioned on self-supervised discrete speech representations
Neural text-to-speech (TTS) has achieved human-like synthetic speech for single-speaker,
single-language synthesis. Multilingual TTS systems are limited to resource-rich languages …
F5-TTS: A fairytaler that fakes fluent and faithful speech with flow matching
This paper introduces F5-TTS, a fully non-autoregressive text-to-speech system based on
flow matching with Diffusion Transformer (DiT). Without requiring complex designs such as …
Self-supervised learning with Cluster-Aware-DINO for high-performance robust speaker verification
The automatic speaker verification task has achieved great success using deep learning
approaches with a large-scale, manually annotated dataset. However, collecting a …
Why does self-supervised learning for speech recognition benefit speaker recognition?
Recently, self-supervised learning (SSL) has demonstrated strong performance in speaker
recognition, even if the pre-training objective is designed for speech recognition. In this …
Leveraging ASR pretrained conformers for speaker verification through transfer learning and knowledge distillation
This paper focuses on the application of Conformers in speaker verification. Conformers,
initially designed for Automatic Speech Recognition (ASR), excel at modeling both local and …