Restructuring speech representations using a pitch-adaptive time–frequency smoothing and...

Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward

M Masood, M Nawaz, KM Malik, A Javed, A Irtaza… - Applied …, 2023 - Springer

Easy access to audio-visual content on social media, combined with the availability of
modern tools such as Tensorflow or Keras, and open-source trained models, along with …

Cited by 328 · Related articles · All 11 versions

An overview of voice conversion and its challenges: From statistical modeling to deep learning

B Sisman, J Yamagishi, S King… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org

Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …

Cited by 375 · Related articles · All 8 versions

A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org

Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

Cited by 409 · Related articles · All 2 versions

An overview of deep-learning-based audio-visual speech enhancement and separation

D Michelsanti, ZH Tan, SX Zhang, Y Xu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org

Speech enhancement and speech separation are two related tasks, whose purpose is to
extract either one or more target speech signals, respectively, from a mixture of sounds …

Cited by 265 · Related articles · All 6 versions

ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech

X Wang, J Yamagishi, M Todisco, H Delgado… - Computer Speech & …, 2020 - Elsevier

Automatic speaker verification (ASV) is one of the most natural and convenient means of
biometric person recognition. Unfortunately, just like all other biometric systems, ASV is …

Cited by 373 · Related articles · All 15 versions

WaveNet: A generative model for raw audio

A Van Den Oord, S Dieleman, H Zen… - arXiv preprint arXiv …, 2016 - academia.edu

This paper introduces WaveNet, a deep neural network for generating raw audio waveforms.
The model is fully probabilistic and autoregressive, with the predictive distribution for each …

Cited by 5641 · Related articles · All 10 versions

A survey on audio diffusion models: Text to speech synthesis and enhancement in generative ai

C Zhang, C Zhang, S Zheng, M Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org

Generative AI has demonstrated impressive performance in various fields, among which
speech synthesis is an interesting direction. With the diffusion model as the most popular …

Cited by 63 · Related articles · All 4 versions

The voice conversion challenge 2018: Promoting development of parallel and nonparallel methods

J Lorenzo-Trueba, J Yamagishi, T Toda, D Saito… - arXiv preprint arXiv …, 2018 - arxiv.org

We present the Voice Conversion Challenge 2018, designed as a follow up to the 2016
edition with the aim of providing a common framework for evaluating and comparing …

Cited by 375 · Related articles · All 13 versions

Voice conversion from unaligned corpora using variational autoencoding wasserstein generative adversarial networks

CC Hsu, HT Hwang, YC Wu, Y Tsao… - arXiv preprint arXiv …, 2017 - arxiv.org

Building a voice conversion (VC) system from non-parallel speech corpora is challenging
but highly valuable in real application scenarios. In most situations, the source and the target …

Cited by 457 · Related articles · All 11 versions

ASVspoof: the automatic speaker verification spoofing and countermeasures challenge

Z Wu, J Yamagishi, T Kinnunen… - IEEE Journal of …, 2017 - ieeexplore.ieee.org

Concerns regarding the vulnerability of automatic speaker verification (ASV) technology
against spoofing can undermine confidence in its reliability and form a barrier to exploitation …

Cited by 795 · Related articles · All 30 versions