An overview of voice conversion and its challenges: From statistical modeling to deep learning

B Sisman, J Yamagishi, S King… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org

Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …

Cited by 371 Related articles All 8 versions

Deepfakes as a threat to a speaker and facial recognition: An overview of tools and attack vectors

A Firc, K Malinka, P Hanáček - Heliyon, 2023 - cell.com

Deepfakes present an emerging threat in cyberspace. Recent developments in machine
learning make deepfakes highly believable, and very difficult to differentiate between what is …

Cited by 20 Related articles All 7 versions

Towards end-to-end synthetic speech detection

G Hua, ABJ Teoh, H Zhang - IEEE Signal Processing Letters, 2021 - ieeexplore.ieee.org

The constant Q transform (CQT) has been shown to be one of the most effective speech
signal pre-transforms to facilitate synthetic speech detection, followed by either hand-crafted …

Cited by 134 Related articles All 4 versions

Transfer learning from speech synthesis to voice conversion with non-parallel training data

M Zhang, Y Zhou, L Zhao, H Li - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org

We present a novel voice conversion (VC) framework by learning from a text-to-speech
(TTS) synthesis system, that is called TTS-VC transfer learning or TTL-VC for short. We first …

Cited by 59 Related articles All 5 versions

Significance of subband features for synthetic speech detection

J Yang, RK Das, H Li - IEEE Transactions on Information …, 2019 - ieeexplore.ieee.org

In text-to-speech or voice conversion based synthetic speech detection, it is a common
practice that spectral information over the entire frequency band is used for feature …

Cited by 83 Related articles

Cross-lingual voice conversion with bilingual phonetic posteriorgram and average modeling

Y Zhou, X Tian, H Xu, RK Das… - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org

This paper presents a cross-lingual voice conversion approach using bilingual Phonetic
PosteriorGram (PPG) and average modeling. The proposed approach makes use of …

Cited by 91 Related articles All 5 versions

Modified magnitude-phase spectrum information for spoofing detection

J Yang, H Wang, RK Das, Y Qian - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org

Most of the existing feature representations for spoofing countermeasures consider
information either from the magnitude or phase spectrum. We hypothesize that both …

Cited by 40 Related articles All 2 versions

ASSD: Synthetic Speech Detection in the AAC Compressed Domain

AKS Yadav, Z Xiang, ER Bartusiak… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Synthetic human speech signals have become very easy to generate given modern text-to-
speech methods. When these signals are shared on social media they are often …

Cited by 11 Related articles All 2 versions

Unsupervised representation disentanglement using cross domain features and adversarial learning in variational autoencoder based voice conversion

WC Huang, H Luo, HT Hwang, CC Lo… - … on Emerging Topics …, 2020 - ieeexplore.ieee.org

An effective approach for voice conversion (VC) is to disentangle linguistic content from
other components in the speech signal. The effectiveness of variational autoencoder (VAE) …

Cited by 50 Related articles All 7 versions

Extraction of octave spectra information for spoofing attack detection

J Yang, RK Das, N Zhou - IEEE/ACM Transactions on Audio …, 2019 - ieeexplore.ieee.org

This article focuses on extracting information from the octave power spectra of long-term
constant-Q transform (CQT) for spoofing attack detection. A novel framework based on multi …

Cited by 43 Related articles All 2 versions