StyleMelGAN: An efficient high-fidelity adversarial vocoder with temporal adaptive normalization

A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Cited by 208 · Related articles · All 6 versions

[PDF] arxiv.org

BigVGAN: A universal neural vocoder with large-scale training

S Lee, W Ping, B Ginsburg, B Catanzaro… - arXiv preprint arXiv …, 2022 - arxiv.org

Despite recent progress in generative adversarial network (GAN)-based vocoders, where
the model generates raw waveform conditioned on acoustic features, it is challenging to …

Cited by 212 · Related articles · All 5 versions

[PDF] arxiv.org

UnivNet: A neural vocoder with multi-resolution spectrogram discriminators for high-fidelity waveform generation

W Jang, D Lim, J Yoon, B Kim, J Kim - arXiv preprint arXiv:2106.07889, 2021 - arxiv.org

Most neural vocoders employ band-limited mel-spectrograms to generate waveforms. If
full-band spectral features are used as the input, the vocoder can be provided with as much …

Cited by 135 · Related articles · All 7 versions

[HTML] mdpi.com

[HTML] Video and audio deepfake datasets and open issues in deepfake technology: being ahead of the curve

Z Akhtar, TL Pendyala, VS Athmakuri - Forensic Sciences, 2024 - mdpi.com

The revolutionary breakthroughs in Machine Learning (ML) and Artificial Intelligence (AI) are
extensively being harnessed across a diverse range of domains, e.g., forensic science …

Cited by 5 · Related articles

[PDF] ieee.org

iSTFTNet: Fast and lightweight mel-spectrogram vocoder incorporating inverse short-time Fourier transform

T Kaneko, K Tanaka, H Kameoka… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

In recent text-to-speech synthesis and voice conversion systems, a mel-spectrogram is
commonly applied as an intermediate representation, and the necessity for a mel …

Cited by 76 · Related articles · All 5 versions

[PDF] arxiv.org

CFAD: A Chinese dataset for fake audio detection

H Ma, J Yi, C Wang, X Yan, J Tao, T Wang… - Speech …, 2024 - Elsevier

Fake audio detection is a growing concern and some relevant datasets have been designed
for research. However, there is no standard public Chinese dataset under complex …

Cited by 36 · Related articles · All 3 versions

[PDF] arxiv.org

ESPnet2-TTS: Extending the edge of TTS research

T Hayashi, R Yamamoto, T Yoshimura, P Wu… - arXiv preprint arXiv …, 2021 - arxiv.org

This paper describes ESPnet2-TTS, an end-to-end text-to-speech (E2E-TTS) toolkit.
ESPnet2-TTS extends our earlier version, ESPnet-TTS, by adding many new features …

Cited by 65 · Related articles · All 2 versions

[PDF] aaai.org

Avocodo: Generative adversarial network for artifact-free vocoder

T Bak, J Lee, H Bae, J Yang, JS Bae… - Proceedings of the AAAI …, 2023 - ojs.aaai.org

Neural vocoders based on the generative adversarial neural network (GAN) have been
widely used due to their fast inference speed and lightweight networks while generating …

Cited by 35 · Related articles · All 4 versions

[PDF] arxiv.org

SafeEar: Content privacy-preserving audio deepfake detection

X Li, K Li, Y Zheng, C Yan, X Ji, W Xu - Proceedings of the 2024 on ACM …, 2024 - dl.acm.org

Text-to-Speech (TTS) and Voice Conversion (VC) models have exhibited remarkable
performance in generating realistic and natural audio. However, their dark side, audio …

Cited by 4 · Related articles · All 5 versions

[PDF] uni-augsburg.de

[PDF] SVTS: scalable video-to-speech synthesis

R Mira, A Haliassos, S Petridis… - arXiv preprint …, 2022 - opus.bibliothek.uni-augsburg.de

Video-to-speech synthesis (also known as lip-to-speech) refers to the translation of silent lip
movements into the corresponding audio. This task has received an increasing amount of …

Cited by 31 · Related articles · All 9 versions