Instantaneous pitch estimation based on RAPT framework

M Vashkevich, Y Rushkevich - Biomedical Signal Processing and Control, 2021 - Elsevier

Amyotrophic lateral sclerosis (ALS) is an incurable neurological disorder with a rapidly
progressive course. Common early symptoms of ALS are difficulty in swallowing and …

Cited by 51 · Related articles · All 5 versions

MSMC-TTS: Multi-stage multi-codebook VQ-VAE based neural TTS

H Guo, F Xie, X Wu, FK Soong… - IEEE/ACM Transactions …, 2023 - ieeexplore.ieee.org

This article aims to improve neural TTS with vector-quantized, compact speech
representations. We propose a Vector-Quantized Variational AutoEncoder (VQ-VAE) based …

Cited by 14 · Related articles · All 2 versions

Fusion of spectral and prosody modelling for multilingual speech emotion conversion

S Vekkot, D Gupta - Knowledge-Based Systems, 2022 - Elsevier

The paper proposes an integrated speech emotion conversion framework developed using
speaker-independent mixed-lingual training. The key contribution of the work is non-parallel …

Cited by 24 · Related articles · All 2 versions

Bulbar ALS detection based on analysis of voice perturbation and vibrato

M Vashkevich, A Petrovsky… - 2019 Signal Processing …, 2019 - ieeexplore.ieee.org

On average, the lack of biological markers causes a one-year diagnostic delay in detecting
amyotrophic lateral sclerosis (ALS). To improve the diagnostic process, an automatic voice …

Cited by 49 · Related articles · All 10 versions

Contributions of Jitter and Shimmer in the Voice for Fake Audio Detection

K Li, X Lu, M Akagi, M Unoki - IEEE Access, 2023 - ieeexplore.ieee.org

Fake audio detection (FAD) aims to identify fraudulent speech generated through advanced
speech-synthesis techniques. Most current FAD methods rely solely on a deep neural …

Cited by 5 · Related articles · All 3 versions

Emotional voice conversion using a hybrid framework with speaker-adaptive DNN and particle-swarm-optimized neural network

S Vekkot, D Gupta, M Zakariah, YA Alotaibi - IEEE Access, 2020 - ieeexplore.ieee.org

We propose a hybrid network-based learning framework for speaker-adaptive vocal emotion
conversion, tested on three different datasets (languages), namely, EmoDB (German) …

Cited by 20 · Related articles · All 5 versions

CAMNet: A controllable acoustic model for efficient, expressive, high-quality text-to-speech

JM Alvarez, H Francois, H Sung, S Choi, J Jeong… - Applied Acoustics, 2022 - Elsevier

Spoken language is becoming one of the key components of human-machine interaction,
both to send information to the machine (e.g., voice control) and to receive from it (e.g., virtual …

Cited by 6 · Related articles

HAEPF: hybrid approach for estimating pitch frequency in the presence of reverberation

ES Hassan, B Neyazi, HS Seddeq… - Multimedia Tools and …, 2024 - Springer

In the realm of speaker identification, pitch frequency serves as a fundamental feature.
However, this feature can be compromised when a speaker records his speech in a closed …

A novel pitch detection algorithm based on instantaneous frequency for clean and noisy speech

Z Mnasri, S Rovetta, F Masulli - Circuits, Systems, and Signal Processing, 2022 - Springer

In this paper, a novel pitch detection algorithm (PDA) is proposed. Actually, pitch detection is
a classical problem that has been investigated since the very beginning of speech …

Cited by 3 · Related articles · All 7 versions

[PDF] Real-time voice conversion using artificial neural networks with rectified linear units

E Azarov, M Vashkevich, D Likhachov… - …, 2013 - isca-archive.org

This paper presents an approach to parametric voice conversion that can be used in
real-time entertainment applications. The approach is based on spectral mapping using an …

Cited by 20 · Related articles · All 6 versions