Classification of ALS patients based on acoustic analysis of sustained vowel phonations

M Vashkevich, Y Rushkevich - Biomedical Signal Processing and Control, 2021 - Elsevier
Amyotrophic lateral sclerosis (ALS) is an incurable neurological disorder with a rapidly
progressive course. Common early symptoms of ALS are difficulty in swallowing and …

MSMC-TTS: Multi-stage multi-codebook VQ-VAE based neural TTS

H Guo, F Xie, X Wu, FK Soong… - IEEE/ACM Transactions …, 2023 - ieeexplore.ieee.org
This article aims to improve neural TTS with vector-quantized, compact speech
representations. We propose a Vector-Quantized Variational AutoEncoder (VQ-VAE) based …

Fusion of spectral and prosody modelling for multilingual speech emotion conversion

S Vekkot, D Gupta - Knowledge-Based Systems, 2022 - Elsevier
The paper proposes an integrated speech emotion conversion framework developed using
speaker-independent mixed-lingual training. The key contribution of the work is non-parallel …

Bulbar ALS detection based on analysis of voice perturbation and vibrato

M Vashkevich, A Petrovsky… - 2019 Signal Processing …, 2019 - ieeexplore.ieee.org
On average the lack of biological markers causes a one year diagnostic delay to detect
amyotrophic lateral sclerosis (ALS). To improve the diagnostic process an automatic voice …

Contributions of Jitter and Shimmer in the Voice for Fake Audio Detection

K Li, X Lu, M Akagi, M Unoki - IEEE Access, 2023 - ieeexplore.ieee.org
Fake audio detection (FAD) aims to identify fraudulent speech generated through advanced
speech-synthesis techniques. Most current FAD methods rely solely on a deep neural …

Emotional voice conversion using a hybrid framework with speaker-adaptive DNN and particle-swarm-optimized neural network

S Vekkot, D Gupta, M Zakariah, YA Alotaibi - IEEE Access, 2020 - ieeexplore.ieee.org
We propose a hybrid network-based learning framework for speaker-adaptive vocal emotion
conversion, tested on three different datasets (languages), namely, EmoDB (German) …

CAMNet: A controllable acoustic model for efficient, expressive, high-quality text-to-speech

JM Alvarez, H Francois, H Sung, S Choi, J Jeong… - Applied Acoustics, 2022 - Elsevier
Spoken language is becoming one of the key components of human–machine interaction,
both to send information to the machine, e.g. voice control, and to receive from it, e.g. virtual …

HAEPF: hybrid approach for estimating pitch frequency in the presence of reverberation

ES Hassan, B Neyazi, HS Seddeq… - Multimedia Tools and …, 2024 - Springer
In the realm of speaker identification, pitch frequency serves as a fundamental feature.
However, this feature can be compromised when a speaker records his speech in a closed …

A novel pitch detection algorithm based on instantaneous frequency for clean and noisy speech

Z Mnasri, S Rovetta, F Masulli - Circuits, Systems, and Signal Processing, 2022 - Springer
In this paper, a novel pitch detection algorithm (PDA) is proposed. Actually, pitch detection is
a classical problem that has been investigated since the very beginning of speech …

Real-time voice conversion using artificial neural networks with rectified linear units.

E Azarov, M Vashkevich, D Likhachov… - …, 2013 - isca-archive.org
This paper presents an approach to parametric voice conversion that can be used in real-
time entertainment applications. The approach is based on spectral mapping using an …