Fast and reliable f0 estimation method based on the period extraction of vocal fold vibration...

M Morise, F Yokomori, K Ozawa - IEICE TRANSACTIONS on …, 2016 - search.ieice.org

A vocoder-based speech synthesis system, named WORLD, was developed in an effort to
improve the sound quality of real-time applications using speech. Speech analysis …

Cited by 1447 · Related articles · All 11 versions

DiffSVC: A diffusion probabilistic model for singing voice conversion

S Liu, Y Cao, D Su, H Meng - 2021 IEEE Automatic Speech …, 2021 - ieeexplore.ieee.org

Singing voice conversion (SVC) is one promising technique that can enrich the way of
human-computer interaction by endowing a computer the ability to produce high-fidelity and …

Cited by 46 · Related articles · All 4 versions

[PDF] Harvest: A High-Performance Fundamental Frequency Estimator from Speech Signals.

M Morise - INTERSPEECH, 2017 - isca-archive.org

A fundamental frequency (F0) estimator named Harvest is described. The unique points of
Harvest are that it can obtain a reliable F0 contour and reduce the error that the voiced …

Cited by 109 · Related articles · All 4 versions

Emotionless: Privacy-preserving speech analysis for voice assistants

R Aloufi, H Haddadi, D Boyle - arXiv preprint arXiv:1908.03632, 2019 - arxiv.org

Voice-enabled interactions provide more human-like experiences in many popular IoT
systems. Cloud-based speech analysis services extract useful information from voice input …

Cited by 51 · Related articles · All 2 versions

Hierarchical prosody modeling for non-autoregressive speech synthesis

CM Chien, H Lee - 2021 IEEE Spoken Language Technology …, 2021 - ieeexplore.ieee.org

Prosody modeling is an essential component in modern text-to-speech (TTS) frameworks.
By explicitly providing prosody features to the TTS model, the style of synthesized utterances …

Cited by 35 · Related articles · All 4 versions

VoiceBlock: Privacy through real-time adversarial attacks with audio-to-audio models

P O'Reilly, A Bugler, K Bhandari… - Advances in Neural …, 2022 - proceedings.neurips.cc

As governments and corporations adopt deep learning systems to collect and analyze user-
generated audio data, concerns about security and privacy naturally emerge in areas such …

Cited by 9 · Related articles · All 7 versions

Cross-domain neural pitch and periodicity estimation

M Morrison, C Hsieh, N Pruyne, B Pardo - arXiv preprint arXiv:2301.12258, 2023 - arxiv.org

Pitch is a foundational aspect of our perception of audio signals. Pitch contours are
commonly used to analyze speech and music signals and as input features for many audio …

Cited by 13 · Related articles · All 2 versions

F0 Estimation and Voicing Detection With Cascade Architecture in Noisy Speech

Y Zhang, H Wang, DL Wang - IEEE/ACM Transactions on …, 2023 - ieeexplore.ieee.org

As a fundamental problem in speech processing, pitch tracking has been studied for
decades. While strong performance has been achieved on clean speech, pitch tracking in …

Cited by 2 · Related articles · All 2 versions

Acoustic tracking of pitch, modal, and subharmonic vibrations of vocal folds in Parkinson's disease and parkinsonism

J Hlavnička, R Čmejla, J Klempíř, E Růžička… - IEEE Access, 2019 - ieeexplore.ieee.org

The prominent and early presence of dysphonia is considered a valuable marker for
differentiation of idiopathic Parkinson's disease and parkinsonian syndromes. Objective …

Cited by 36 · Related articles · All 3 versions

Deep learning based source separation applied to choir ensembles

D Petermann, P Chandna, H Cuesta, J Bonada… - arXiv preprint arXiv …, 2020 - arxiv.org

Choral singing is a widely practiced form of ensemble singing wherein a group of people
sing simultaneously in polyphonic harmony. The most commonly practiced setting for choir …

Cited by 29 · Related articles · All 5 versions