An overview of voice conversion systems
SH Mohammadi, A Kain - Speech Communication, 2017 - Elsevier
Voice transformation (VT) aims to change one or more aspects of a speech signal while
preserving linguistic information. A subset of VT, Voice conversion (VC) specifically aims to …
A task-optimized neural network replicates human auditory behavior, predicts brain responses, and reveals a cortical processing hierarchy
A core goal of auditory neuroscience is to build quantitative models that predict cortical
responses to natural sounds. Reasoning that a complete model of auditory cortex must solve …
Unsupervised speech decomposition via triple information bottleneck
Speech information can be roughly decomposed into four components: language content,
timbre, pitch, and rhythm. Obtaining disentangled representations of these components is …
Speaker perception
SR Schweinberger, H Kawahara… - Wiley …, 2014 - Wiley Online Library
While humans use their voice mainly for communicating information about the world,
paralinguistic cues in the voice signal convey rich dynamic information about a speaker's …
WORLD: a vocoder-based high-quality speech synthesis system for real-time applications
A vocoder-based speech synthesis system, named WORLD, was developed in an effort to
improve the sound quality of real-time applications using speech. Speech analysis …
Speaker-dependent WaveNet vocoder
In this study, we propose a speaker-dependent WaveNet vocoder, a method of synthesizing
speech waveforms with WaveNet, by utilizing acoustic features from existing vocoder as …
Indifference to dissonance in native Amazonians reveals cultural variation in music perception
JH McDermott, AF Schultz, EA Undurraga, RA Godoy - Nature, 2016 - nature.com
Music is present in every culture, but the degree to which it is shaped by biology remains
debated. One widely discussed phenomenon is that some combinations of notes are …
D4C, a band-aperiodicity estimator for high-quality speech synthesis
M Morise - Speech Communication, 2016 - Elsevier
An algorithm is proposed for estimating the band aperiodicity of speech signals, where
“aperiodicity” is defined as the power ratio between the speech signal and the aperiodic …
Perceptual fusion of musical notes by native Amazonians suggests universal representations of musical intervals
MJ McPherson, SE Dolan, A Durango… - Nature …, 2020 - nature.com
Music perception is plausibly constrained by universal perceptual mechanisms adapted to
natural sounds. Such constraints could arise from our dependence on harmonic frequency …
Pre-Trained Text Embeddings for Enhanced Text-to-Speech Synthesis
We propose an end-to-end text-to-speech (TTS) synthesis model that explicitly uses
information from pre-trained embeddings of the text. Recent work in natural language …