Tandem-STRAIGHT: A temporally stable power spectral representation for periodic signals and...

SH Mohammadi, A Kain - Speech Communication, 2017 - Elsevier

Voice transformation (VT) aims to change one or more aspects of a speech signal while
preserving linguistic information. A subset of VT, Voice conversion (VC) specifically aims to …

Cited by 314 - Related articles - All 6 versions

[PDF] A task-optimized neural network replicates human auditory behavior, predicts brain responses, and reveals a cortical processing hierarchy

AJE Kell, DLK Yamins, EN Shook… - Neuron, 2018 - cell.com

A core goal of auditory neuroscience is to build quantitative models that predict cortical
responses to natural sounds. Reasoning that a complete model of auditory cortex must solve …

Cited by 540 - Related articles - All 11 versions

Unsupervised speech decomposition via triple information bottleneck

K Qian, Y Zhang, S Chang… - International …, 2020 - proceedings.mlr.press

Speech information can be roughly decomposed into four components: language content,
timbre, pitch, and rhythm. Obtaining disentangled representations of these components is …

Cited by 177 - Related articles - All 10 versions

Speaker perception

SR Schweinberger, H Kawahara… - Wiley …, 2014 - Wiley Online Library

While humans use their voice mainly for communicating information about the world,
paralinguistic cues in the voice signal convey rich dynamic information about a speaker's …

Cited by 115 - Related articles - All 7 versions

WORLD: a vocoder-based high-quality speech synthesis system for real-time applications

M Morise, F Yokomori, K Ozawa - IEICE TRANSACTIONS on …, 2016 - search.ieice.org

A vocoder-based speech synthesis system, named WORLD, was developed in an effort to
improve the sound quality of real-time applications using speech. Speech analysis …

Cited by 1417 - Related articles - All 13 versions

[PDF] Speaker-dependent WaveNet vocoder

A Tamamori, T Hayashi, K Kobayashi, K Takeda… - Interspeech, 2017 - isca-archive.org

In this study, we propose a speaker-dependent WaveNet vocoder, a method of synthesizing
speech waveforms with WaveNet, by utilizing acoustic features from existing vocoder as …

Cited by 334 - Related articles - All 6 versions

Indifference to dissonance in native Amazonians reveals cultural variation in music perception

JH McDermott, AF Schultz, EA Undurraga, RA Godoy - Nature, 2016 - nature.com

Music is present in every culture, but the degree to which it is shaped by biology remains
debated. One widely discussed phenomenon is that some combinations of notes are …

Cited by 373 - Related articles - All 15 versions

[HTML] D4C, a band-aperiodicity estimator for high-quality speech synthesis

M Morise - Speech Communication, 2016 - Elsevier

An algorithm is proposed for estimating the band aperiodicity of speech signals, where
“aperiodicity” is defined as the power ratio between the speech signal and the aperiodic …

Cited by 220 - Related articles - All 8 versions

[HTML] Perceptual fusion of musical notes by native Amazonians suggests universal representations of musical intervals

MJ McPherson, SE Dolan, A Durango… - Nature …, 2020 - nature.com

Music perception is plausibly constrained by universal perceptual mechanisms adapted to
natural sounds. Such constraints could arise from our dependence on harmonic frequency …

Cited by 75 - Related articles - All 23 versions

[PDF] Pre-Trained Text Embeddings for Enhanced Text-to-Speech Synthesis

T Hayashi, S Watanabe, T Toda, K Takeda… - …, 2019 - isca-archive.org

We propose an end-to-end text-to-speech (TTS) synthesis model that explicitly uses
information from pre-trained embeddings of the text. Recent work in natural language …

Cited by 88 - Related articles - All 7 versions