CREPE: A convolutional representation for pitch estimation
The task of estimating the fundamental frequency of a monophonic sound recording, also
known as pitch tracking, is fundamental to audio processing with multiple applications in …
WORLD: a vocoder-based high-quality speech synthesis system for real-time applications
A vocoder-based speech synthesis system, named WORLD, was developed in an effort to
improve the sound quality of real-time applications using speech. Speech analysis …
PriorGrad: Improving conditional denoising diffusion models with data-dependent adaptive prior
Denoising diffusion probabilistic models have been recently proposed to generate high-
quality samples by estimating the gradient of the data density. The framework defines the …
The timbre toolbox: Extracting audio descriptors from musical signals
The analysis of musical signals to extract audio descriptors that can potentially characterize
their timbre has been disparate and often too focused on a particular small set of sounds …
A pitch extraction algorithm tuned for automatic speech recognition
P Ghahremani, B BabaAli, D Povey… - … on acoustics, speech …, 2014 - ieeexplore.ieee.org
In this paper we present an algorithm that produces pitch and probability-of-voicing
estimates for use as features in automatic speech recognition systems. These features give …
Signal processing for music analysis
Music signal processing may appear to be the junior relation of the large and mature field of
speech signal processing, not least because many techniques and representations …
Spectrotemporal modulation provides a unifying framework for auditory cortical asymmetries
The principles underlying functional asymmetries in cortex remain debated. For example, it
is accepted that speech is processed bilaterally in auditory cortex, but a left hemisphere …
SPICE: Self-supervised pitch estimation
We propose a model to estimate the fundamental frequency in monophonic audio, often
referred to as pitch estimation. We acknowledge the fact that obtaining ground truth …
Towards the objective speech assessment of smoking status based on voice features: a review of the literature
Background and Objective: In smoking cessation clinical research and practice,
objective validation of self-reported smoking status is crucial for ensuring the reliability of the …
Extraction and utilization of excitation information of speech: A review
Speech production can be regarded as a process where a time-varying vocal tract system
(filter) is excited by a time-varying excitation. In addition to its linguistic message, the speech …