CREPE: A convolutional representation for pitch estimation

JW Kim, J Salamon, P Li… - 2018 IEEE International …, 2018 - ieeexplore.ieee.org
The task of estimating the fundamental frequency of a monophonic sound recording, also
known as pitch tracking, is fundamental to audio processing with multiple applications in …

WORLD: a vocoder-based high-quality speech synthesis system for real-time applications

M Morise, F Yokomori, K Ozawa - IEICE TRANSACTIONS on …, 2016 - search.ieice.org
A vocoder-based speech synthesis system, named WORLD, was developed in an effort to
improve the sound quality of real-time applications using speech. Speech analysis …

PriorGrad: Improving conditional denoising diffusion models with data-dependent adaptive prior

S Lee, H Kim, C Shin, X Tan, C Liu, Q Meng… - arXiv preprint arXiv …, 2021 - arxiv.org
Denoising diffusion probabilistic models have been recently proposed to generate
high-quality samples by estimating the gradient of the data density. The framework defines the …

The timbre toolbox: Extracting audio descriptors from musical signals

G Peeters, BL Giordano, P Susini, N Misdariis… - The Journal of the …, 2011 - pubs.aip.org
The analysis of musical signals to extract audio descriptors that can potentially characterize
their timbre has been disparate and often too focused on a particular small set of sounds …

A pitch extraction algorithm tuned for automatic speech recognition

P Ghahremani, B BabaAli, D Povey… - … on acoustics, speech …, 2014 - ieeexplore.ieee.org
In this paper we present an algorithm that produces pitch and probability-of-voicing
estimates for use as features in automatic speech recognition systems. These features give …

Signal processing for music analysis

M Müller, DPW Ellis, A Klapuri… - IEEE Journal of selected …, 2011 - ieeexplore.ieee.org
Music signal processing may appear to be the junior relation of the large and mature field of
speech signal processing, not least because many techniques and representations …

Spectrotemporal modulation provides a unifying framework for auditory cortical asymmetries

A Flinker, WK Doyle, AD Mehta, O Devinsky… - Nature human …, 2019 - nature.com
The principles underlying functional asymmetries in cortex remain debated. For example, it
is accepted that speech is processed bilaterally in auditory cortex, but a left hemisphere …

SPICE: Self-supervised pitch estimation

B Gfeller, C Frank, D Roblek, M Sharifi… - … on Audio, Speech …, 2020 - ieeexplore.ieee.org
We propose a model to estimate the fundamental frequency in monophonic audio, often
referred to as pitch estimation. We acknowledge the fact that obtaining ground truth …

Towards the objective speech assessment of smoking status based on voice features: a review of the literature

Z Ma, C Bullen, JTW Chu, R Wang, Y Wang, S Singh - Journal of Voice, 2023 - Elsevier
Background and Objective: In smoking cessation clinical research and practice,
objective validation of self-reported smoking status is crucial for ensuring the reliability of the …

Extraction and utilization of excitation information of speech: A review

SR Kadiri, P Alku, B Yegnanarayana - Proceedings of the IEEE, 2021 - ieeexplore.ieee.org
Speech production can be regarded as a process where a time-varying vocal tract system
(filter) is excited by a time-varying excitation. In addition to its linguistic message, the speech …