A sawtooth waveform inspired pitch estimator for speech and music

CREPE: A convolutional representation for pitch estimation

JW Kim, J Salamon, P Li… - 2018 IEEE International …, 2018 - ieeexplore.ieee.org

The task of estimating the fundamental frequency of a monophonic sound recording, also
known as pitch tracking, is fundamental to audio processing with multiple applications in …

Cited by 433 Related articles All 13 versions

WORLD: a vocoder-based high-quality speech synthesis system for real-time applications

M Morise, F Yokomori, K Ozawa - IEICE TRANSACTIONS on …, 2016 - search.ieice.org

A vocoder-based speech synthesis system, named WORLD, was developed in an effort to
improve the sound quality of real-time applications using speech. Speech analysis …

Cited by 1475 Related articles All 11 versions

PriorGrad: Improving conditional denoising diffusion models with data-dependent adaptive prior

S Lee, H Kim, C Shin, X Tan, C Liu, Q Meng… - arXiv preprint arXiv …, 2021 - arxiv.org

Denoising diffusion probabilistic models have been recently proposed to generate high-
quality samples by estimating the gradient of the data density. The framework defines the …

Cited by 100 Related articles All 4 versions

The timbre toolbox: Extracting audio descriptors from musical signals

G Peeters, BL Giordano, P Susini, N Misdariis… - The Journal of the …, 2011 - pubs.aip.org

The analysis of musical signals to extract audio descriptors that can potentially characterize
their timbre has been disparate and often too focused on a particular small set of sounds …

Cited by 564 Related articles All 23 versions

A pitch extraction algorithm tuned for automatic speech recognition

P Ghahremani, B BabaAli, D Povey… - … on acoustics, speech …, 2014 - ieeexplore.ieee.org

In this paper we present an algorithm that produces pitch and probability-of-voicing
estimates for use as features in automatic speech recognition systems. These features give …

Cited by 393 Related articles All 12 versions

Signal processing for music analysis

M Müller, DPW Ellis, A Klapuri… - IEEE Journal of selected …, 2011 - ieeexplore.ieee.org

Music signal processing may appear to be the junior relation of the large and mature field of
speech signal processing, not least because many techniques and representations …

Cited by 369 Related articles All 18 versions

Spectrotemporal modulation provides a unifying framework for auditory cortical asymmetries

A Flinker, WK Doyle, AD Mehta, O Devinsky… - Nature human …, 2019 - nature.com

The principles underlying functional asymmetries in cortex remain debated. For example, it
is accepted that speech is processed bilaterally in auditory cortex, but a left hemisphere …

Cited by 127 Related articles All 13 versions

SPICE: Self-supervised pitch estimation

B Gfeller, C Frank, D Roblek, M Sharifi… - … on Audio, Speech …, 2020 - ieeexplore.ieee.org

We propose a model to estimate the fundamental frequency in monophonic audio, often
referred to as pitch estimation. We acknowledge the fact that obtaining ground truth …

Cited by 92 Related articles All 8 versions

Towards the objective speech assessment of smoking status based on voice features: a review of the literature

Z Ma, C Bullen, JTW Chu, R Wang, Y Wang, S Singh - Journal of Voice, 2023 - Elsevier

Background and Objective: In smoking cessation clinical research and practice,
objective validation of self-reported smoking status is crucial for ensuring the reliability of the …

Cited by 21 Related articles All 8 versions

Extraction and utilization of excitation information of speech: A review

SR Kadiri, P Alku, B Yegnanarayana - Proceedings of the IEEE, 2021 - ieeexplore.ieee.org

Speech production can be regarded as a process where a time-varying vocal tract system
(filter) is excited by a time-varying excitation. In addition to its linguistic message, the speech …

Cited by 13 Related articles All 7 versions