Interacting with computers by voice: automatic speech recognition and synthesis

D O'Shaughnessy - Proceedings of the IEEE, 2003 - ieeexplore.ieee.org
This paper examines how people communicate with computers using speech. Automatic
speech recognition (ASR) transforms speech into text, while automatic speech synthesis [or …

On short-time estimation of vocal tract length from formant frequencies

AC Lammert, SS Narayanan - PloS one, 2015 - journals.plos.org
Vocal tract length is highly variable across speakers and determines many aspects of the
acoustic speech signal, making it an essential parameter to consider for explaining …

Generative modeling of pseudo-whisper for robust whispered speech recognition

S Ghaffarzadegan, H Bořil… - IEEE/ACM Transactions …, 2016 - ieeexplore.ieee.org
Whisper is a common means of communication used to avoid disturbing individuals or to
exchange private information. As a vocal style, whisper would be an ideal candidate for …

Frequency warping for VTLN and speaker adaptation by linear transformation of standard MFCC

S Panchapagesan, A Alwan - Computer speech & language, 2009 - Elsevier
Vocal tract length normalization (VTLN) for standard filterbank-based Mel frequency cepstral
coefficient (MFCC) features is usually implemented by warping the center frequencies of the …

Adaptation of children's speech with limited data based on formant-like peak alignment

X Cui, A Alwan - Computer speech & language, 2006 - Elsevier
Automatic recognition of children's speech using acoustic models trained by adults results in
poor performance due to differences in speech acoustics. These acoustical differences are a …

Flux-closure-domain states and demagnetizing energy determination in sub-micron size magnetic dots

PO Jubert, JC Toussaint, O Fruchart, C Meyer… - Europhysics …, 2003 - iopscience.iop.org
We used single-crystalline Fe dots self-assembled under UHV as a model system to discuss
micromagnetic properties of sub-micron size magnetic dots and show what properties may …

A low-complexity parabolic lip contour model with speaker normalization for high-level feature extraction in noise-robust audiovisual speech recognition

BJ Borgstrom, A Alwan - … Systems, Man, and Cybernetics-Part A …, 2008 - ieeexplore.ieee.org
This paper proposes a novel low-complexity lip contour model for high-level optic feature
extraction in noise-robust audiovisual (AV) automatic speech recognition systems. The …

Normalization in the acoustic feature space for improved speech recognition

S Molau - 2003 - d-nb.info
In this work, normalization techniques in the acoustic feature space are studied which
improve the robustness of automatic speech recognition systems. It is shown that there is a …

Analyse et modèle génératif de l'expressivité: application à la parole et à l'interprétation musicale [Analysis and generative model of expressivity: application to speech and musical performance]

G Beller - 2009 - theses.hal.science
Contents: 1 Current context; 1.2 Stakes of the thesis; 1.3 Central proposition;
1.3.1 The actor's paradox; 1.3.2 …

Vocal Tract Length Normalization using a Gaussian mixture model framework for query-by-example spoken term detection

MC Madhavi, HA Patil - Computer Speech & Language, 2019 - Elsevier
A speech spectrum is known to be changed by the variations in the length of the vocal tract
of a speaker. This is because of the fact that speech formants are inversely related to the …