Schlieren imaging and video classification of alphabet pronunciations: exploiting phonetic flows for speech recognition and speech therapy
Speech is a highly coordinated process that requires precise control over vocal tract
morphology/motion to produce intelligible sounds while simultaneously generating unique …
A novel framework using 3D-CNN and BiLSTM model with dynamic learning rate scheduler for visual speech recognition
V Chandrabanshi, S Domnic - Signal, Image and Video Processing, 2024 - Springer
Abstract Visual Speech Recognition (VSR) is an appealing technology for predicting and
analyzing spoken language based on lip movements. Previous research in this area has …
Lipreading architecture based on multiple convolutional neural networks for sentence-level visual speech recognition
S Jeon, A Elsharkawy, MS Kim - Sensors, 2021 - mdpi.com
In visual speech recognition (VSR), speech is transcribed using only visual information to
interpret tongue and teeth movements. Recently, deep learning has shown outstanding …
Artificial vocal learning guided by phoneme recognition and visual information
PK Krug, P Birkholz, B Gerazov… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
This paper introduces a paradigm shift regarding vocal learning simulations, in which the
communicative function of speech acquisition determines the learning process and …
High-resolution, non-invasive imaging of upper vocal tract articulators compatible with human brain recordings
A complete neurobiological understanding of speech motor control requires determination of
the relationship between simultaneously recorded neural activity and the kinematics of the …
Deep learning architectures for estimating breathing signal and respiratory parameters from speech recordings
Respiration is an essential and primary mechanism for speech production. We first inhale
and then produce speech while exhaling. When we run out of breath, we stop speaking and …
Phoneme-to-audio alignment with recurrent neural networks for speaking and singing voice
Phoneme-to-audio alignment is the task of synchronizing voice recordings and their related
phonetic transcripts. In this work, we introduce a new system to forced phonetic alignment …
How to teach DNNs to pay attention to the visual modality in speech recognition
Audio-Visual Speech Recognition (AVSR) seeks to model, and thereby exploit, the dynamic
relationship between a human voice and the corresponding mouth movements. A recently …
A deep learning approach for quantifying vocal fold dynamics during connected speech using laryngeal high-speed videoendoscopy
AM Yousef, DD Deliyski, SRC Zacharias… - Journal of Speech …, 2022 - ASHA
Purpose: Voice disorders are best assessed by examining vocal fold dynamics in connected
speech. This can be achieved using flexible laryngeal high-speed videoendoscopy (HSV) …
Advances in vocal tract imaging and analysis
A long-standing challenge in speech research is obtaining accurate information about the
movement and shaping of the vocal tract. Dynamic vocal tract imaging data, recorded in real …