Schlieren imaging and video classification of alphabet pronunciations: exploiting phonetic flows for speech recognition and speech therapy

M Talaat, K Barari, XA Si, J Xi - Visual Computing for Industry, Biomedicine …, 2024 - Springer
Speech is a highly coordinated process that requires precise control over vocal tract
morphology/motion to produce intelligible sounds while simultaneously generating unique …

A novel framework using 3D-CNN and BiLSTM model with dynamic learning rate scheduler for visual speech recognition

V Chandrabanshi, S Domnic - Signal, Image and Video Processing, 2024 - Springer
Visual Speech Recognition (VSR) is an appealing technology for predicting and
analyzing spoken language based on lip movements. Previous research in this area has …

Lipreading architecture based on multiple convolutional neural networks for sentence-level visual speech recognition

S Jeon, A Elsharkawy, MS Kim - Sensors, 2021 - mdpi.com
In visual speech recognition (VSR), speech is transcribed using only visual information to
interpret tongue and teeth movements. Recently, deep learning has shown outstanding …

Artificial vocal learning guided by phoneme recognition and visual information

PK Krug, P Birkholz, B Gerazov… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
This paper introduces a paradigm shift regarding vocal learning simulations, in which the
communicative function of speech acquisition determines the learning process and …

High-resolution, non-invasive imaging of upper vocal tract articulators compatible with human brain recordings

KE Bouchard, DF Conant, GK Anumanchipalli… - PLoS …, 2016 - journals.plos.org
A complete neurobiological understanding of speech motor control requires determination of
the relationship between simultaneously recorded neural activity and the kinematics of the …

Deep learning architectures for estimating breathing signal and respiratory parameters from speech recordings

VS Nallanthighal, Z Mostaani, A Härmä, H Strik… - Neural Networks, 2021 - Elsevier
Respiration is an essential and primary mechanism for speech production. We first inhale
and then produce speech while exhaling. When we run out of breath, we stop speaking and …

Phoneme-to-audio alignment with recurrent neural networks for speaking and singing voice

Y Teytaut, A Roebel - Proceedings of Interspeech 2021, 2021 - hal.science
Phoneme-to-audio alignment is the task of synchronizing voice recordings and their related
phonetic transcripts. In this work, we introduce a new system to forced phonetic alignment …

How to teach DNNs to pay attention to the visual modality in speech recognition

G Sterpu, C Saam, N Harte - IEEE/ACM Transactions on Audio …, 2020 - ieeexplore.ieee.org
Audio-Visual Speech Recognition (AVSR) seeks to model, and thereby exploit, the dynamic
relationship between a human voice and the corresponding mouth movements. A recently …

A deep learning approach for quantifying vocal fold dynamics during connected speech using laryngeal high-speed videoendoscopy

AM Yousef, DD Deliyski, SRC Zacharias… - Journal of Speech …, 2022 - ASHA
Purpose: Voice disorders are best assessed by examining vocal fold dynamics in connected
speech. This can be achieved using flexible laryngeal high-speed videoendoscopy (HSV) …

Advances in vocal tract imaging and analysis

A Toutios, D Byrd, L Goldstein… - … Routledge handbook of …, 2019 - taylorfrancis.com
A long-standing challenge in speech research is obtaining accurate information about the
movement and shaping of the vocal tract. Dynamic vocal tract imaging data, recorded in real …