Related articles - Academic resource search

Schlieren imaging and video classification of alphabet pronunciations: exploiting phonetic flows for speech recognition and speech therapy

M Talaat, K Barari, XA Si, J Xi - Visual Computing for Industry, Biomedicine …, 2024 - Springer

Speech is a highly coordinated process that requires precise control over vocal tract
morphology/motion to produce intelligible sounds while simultaneously generating unique …

Cited by 1 · Related articles · All 7 versions

A novel framework using 3D-CNN and BiLSTM model with dynamic learning rate scheduler for visual speech recognition

V Chandrabanshi, S Domnic - Signal, Image and Video Processing, 2024 - Springer

Visual Speech Recognition (VSR) is an appealing technology for predicting and
analyzing spoken language based on lip movements. Previous research in this area has …

[PDF] mdpi.com

Lipreading architecture based on multiple convolutional neural networks for sentence-level visual speech recognition

S Jeon, A Elsharkawy, MS Kim - Sensors, 2021 - mdpi.com

In visual speech recognition (VSR), speech is transcribed using only visual information to
interpret tongue and teeth movements. Recently, deep learning has shown outstanding …

Cited by 30 · Related articles · All 7 versions

[PDF] ucl.ac.uk

Artificial vocal learning guided by phoneme recognition and visual information

PK Krug, P Birkholz, B Gerazov… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org

This paper introduces a paradigm shift regarding vocal learning simulations, in which the
communicative function of speech acquisition determines the learning process and …

Cited by 3 · Related articles · All 10 versions

[PDF] plos.org

High-resolution, non-invasive imaging of upper vocal tract articulators compatible with human brain recordings

KE Bouchard, DF Conant, GK Anumanchipalli… - PLoS …, 2016 - journals.plos.org

A complete neurobiological understanding of speech motor control requires determination of
the relationship between simultaneously recorded neural activity and the kinematics of the …

Cited by 33 · Related articles · All 10 versions

[HTML] sciencedirect.com

[HTML] Deep learning architectures for estimating breathing signal and respiratory parameters from speech recordings

VS Nallanthighal, Z Mostaani, A Härmä, H Strik… - Neural Networks, 2021 - Elsevier

Respiration is an essential and primary mechanism for speech production. We first inhale
and then produce speech while exhaling. When we run out of breath, we stop speaking and …

Cited by 33 · Related articles · All 8 versions

[PDF] hal.science

Phoneme-to-audio alignment with recurrent neural networks for speaking and singing voice

Y Teytaut, A Roebel - Proceedings of Interspeech 2021, 2021 - hal.science

Phoneme-to-audio alignment is the task of synchronizing voice recordings and their related
phonetic transcripts. In this work, we introduce a new system to forced phonetic alignment …

Cited by 23 · Related articles · All 9 versions

[PDF] arxiv.org

How to teach DNNs to pay attention to the visual modality in speech recognition

G Sterpu, C Saam, N Harte - IEEE/ACM Transactions on Audio …, 2020 - ieeexplore.ieee.org

Audio-Visual Speech Recognition (AVSR) seeks to model, and thereby exploit, the dynamic
relationship between a human voice and the corresponding mouth movements. A recently …

Cited by 32 · Related articles · All 5 versions

[HTML] nih.gov

A deep learning approach for quantifying vocal fold dynamics during connected speech using laryngeal high-speed videoendoscopy

AM Yousef, DD Deliyski, SRC Zacharias… - Journal of Speech …, 2022 - ASHA

Purpose: Voice disorders are best assessed by examining vocal fold dynamics in connected
speech. This can be achieved using flexible laryngeal high-speed videoendoscopy (HSV) …

Cited by 17 · Related articles · All 7 versions

[PDF] google.com

Advances in vocal tract imaging and analysis

A Toutios, D Byrd, L Goldstein… - … Routledge handbook of …, 2019 - taylorfrancis.com

A long-standing challenge in speech research is obtaining accurate information about the
movement and shaping of the vocal tract. Dynamic vocal tract imaging data, recorded in real …

Cited by 13 · Related articles · All 4 versions