Silent speech recognition as an alternative communication device for persons with laryngectomy

GS Meltzner, JT Heaton, Y Deng, G De Luca, SH Roy, JC Kline
IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2017 - ieeexplore.ieee.org
Each year thousands of individuals require surgical removal of the larynx (voice box) due to trauma or disease, and thereby require an alternative voice source or assistive device to communicate verbally. Although natural voice is lost after laryngectomy, most muscles controlling speech articulation remain intact. Surface electromyographic (sEMG) activity of the speech musculature can be recorded from the neck and face and used for automatic speech recognition to provide speech-to-text or synthesized speech as an alternative means of communication. This is true even when speech is mouthed or spoken in a silent (subvocal) manner, making it an appropriate communication platform after laryngectomy. In this study, eight individuals at least 6 months after total laryngectomy were recorded using eight sEMG sensors, four on the face and four on the neck, while reading phrases constructed from a 2500-word vocabulary. A unique set of phrases was used to train phoneme-based recognition models for each of the 39 commonly used phonemes in English, and the remaining phrases were used to test the models' word recognition, based on phoneme identification from running speech. Word error rates averaged 10.3% for the full eight-sensor set (9.5% for the top four participants) and 13.6% when the sensor set was reduced to four locations per individual (n = 7). This study provides a compelling proof-of-concept for sEMG-based alaryngeal speech recognition, with strong potential to further improve recognition performance.
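For readers unfamiliar with the metric, word error rate is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the recognized phrase into the reference phrase, divided by the number of words in the reference. The Python sketch below illustrates that standard definition only. The abstract does not describe the authors' scoring procedure, so the function name word_error_rate and the example phrases are hypothetical and are not taken from the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance divided by reference length.

    Illustrative only; not the scoring code used in the study.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical phrases: one of five reference words is missing.
print(word_error_rate("please call me back later", "please call back later"))  # 0.2
```

Under this definition, an average word error rate of 10.3% corresponds to roughly one word error for every ten reference words.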