Phone recognition on the TIMIT database

C Lopes, F Perdigao - Speech Technologies/Book, 2011 - books.google.com
In the information age, computer applications have become part of modern life, and this has in turn raised expectations of friendly interaction with them. Speech, as “the” communication mode, has seen the successful development of quite a number of applications using automatic speech recognition (ASR), including command and control, dictation, dialog systems for people with impairments, translation, etc. But the actual challenge goes beyond the use of speech in control applications or to access information. The goal is to use speech as an information source, competing, for example, with text online. Since the technology supporting computer applications is highly dependent on the performance of the ASR system, research into ASR is still an active topic, as is shown by the range of research directions suggested in (Baker et al., 2009a, 2009b). Automatic speech recognition, the recognition of the information embedded in a speech signal and its transcription in terms of a set of characters (Junqua & Haton, 1996), has been the object of intensive research for more than four decades, achieving notable results. Advances in speech recognition can be expected to make spoken language as convenient and accessible as online text once recognizers reach error rates near zero. But while digit recognition has already reached a recognition rate of 99.6% (Li, 2008), the same cannot be said of phone recognition, for which the best rates are still under 80% (Mohamed et al., 2011; Siniscalchi et al., 2007).
Speech recognition based on phones is very attractive since it is inherently free from vocabulary limitations. The performance of large vocabulary ASR (LVASR) systems depends on the quality of the phone recognizer, which is why research teams continue to develop phone recognizers and to enhance their performance as much as possible. Phone recognition is, in fact, a recurrent problem for the speech recognition community, and it can be found in a wide range of applications. In addition to typical LVASR systems like (Morris & Fosler-Lussier, 2008; Scanlon et al., 2007; Schwarz, 2008), it can be found in applications related to keyword detection (Schwarz, 2008), language recognition (Matejka, 2009; Schwarz, 2008), speaker identification (Furui, 2005), and music identification and translation (Fujihara & Goto, 2008; Gruhne et al., 2007). The challenge of building robust acoustic models involves applying good training algorithms to a suitable set of data. The database defines the units that can be trained and