The IBM Attila speech recognition toolkit
We describe the design of IBM's Attila speech recognition toolkit. We show how the
combination of a highly modular and efficient library of low-level C++ classes with simple …
Simultaneous translation of lectures and speeches
With increasing globalization, communication across language and cultural boundaries is
becoming an essential requirement of doing business, delivering education, and providing …
Identifying keyword occurrences in audio data
VN Gupta, G Boulianne - US Patent 8,423,363, 2013 - Google Patents
Occurrences of one or more keywords in audio data are identified using a speech
recognizer employing a language model to derive a transcript of the keywords. The …
Decoding-time prediction of non-verbalized tokens
J Fritsch, A Deoras, D Koll - US Patent 8,918,317, 2014 - Google Patents
Non-verbalized tokens, such as punctuation, are automatically predicted and inserted into a
transcription of speech in which the tokens were not explicitly verbalized. Token prediction …
Uncertainty decoding for noise robust speech recognition
It is well known that the performance of automatic speech recognition degrades in noisy
conditions. To address this, typically the noise is removed from the features or the models …
Bag-of-word normalized n-gram models.
A Sethy, B Ramabhadran - INTERSPEECH, 2008 - isca-archive.org
The Bag-Of-Word (BOW) model uses a fixed length vector of word counts to
represent text. Although the model disregards word sequence information, it has been …
An iterative relative entropy minimization-based data selection approach for n-gram model adaptation
Performance of statistical n-gram language models depends heavily on the amount of
training text material and the degree to which the training text matches the domain of …
End-to-end speech endpoint detection utilizing acoustic and language modeling knowledge for online low-latency speech recognition
Speech endpoint detection (EPD) benefits from the decoder state features (DSFs) of an online
automatic speech recognition (ASR) system. However, the DSFs are obtained via the ASR …
Transcription system using automatic speech recognition for the Japanese Parliament (Diet)
T Kawahara - Proceedings of the AAAI Conference on Artificial …, 2012 - ojs.aaai.org
This article describes a new automatic transcription system in the Japanese Parliament
which deploys our automatic speech recognition (ASR) technology. To achieve high …
Statistical transformation of language and pronunciation models for spontaneous speech recognition
Y Akita, T Kawahara - IEEE Transactions on Audio, Speech …, 2009 - ieeexplore.ieee.org
We propose a novel approach based on a statistical transformation framework for language
and pronunciation modeling of spontaneous speech. Since it is not practical to train a …