The IBM Attila speech recognition toolkit

H Soltau, G Saon, B Kingsbury - 2010 IEEE Spoken Language …, 2010 - ieeexplore.ieee.org
We describe the design of IBM's Attila speech recognition toolkit. We show how the
combination of a highly modular and efficient library of low-level C++ classes with simple …

Simultaneous translation of lectures and speeches

C Fügen, A Waibel, M Kolss - Machine Translation, 2007 - Springer
With increasing globalization, communication across language and cultural boundaries is
becoming an essential requirement of doing business, delivering education, and providing …

Identifying keyword occurrences in audio data

VN Gupta, G Boulianne - US Patent 8,423,363, 2013 - Google Patents
Occurrences of one or more keywords in audio data are identified using a speech
recognizer employing a language model to derive a transcript of the keywords. The …

Decoding-time prediction of non-verbalized tokens

J Fritsch, A Deoras, D Koll - US Patent 8,918,317, 2014 - Google Patents
Non-verbalized tokens, such as punctuation, are automatically predicted and inserted into a
transcription of speech in which the tokens were not explicitly verbalized. Token prediction …

Uncertainty decoding for noise robust speech recognition

H Liao, MJF Gales - 2009 - Citeseer
It is well known that the performance of automatic speech recognition degrades in noisy
conditions. To address this, typically the noise is removed from the features or the models …

Bag-of-word normalized n-gram models

A Sethy, B Ramabhadran - INTERSPEECH, 2008 - isca-archive.org
The Bag-Of-Word (BOW) model uses a fixed length vector of word counts to
represent text. Although the model disregards word sequence information, it has been …
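The bag-of-words representation described in this snippet can be illustrated with a minimal Python sketch. It is not the authors' normalized n-gram method: the vocabulary, the whitespace tokenization, and the function name `bow_vector` are hypothetical choices made only for illustration. It shows that two word sequences containing the same words with the same counts map to the same fixed-length vector, which is the sense in which the model disregards word order.

```python
from collections import Counter


def bow_vector(text, vocabulary):
    """Map text to a fixed-length vector of word counts over `vocabulary`.

    Assumption: simple lowercase whitespace tokenization. Words outside the
    vocabulary are ignored, and word order plays no role in the result.
    """
    counts = Counter(text.lower().split())
    # Counter returns 0 for missing keys, so absent words contribute 0.
    return [counts[word] for word in vocabulary]


# Hypothetical vocabulary chosen for illustration only.
VOCAB = ["speech", "recognition", "language", "model"]

print(bow_vector("speech model language model", VOCAB))  # [1, 0, 1, 2]
print(bow_vector("model language model speech", VOCAB))  # [1, 0, 1, 2]
```

Both calls yield the same vector even though the word sequences differ, so any information carried by word order is lost in this representation.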

An iterative relative entropy minimization-based data selection approach for n-gram model adaptation

A Sethy, PG Georgiou, B Ramabhadran… - IEEE Transactions on …, 2009 - ieeexplore.ieee.org
Performance of statistical n-gram language models depends heavily on the amount of
training text material and the degree to which the training text matches the domain of …

End-to-end speech endpoint detection utilizing acoustic and language modeling knowledge for online low-latency speech recognition

I Hwang, JH Chang - IEEE Access, 2020 - ieeexplore.ieee.org
Speech endpoint detection (EPD) benefits from the decoder state features (DSFs) of online
automatic speech recognition (ASR) system. However, the DSFs are obtained via the ASR …

Transcription system using automatic speech recognition for the Japanese Parliament (Diet)

T Kawahara - Proceedings of the AAAI Conference on Artificial …, 2012 - ojs.aaai.org
This article describes a new automatic transcription system in the Japanese Parliament
which deploys our automatic speech recognition (ASR) technology. To achieve high …

Statistical transformation of language and pronunciation models for spontaneous speech recognition

Y Akita, T Kawahara - IEEE Transactions on Audio, Speech …, 2009 - ieeexplore.ieee.org
We propose a novel approach based on a statistical transformation framework for language
and pronunciation modeling of spontaneous speech. Since it is not practical to train a …