The Microsoft 2017 conversational speech recognition system
We describe the latest version of Microsoft's conversational speech recognition system for
the Switchboard and CallHome domains. The system adds a CNN-BLSTM acoustic model to …
Achieving human parity in conversational speech recognition
Conversational speech recognition has served as a flagship speech recognition task since
the release of the Switchboard corpus in the 1990s. In this paper, we measure the human …
Toward human parity in conversational speech recognition
Conversational speech recognition has served as a flagship speech recognition task since
the release of the Switchboard corpus in the 1990s. In this paper, we measure a human …
SRILM - an extensible language modeling toolkit
A Stolcke - Interspeech, 2002 - huggingface.co
SRILM is a collection of C++ libraries, executable programs, and helper scripts
designed to allow both production of and experimentation with statistical language models …
Finding consensus in speech recognition: word error minimization and other applications of confusion networks
We describe a new framework for distilling information from word lattices to improve the
accuracy of the speech recognition output and obtain a more perspicuous representation of …
Prosody-based automatic detection of annoyance and frustration in human-computer dialog
We investigate the use of prosody for the detection of frustration and annoyance in natural
human-computer dialog. In addition to prosodic features, we examine the contribution of …
DOVER-Lap: A method for combining overlap-aware diarization outputs
Several advances have been made recently towards handling overlapping speech for
speaker diarization. Since speech and natural language tasks often benefit from ensemble …
Posterior probability decoding, confidence estimation and system combination
G Evermann, PC Woodland - Proc. Speech Transcription …, 2000 - mi.eng.cam.ac.uk
In this paper the estimation of word posterior probabilities is discussed and their application
in the CU-HTK system used in the March 2000 Hub5 Conversational Telephone Speech …
Investigations on speech recognition systems for low-resource dialectal Arabic–English code-switching speech
Code-switching (CS), defined as the mixing of languages in conversations, has
become a worldwide phenomenon. The prevalence of CS has been recently met with a …
Modeling prosodic feature sequences for speaker recognition
E Shriberg, L Ferrer, S Kajarekar, A Venkataraman… - Speech …, 2005 - Elsevier
We describe a novel approach to modeling idiosyncratic prosodic behavior for automatic
speaker recognition. The approach computes various duration, pitch, and energy features …