The Microsoft 2017 conversational speech recognition system

W Xiong, L Wu, F Alleva, J Droppo… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org
We describe the latest version of Microsoft's conversational speech recognition system for
the Switchboard and CallHome domains. The system adds a CNN-BLSTM acoustic model to …

Achieving human parity in conversational speech recognition

W Xiong, J Droppo, X Huang, F Seide, M Seltzer… - arXiv preprint arXiv …, 2016 - arxiv.org
Conversational speech recognition has served as a flagship speech recognition task since
the release of the Switchboard corpus in the 1990s. In this paper, we measure the human …

Toward human parity in conversational speech recognition

W Xiong, J Droppo, X Huang, F Seide… - … on Audio, Speech …, 2017 - ieeexplore.ieee.org
Conversational speech recognition has served as a flagship speech recognition task since
the release of the Switchboard corpus in the 1990s. In this paper, we measure a human …

SRILM-an extensible language modeling toolkit.

A Stolcke - Interspeech, 2002 - huggingface.co
SRILM is a collection of C++ libraries, executable programs, and helper scripts
designed to allow both production of and experimentation with statistical language models …

Finding consensus in speech recognition: word error minimization and other applications of confusion networks

L Mangu, E Brill, A Stolcke - Computer Speech & Language, 2000 - Elsevier
We describe a new framework for distilling information from word lattices to improve the
accuracy of the speech recognition output and obtain a more perspicuous representation of …

Prosody-based automatic detection of annoyance and frustration in human-computer dialog.

J Ang, R Dhillon, A Krupski, E Shriberg, A Stolcke - INTERSPEECH, 2002 - Citeseer
We investigate the use of prosody for the detection of frustration and annoyance in natural
human-computer dialog. In addition to prosodic features, we examine the contribution of …

DOVER-Lap: A method for combining overlap-aware diarization outputs

D Raj, LP Garcia-Perera, Z Huang… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org
Several advances have been made recently towards handling overlapping speech for
speaker diarization. Since speech and natural language tasks often benefit from ensemble …

Posterior probability decoding, confidence estimation and system combination

G Evermann, PC Woodland - Proc. Speech Transcription …, 2000 - mi.eng.cam.ac.uk
In this paper the estimation of word posterior probabilities is discussed and their application
in the CU-HTK system used in the March 2000 Hub5 Conversational Telephone Speech …

Investigations on speech recognition systems for low-resource dialectal Arabic–English code-switching speech

I Hamed, P Denisov, CY Li, M Elmahdy… - Computer Speech & …, 2022 - Elsevier
Code-switching (CS), defined as the mixing of languages in conversations, has
become a worldwide phenomenon. The prevalence of CS has been recently met with a …

Modeling prosodic feature sequences for speaker recognition

E Shriberg, L Ferrer, S Kajarekar, A Venkataraman… - Speech …, 2005 - Elsevier
We describe a novel approach to modeling idiosyncratic prosodic behavior for automatic
speaker recognition. The approach computes various duration, pitch, and energy features …