The SRI March 2000 Hub-5 conversational speech transcription system

W Xiong, L Wu, F Alleva, J Droppo… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org

We describe the latest version of Microsoft's conversational speech recognition system for
the Switchboard and CallHome domains. The system adds a CNN-BLSTM acoustic model to …

Cited by 951 · Related articles · All 20 versions

Achieving human parity in conversational speech recognition

W Xiong, J Droppo, X Huang, F Seide, M Seltzer… - arXiv preprint arXiv …, 2016 - arxiv.org

Conversational speech recognition has served as a flagship speech recognition task since
the release of the Switchboard corpus in the 1990s. In this paper, we measure the human …

Cited by 712 · Related articles · All 8 versions

Toward human parity in conversational speech recognition

W Xiong, J Droppo, X Huang, F Seide… - … on Audio, Speech …, 2017 - ieeexplore.ieee.org

Conversational speech recognition has served as a flagship speech recognition task since
the release of the Switchboard corpus in the 1990s. In this paper, we measure a human …

Cited by 252 · Related articles · All 3 versions

[PDF] SRILM-an extensible language modeling toolkit.

A Stolcke - Interspeech, 2002 - huggingface.co

SRILM is a collection of C++ libraries, executable programs, and helper scripts
designed to allow both production of and experimentation with statistical language models …

Cited by 5844 · Related articles · All 23 versions

Finding consensus in speech recognition: word error minimization and other applications of confusion networks

L Mangu, E Brill, A Stolcke - Computer Speech & Language, 2000 - Elsevier

We describe a new framework for distilling information from word lattices to improve the
accuracy of the speech recognition output and obtain a more perspicuous representation of …

Cited by 936 · Related articles · All 15 versions

[PDF] Prosody-based automatic detection of annoyance and frustration in human-computer dialog.

J Ang, R Dhillon, A Krupski, E Shriberg, A Stolcke - INTERSPEECH, 2002 - Citeseer

We investigate the use of prosody for the detection of frustration and annoyance in natural
human-computer dialog. In addition to prosodic features, we examine the contribution of …

Cited by 541 · Related articles · All 21 versions

Dover-lap: A method for combining overlap-aware diarization outputs

D Raj, LP Garcia-Perera, Z Huang… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org

Several advances have been made recently towards handling overlapping speech for
speaker diarization. Since speech and natural language tasks often benefit from ensemble …

Cited by 72 · Related articles · All 10 versions

[PDF] Posterior probability decoding, confidence estimation and system combination

G Evermann, PC Woodland - Proc. Speech Transcription …, 2000 - mi.eng.cam.ac.uk

In this paper the estimation of word posterior probabilities is discussed and their application
in the CU-HTK system used in the March 2000 Hub5 Conversational Telephone Speech …

Cited by 317 · Related articles · All 8 versions

Investigations on speech recognition systems for low-resource dialectal Arabic–English code-switching speech

I Hamed, P Denisov, CY Li, M Elmahdy… - Computer Speech & …, 2022 - Elsevier

Code-switching (CS), defined as the mixing of languages in conversations, has
become a worldwide phenomenon. The prevalence of CS has been recently met with a …

Cited by 43 · Related articles · All 4 versions

Modeling prosodic feature sequences for speaker recognition

E Shriberg, L Ferrer, S Kajarekar, A Venkataraman… - Speech …, 2005 - Elsevier

We describe a novel approach to modeling idiosyncratic prosodic behavior for automatic
speaker recognition. The approach computes various duration, pitch, and energy features …

Cited by 309 · Related articles · All 11 versions