Deep representation learning in speech processing: Challenges, recent advances, and future trends

S Latif, R Rana, S Khalifa, R Jurdak, J Qadir… - arXiv preprint arXiv …, 2020 - arxiv.org
Research on speech processing has traditionally considered the task of designing hand-
engineered acoustic features (feature engineering) as a separate distinct problem from the …

ESPnet: End-to-end speech processing toolkit

S Watanabe, T Hori, S Karita, T Hayashi… - arXiv preprint arXiv …, 2018 - arxiv.org
This paper introduces a new open source platform for end-to-end speech processing named
ESPnet. ESPnet mainly focuses on end-to-end automatic speech recognition (ASR), and …

THCHS-30: A free Chinese speech corpus

D Wang, X Zhang - arXiv preprint arXiv:1512.01882, 2015 - arxiv.org
Speech data is crucially important for speech recognition research. There are quite some
speech databases that can be purchased at prices that are reasonable for most research …

Data quality: The other face of big data

B Saha, D Srivastava - 2014 IEEE 30th international conference …, 2014 - ieeexplore.ieee.org
In our Big Data era, data is being generated, collected and analyzed at an unprecedented
scale, and data-driven decision making is sweeping through all aspects of society. Recent …

Building and evaluation of a real room impulse response dataset

I Szöke, M Skácel, L Mošner, J Paliesek… - IEEE Journal of …, 2019 - ieeexplore.ieee.org
This paper presents BUT ReverbDB - a dataset of real room impulse responses (RIR),
background noises, and retransmitted speech data. The retransmitted data include …

DOVER-Lap: A method for combining overlap-aware diarization outputs

D Raj, LP Garcia-Perera, Z Huang… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org
Several advances have been made recently towards handling overlapping speech for
speaker diarization. Since speech and natural language tasks often benefit from ensemble …

The CALO meeting assistant system

G Tur, A Stolcke, L Voss, S Peters… - … on Audio, Speech …, 2010 - ieeexplore.ieee.org
The CALO Meeting Assistant (MA) provides for distributed meeting capture, annotation,
automatic transcription and semantic analysis of multiparty meetings, and is part of the larger …

Unified architecture for multichannel end-to-end speech recognition with neural beamforming

T Ochiai, S Watanabe, T Hori… - IEEE Journal of …, 2017 - ieeexplore.ieee.org
This paper proposes a unified architecture for end-to-end automatic speech recognition
(ASR) to encompass microphone-array signal processing such as a state-of-the-art neural …

Recognition and understanding of meetings: the AMI and AMIDA projects

S Renals, T Hain, H Bourlard - 2007 IEEE Workshop on …, 2007 - ieeexplore.ieee.org
The AMI and AMIDA projects are concerned with the recognition and interpretation of
multiparty meetings. Within these projects we have: developed an infrastructure for …

Multichannel end-to-end speech recognition

T Ochiai, S Watanabe, T Hori… - … conference on machine …, 2017 - proceedings.mlr.press
The field of speech recognition is in the midst of a paradigm shift: end-to-end neural
networks are challenging the dominance of hidden Markov models as a core technology …