Deep representation learning in speech processing: Challenges, recent advances, and future trends
Research on speech processing has traditionally considered the task of designing hand-
engineered acoustic features (feature engineering) as a separate distinct problem from the …
ESPnet: End-to-end speech processing toolkit
This paper introduces a new open source platform for end-to-end speech processing named
ESPnet. ESPnet mainly focuses on end-to-end automatic speech recognition (ASR), and …
THCHS-30: A free Chinese speech corpus
D Wang, X Zhang - arXiv preprint arXiv:1512.01882, 2015 - arxiv.org
Speech data is crucially important for speech recognition research. There are quite some
speech databases that can be purchased at prices that are reasonable for most research …
Data quality: The other face of big data
B Saha, D Srivastava - 2014 IEEE 30th international conference …, 2014 - ieeexplore.ieee.org
In our Big Data era, data is being generated, collected and analyzed at an unprecedented
scale, and data-driven decision making is sweeping through all aspects of society. Recent …
Building and evaluation of a real room impulse response dataset
This paper presents BUT ReverbDB, a dataset of real room impulse responses (RIR),
background noises, and retransmitted speech data. The retransmitted data include …
DOVER-Lap: A method for combining overlap-aware diarization outputs
Several advances have been made recently towards handling overlapping speech for
speaker diarization. Since speech and natural language tasks often benefit from ensemble …
The CALO meeting assistant system
The CALO Meeting Assistant (MA) provides for distributed meeting capture, annotation,
automatic transcription and semantic analysis of multiparty meetings, and is part of the larger …
Unified architecture for multichannel end-to-end speech recognition with neural beamforming
T Ochiai, S Watanabe, T Hori… - IEEE Journal of …, 2017 - ieeexplore.ieee.org
This paper proposes a unified architecture for end-to-end automatic speech recognition
(ASR) to encompass microphone-array signal processing such as a state-of-the-art neural …
Recognition and understanding of meetings: the AMI and AMIDA projects
The AMI and AMIDA projects are concerned with the recognition and interpretation of
multiparty meetings. Within these projects we have: developed an infrastructure for …
Multichannel end-to-end speech recognition
T Ochiai, S Watanabe, T Hori… - … conference on machine …, 2017 - proceedings.mlr.press
The field of speech recognition is in the midst of a paradigm shift: end-to-end neural
networks are challenging the dominance of hidden Markov models as a core technology …