Light gated recurrent units for speech recognition

M Ravanelli, P Brakel, M Omologo… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org
A field that has directly benefited from the recent advances in deep learning is automatic
speech recognition (ASR). Despite the great achievements of the past decades, however, a …

Continuous speech separation: Dataset and analysis

Z Chen, T Yoshioka, L Lu, T Zhou… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
This paper describes a dataset and protocols for evaluating continuous speech separation
algorithms. Most prior speech separation studies use pre-segmented audio signals, which …

Far-field automatic speech recognition

R Haeb-Umbach, J Heymann, L Drude… - Proceedings of the …, 2020 - ieeexplore.ieee.org
The machine recognition of speech spoken at a distance from the microphones, known as
far-field automatic speech recognition (ASR), has received a significant increase in attention …

Generalization of multi-channel linear prediction methods for blind MIMO impulse response shortening

T Yoshioka, T Nakatani - IEEE Transactions on Audio, Speech …, 2012 - ieeexplore.ieee.org
The performance of many microphone array processing techniques deteriorates in the
presence of reverberation. To provide a widely applicable solution to this longstanding …

Making machines understand us in reverberant rooms: Robustness against reverberation for automatic speech recognition

T Yoshioka, A Sehr, M Delcroix… - IEEE Signal …, 2012 - ieeexplore.ieee.org
Speech recognition technology has left the research laboratory and is increasingly coming
into practical use, enabling a wide spectrum of innovative and exciting voice-driven …

Audio user interaction recognition and application interface

LH Kim, J Shin, E Visser - US Patent 9,746,916, 2017 - Google Patents
Disclosed is an application interface that takes into account the user's gaze direction relative
to who is speaking in an interactive multi-participant environment where audio-based …

Advances in online audio-visual meeting transcription

T Yoshioka, I Abramovski, C Aksoylar… - 2019 IEEE Automatic …, 2019 - ieeexplore.ieee.org
This paper describes a system that generates speaker-annotated transcripts of meetings by
using a microphone array and a 360-degree camera. The hallmark of the system is its ability …

Method and apparatus for detecting speech endpoint using weighted finite state transducer

H Chung, S Lee, YK Lee - US Patent 9,396,722, 2016 - Google Patents
Disclosed are an apparatus and a method for detecting a speech endpoint using a WFST.
The apparatus in accordance with an embodiment of the present invention includes: a …

Online MVDR beamformer based on complex Gaussian mixture model with spatial prior for noise robust ASR

T Higuchi, N Ito, S Araki, T Yoshioka… - … on Audio, Speech …, 2017 - ieeexplore.ieee.org
This paper considers acoustic beamforming for noise robust automatic speech recognition.
A beamformer attenuates background noise by enhancing sound components coming from …

Recognizing overlapped speech in meetings: A multichannel separation approach using neural networks

T Yoshioka, H Erdogan, Z Chen, X Xiao… - arXiv preprint arXiv …, 2018 - arxiv.org
The goal of this work is to develop a meeting transcription system that can recognize speech
even when utterances of different speakers are overlapped. While speech overlaps have …