Low-latency real-time meeting recognition and understanding using distant microphones and...

M Ravanelli, P Brakel, M Omologo… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org

A field that has directly benefited from the recent advances in deep learning is automatic
speech recognition (ASR). Despite the great achievements of the past decades, however, a …

Cited by 443 · Related articles · All 7 versions

Continuous speech separation: Dataset and analysis

Z Chen, T Yoshioka, L Lu, T Zhou… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org

This paper describes a dataset and protocols for evaluating continuous speech separation
algorithms. Most prior speech separation studies use pre-segmented audio signals, which …

Cited by 233 · Related articles · All 3 versions

Far-field automatic speech recognition

R Haeb-Umbach, J Heymann, L Drude… - Proceedings of the …, 2020 - ieeexplore.ieee.org

The machine recognition of speech spoken at a distance from the microphones, known as
far-field automatic speech recognition (ASR), has received a significant increase in attention …

Cited by 116 · Related articles · All 8 versions

Generalization of multi-channel linear prediction methods for blind MIMO impulse response shortening

T Yoshioka, T Nakatani - IEEE Transactions on Audio, Speech …, 2012 - ieeexplore.ieee.org

The performance of many microphone array processing techniques deteriorates in the
presence of reverberation. To provide a widely applicable solution to this longstanding …

Cited by 321 · Related articles · All 7 versions

Making machines understand us in reverberant rooms: Robustness against reverberation for automatic speech recognition

T Yoshioka, A Sehr, M Delcroix… - IEEE Signal …, 2012 - ieeexplore.ieee.org

Speech recognition technology has left the research laboratory and is increasingly coming
into practical use, enabling a wide spectrum of innovative and exciting voice-driven …

Cited by 340 · Related articles · All 5 versions

Audio user interaction recognition and application interface

LH Kim, J Shin, E Visser - US Patent 9,746,916, 2017 - Google Patents

Disclosed is an application interface that takes into account the user's gaze direction relative
to who is speaking in an interactive multi-participant environment where audio-based …

Cited by 250 · Related articles · All 4 versions

Advances in online audio-visual meeting transcription

T Yoshioka, I Abramovski, C Aksoylar… - 2019 IEEE Automatic …, 2019 - ieeexplore.ieee.org

This paper describes a system that generates speaker-annotated transcripts of meetings by
using a microphone array and a 360-degree camera. The hallmark of the system is its ability …

Cited by 90 · Related articles · All 6 versions

Method and apparatus for detecting speech endpoint using weighted finite state transducer

H Chung, S Lee, YK Lee - US Patent 9,396,722, 2016 - Google Patents

Disclosed are an apparatus and a method for detecting a speech endpoint using a WFST.
The apparatus in accordance with an embodiment of the present invention includes: a …

Cited by 164 · Related articles · All 4 versions

Online MVDR beamformer based on complex Gaussian mixture model with spatial prior for noise robust ASR

T Higuchi, N Ito, S Araki, T Yoshioka… - … on Audio, Speech …, 2017 - ieeexplore.ieee.org

This paper considers acoustic beamforming for noise robust automatic speech recognition.
A beamformer attenuates background noise by enhancing sound components coming from …

Cited by 129 · Related articles · All 3 versions

Recognizing overlapped speech in meetings: A multichannel separation approach using neural networks

T Yoshioka, H Erdogan, Z Chen, X Xiao… - arXiv preprint arXiv …, 2018 - arxiv.org

The goal of this work is to develop a meeting transcription system that can recognize speech
even when utterances of different speakers are overlapped. While speech overlaps have …

Cited by 101 · Related articles · All 6 versions