Streaming small-footprint keyword spotting using sequence-to-sequence models

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Cited by 100 · Related articles · All 6 versions

[PDF] nowpublishers.com

Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com

Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Cited by 326 · Related articles · All 7 versions

[PDF] arxiv.org

Transformer transducer: A streamable speech recognition model with transformer encoders and rnn-t loss

Q Zhang, H Lu, H Sak, A Tripathi… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org

In this paper we present an end-to-end speech recognition model with Transformer
encoders that can be used in a streaming speech recognition system. Transformer …

Cited by 490 · Related articles · All 6 versions

[PDF] arxiv.org

Streaming end-to-end speech recognition for mobile devices

Y He, TN Sainath, R Prabhavalkar… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org

End-to-end (E2E) models, which directly predict output character sequences given input
speech, are good candidates for on-device speech recognition. E2E models, however …

Cited by 709 · Related articles · All 9 versions

[PDF] ieee.org

Deep spoken keyword spotting: An overview

I López-Espejo, ZH Tan, JHL Hansen, J Jensen - IEEE Access, 2021 - ieeexplore.ieee.org

Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams
and has become a fast-growing technology thanks to the paradigm shift introduced by deep …

Cited by 104 · Related articles · All 7 versions

[PDF] arxiv.org

Deep context: end-to-end contextual speech recognition

G Pundak, TN Sainath, R Prabhavalkar… - 2018 IEEE spoken …, 2018 - ieeexplore.ieee.org

In automatic speech recognition (ASR) what a user says depends on the particular context
she is in. Typically, this context is represented as a set of word n-grams. In this work, we …

Cited by 186 · Related articles · All 6 versions

[PDF] arxiv.org

Contextual RNN-T for open domain ASR

M Jain, G Keren, J Mahadeokar, G Zweig… - arXiv preprint arXiv …, 2020 - arxiv.org

End-to-end (E2E) systems for automatic speech recognition (ASR), such as RNN
Transducer (RNN-T) and Listen-Attend-Spell (LAS) blend the individual components of a …

Cited by 93 · Related articles · All 7 versions

[PDF] arxiv.org

Attention-based end-to-end models for small-footprint keyword spotting

C Shan, J Zhang, Y Wang, L Xie - arXiv preprint arXiv:1803.10916, 2018 - arxiv.org

In this paper, we propose an attention-based end-to-end neural approach for small-footprint
keyword spotting (KWS), which aims to simplify the pipelines of building a production-quality …

Cited by 118 · Related articles · All 7 versions

[PDF] arxiv.org

Learning efficient representations for keyword spotting with triplet loss

R Vygon, N Mikhaylovskiy - … 2021, St. Petersburg, Russia, September 27 …, 2021 - Springer

In the past few years, triplet loss-based metric embeddings have become a de-facto
standard for several important computer vision problems, most notably, person …

Cited by 52 · Related articles · All 6 versions

[PDF] researchgate.net

A systematic review on sequence-to-sequence learning with neural network and its models

H Yousuf, M Lahzi, SA Salloum… - International Journal of …, 2021 - researchgate.net

We develop a precise writing survey on sequence-to-sequence learning with neural network
and its models. The primary aim of this report is to enhance the knowledge of the sequence …

Cited by 39 · Related articles · All 7 versions