Locally-connected and convolutional neural networks for small footprint speaker recognition

Pay less attention with lightweight and dynamic convolutions

F Wu, A Fan, A Baevski, YN Dauphin, M Auli - arXiv preprint arXiv …, 2019 - arxiv.org

Self-attention is a useful mechanism to build generative models for language and images. It
determines the importance of context elements by comparing each element to the current …

Cited by 655 · Related articles · All 7 versions

Generalized end-to-end loss for speaker verification

L Wan, Q Wang, A Papir… - 2018 IEEE International …, 2018 - ieeexplore.ieee.org

In this paper, we propose a new loss function called generalized end-to-end (GE2E) loss,
which makes the training of speaker verification models more efficient than our previous …

Cited by 1000 · Related articles · All 10 versions

Speaker diarization with LSTM

Q Wang, C Downey, L Wan… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org

For many years, i-vector based audio embedding techniques were the dominant approach
for speaker verification and speaker diarization applications. However, mirroring the rise of …

Cited by 399 · Related articles · All 11 versions

End-to-end text-dependent speaker verification

G Heigold, I Moreno, S Bengio… - 2016 IEEE International …, 2016 - ieeexplore.ieee.org

In this paper we present a data-driven, integrated approach to speaker verification, which
maps a test utterance and a few reference utterances directly to a single score for verification …

Cited by 773 · Related articles · All 14 versions

[HTML] A deep neural network model for speaker identification

F Ye, J Yang - Applied Sciences, 2021 - mdpi.com

Speaker identification is a classification task which aims to identify a subject from a given
time-series sequential data. Since the speech signal is a continuous one-dimensional time …

Cited by 95 · Related articles · All 5 versions

Personalized speech recognition on mobile devices

I McGraw, R Prabhavalkar, R Alvarez… - … , Speech and Signal …, 2016 - ieeexplore.ieee.org

We describe a large vocabulary speech recognition system that is accurate, has low latency,
and yet has a small enough memory and computational footprint to run faster than real-time …

Cited by 221 · Related articles · All 8 versions

Trainable frontend for robust and far-field keyword spotting

Y Wang, P Getreuer, T Hughes, RF Lyon… - … , Speech and Signal …, 2017 - ieeexplore.ieee.org

Robust and far-field speech recognition is critical to enable true hands-free communication.
In far-field conditions, signals are attenuated due to distance. To improve robustness to …

Cited by 164 · Related articles · All 9 versions

Convolutional CRFs for semantic segmentation

MTT Teichmann, R Cipolla - arXiv preprint arXiv:1805.04777, 2018 - arxiv.org

For the challenging semantic image segmentation task the most efficient models have
traditionally combined the structured modelling capabilities of Conditional Random Fields …

Cited by 135 · Related articles · All 8 versions

Robust detection of machine-induced audio attacks in intelligent audio systems with microphone array

Z Li, C Shi, T Zhang, Y Xie, J Liu, B Yuan… - Proceedings of the 2021 …, 2021 - dl.acm.org

With the popularity of intelligent audio systems in recent years, their vulnerabilities have
become an increasing public concern. Existing studies have designed a set of machine …

Cited by 30 · Related articles · All 12 versions

Deeplss: Breaking parameter degeneracies in large-scale structure with deep-learning analysis of combined probes

T Kacprzak, J Fluri - Physical Review X, 2022 - APS

In classical cosmological analysis of large-scale structure surveys with two-point functions,
the parameter measurement precision is limited by several key degeneracies within the …

Cited by 20 · Related articles · All 7 versions