Pay less attention with lightweight and dynamic convolutions

F Wu, A Fan, A Baevski, YN Dauphin, M Auli - arXiv preprint arXiv …, 2019 - arxiv.org
Self-attention is a useful mechanism to build generative models for language and images. It
determines the importance of context elements by comparing each element to the current …

Generalized end-to-end loss for speaker verification

L Wan, Q Wang, A Papir… - 2018 IEEE International …, 2018 - ieeexplore.ieee.org
In this paper, we propose a new loss function called generalized end-to-end (GE2E) loss,
which makes the training of speaker verification models more efficient than our previous …

Speaker diarization with LSTM

Q Wang, C Downey, L Wan… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org
For many years, i-vector based audio embedding techniques were the dominant approach
for speaker verification and speaker diarization applications. However, mirroring the rise of …

End-to-end text-dependent speaker verification

G Heigold, I Moreno, S Bengio… - 2016 IEEE International …, 2016 - ieeexplore.ieee.org
In this paper we present a data-driven, integrated approach to speaker verification, which
maps a test utterance and a few reference utterances directly to a single score for verification …

A deep neural network model for speaker identification

F Ye, J Yang - Applied Sciences, 2021 - mdpi.com
Speaker identification is a classification task which aims to identify a subject from given
time-series sequential data. Since the speech signal is a continuous one-dimensional time …

Personalized speech recognition on mobile devices

I McGraw, R Prabhavalkar, R Alvarez… - … , Speech and Signal …, 2016 - ieeexplore.ieee.org
We describe a large vocabulary speech recognition system that is accurate, has low latency,
and yet has a small enough memory and computational footprint to run faster than real-time …

Trainable frontend for robust and far-field keyword spotting

Y Wang, P Getreuer, T Hughes, RF Lyon… - … , Speech and Signal …, 2017 - ieeexplore.ieee.org
Robust and far-field speech recognition is critical to enable true hands-free communication.
In far-field conditions, signals are attenuated due to distance. To improve robustness to …

Convolutional CRFs for semantic segmentation

MTT Teichmann, R Cipolla - arXiv preprint arXiv:1805.04777, 2018 - arxiv.org
For the challenging semantic image segmentation task the most efficient models have
traditionally combined the structured modelling capabilities of Conditional Random Fields …

Robust detection of machine-induced audio attacks in intelligent audio systems with microphone array

Z Li, C Shi, T Zhang, Y Xie, J Liu, B Yuan… - Proceedings of the 2021 …, 2021 - dl.acm.org
With the popularity of intelligent audio systems in recent years, their vulnerabilities have
become an increasing public concern. Existing studies have designed a set of machine …

DeepLSS: Breaking parameter degeneracies in large-scale structure with deep-learning analysis of combined probes

T Kacprzak, J Fluri - Physical Review X, 2022 - APS
In classical cosmological analysis of large-scale structure surveys with two-point functions,
the parameter measurement precision is limited by several key degeneracies within the …