Recent developments in spoken term detection: a survey

A Mandal, KR Prasanna Kumar, P Mitra - International Journal of Speech …, 2014 - Springer
Spoken term detection (STD) provides an efficient means for content based indexing of
speech. However, achieving high detection performance, faster speed, detecting out-of …

Long short-term memory recurrent neural network based segment features for music genre classification

J Dai, S Liang, W Xue, C Ni… - 2016 10th International …, 2016 - ieeexplore.ieee.org
In the conventional frame feature based music genre classification methods, the audio data
is represented by independent frames and the sequential nature of audio is totally ignored. If …

Fast query-by-example speech search using attention-based deep binary embeddings

Y Yuan, L Xie, CC Leung, H Chen… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org
State-of-the-art query-by-example (QbE) speech search approaches usually use recurrent
neural network (RNN) based acoustic word embeddings (AWEs) to represent variable …

Query-by-example speech search using recurrent neural acoustic word embeddings with temporal context

Y Yuan, CC Leung, L Xie, H Chen, B Ma - IEEE Access, 2019 - ieeexplore.ieee.org
Acoustic word embeddings (AWEs) have been popular in low-resource query-by-example
speech search. They are using vector distances to find the spoken query in search content …

Artificial neural network for folk music style classification

Q Ning, J Shi - Mobile Information Systems, 2022 - Wiley Online Library
Folk music style classification is of great significance. Traditional folk music style
classification has difficulties in feature selection, and the existing folk music style methods …

Discriminative Keyword Spotting for limited-data applications

H Benisty, I Katz, K Crammer, D Malah - Speech Communication, 2018 - Elsevier
Mobile devices are widely used around the world, frequently by people speaking local
languages or dialects that are not well documented. For these languages, it might not be …

Unsupervised iterative Deep Learning of speech features and acoustic tokens with applications to spoken term detection

CT Chung, CY Tsai, CH Liu… - IEEE/ACM Transactions …, 2017 - ieeexplore.ieee.org
In this paper, we aim to automatically discover high-quality frame-level speech features and
acoustic tokens directly from unlabeled speech data. A multigranular acoustic tokenizer …

Zero resource speech synthesis using transcripts derived from perceptual acoustic units

HA Murthy - arXiv preprint arXiv:2006.04372, 2020 - arxiv.org
Zerospeech synthesis is the task of building vocabulary independent speech synthesis
systems, where transcriptions are not available for training data. It is, therefore, necessary to …

Acoustic unit discovery using transient and steady-state regions in speech and its applications

K Pandia, HA Murthy - Journal of Phonetics, 2021 - Elsevier
Acoustic modelling in the absence of labelled audio is difficult in speech processing,
especially in under-resourced languages. Ideas from theories of speech production and …

Unsupervised speech unit discovery using k-means and neural networks

C Manenti, T Pellegrini, J Pinquier - … SLSP 2017, Le Mans, France, October …, 2017 - Springer
Unsupervised discovery of sub-lexical units in speech is a problem that currently interests
speech researchers. In this paper, we report experiments in which we use phone …