Data-selective transfer learning for multi-domain speech recognition

M Doulaty, O Saz, T Hain - arXiv preprint arXiv:1509.02409, 2015 - arxiv.org
Negative transfer in training of acoustic models for automatic speech recognition has been
reported in several contexts such as domain change or speaker characteristics. This paper …

Automatic genre and show identification of broadcast media

M Doulaty, O Saz, RWM Ng, T Hain - arXiv preprint arXiv:1606.03333, 2016 - arxiv.org
Huge amounts of digital videos are being produced and broadcast every day, leading to
giant media archives. Effective techniques are needed to make such data accessible further …

Latent Dirichlet allocation based organisation of broadcast media archives for deep neural network adaptation

M Doulaty, O Saz, RWM Ng… - 2015 IEEE Workshop on …, 2015 - ieeexplore.ieee.org
This paper presents a new method for the discovery of latent domains in diverse speech
data, for the use of adaptation of Deep Neural Networks (DNNs) for Automatic Speech …

Unsupervised domain discovery using latent Dirichlet allocation for acoustic modelling in speech recognition

M Doulaty, O Saz, T Hain - arXiv preprint arXiv:1509.02412, 2015 - arxiv.org
Speech recognition systems are often highly domain dependent, a fact widely reported in
the literature. However the concept of domain is complex and not bound to clear criteria …

Group feature selection for audio-based video genre classification

G Sageder, M Zaharieva, C Breiteneder - … 2016, Miami, FL, USA, January 4 …, 2016 - Springer
The performance of video genre classification approaches strongly depends on the selected
feature set. Feature selection requires expert knowledge and is commonly driven by the …

GenSpecVidOnt: a reference ontology for knowledge based video analytics with multimodal genre detection

MU Sreeja, BC Kovoor - Multimedia Tools and Applications, 2023 - Springer
Video analytics refers to the process of automatically analysing a video for spatial and
temporal events. Effective video analytics require exploitation of genre-related information …

webASR 2-Improved Cloud Based Speech Technology.

T Hain, J Christian, O Saz, S Deena, M Hasan… - …, 2016 - academia.edu
This paper presents the most recent developments of the webASR service (www.webasr.org),
the world's first web-based fully functioning automatic speech recognition platform for …

Color-independent classification of animation video

R Zumer, S Ratté - International Journal of Multimedia Information …, 2018 - Springer
This paper presents a method for the classification of animated video that does not rely on
hue or saturation information, and aims to achieve a high level of performance in the context …

Acoustic adaptation to dynamic background conditions with asynchronous transformations

O Saz, T Hain - Computer Speech & Language, 2017 - Elsevier
This paper proposes a framework for performing adaptation to complex and non-stationary
background conditions in Automatic Speech Recognition (ASR) by means of asynchronous …

Emotion recognition from the speech signal by effective combination of generative and discriminative models

E Loweimi, M Doulaty, J Barker, T Hain - 2015 - eprints.whiterose.ac.uk
In this paper, we propose an effective way for combining the discriminative and generative
models for emotion recognition from speech signal. Finding an efficient feature extraction …