Impact of word embedding models on text analytics in deep learning environment: a review
The selection of word embedding and deep learning models for better outcomes is vital.
Word embeddings are an n-dimensional distributed representation of a text that attempts to …
Audio self-supervised learning: A survey
Similar to humans' cognitive ability to generalize knowledge and skills, self-supervised
learning (SSL) targets discovering general representations from large-scale data. This …
WavLM: Large-scale self-supervised pre-training for full stack speech processing
Self-supervised learning (SSL) achieves great success in speech recognition, while limited
exploration has been attempted for other speech processing tasks. As speech signal …
End-to-end speech recognition: A survey
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …
UniSpeech: Unified speech representation learning with labeled and unlabeled data
In this paper, we propose a unified pre-training approach called UniSpeech to learn speech
representations with both labeled and unlabeled data, in which supervised phonetic CTC …
Towards end-to-end unsupervised speech recognition
Unsupervised speech recognition has shown great potential to make Automatic Speech
Recognition (ASR) systems accessible to every language. However, existing methods still …
CDPAM: Contrastive learning for perceptual audio similarity
Many speech processing methods based on deep learning require an automatic and
differentiable audio metric for the loss function. The DPAM approach of Manocha et al.[1] …
A comprehensive survey on multi-modal conversational emotion recognition with deep learning
Multi-modal conversation emotion recognition (MCER) aims to recognize and track the
speaker's emotional state using text, speech, and visual information in the conversation …
A deep-learning based citation count prediction model with paper metadata semantic features
A Ma, Y Liu, X Xu, T Dong - Scientometrics, 2021 - Springer
Predicting the impact of academic papers can help scholars quickly identify the high-quality
papers in the field. How to develop an efficient predictive model for evaluating potential papers …
MT4SSL: Boosting self-supervised speech representation learning by integrating multiple targets
In this paper, we provide a new perspective on self-supervised speech models from how the
self-training targets are obtained. We generalize the targets extractor into Offline Targets …