Impact of word embedding models on text analytics in deep learning environment: a review

DS Asudani, NK Nagwani, P Singh - Artificial Intelligence Review, 2023 - Springer
The selection of word embedding and deep learning models for better outcomes is vital.
Word embeddings are an n-dimensional distributed representation of a text that attempts to …

Audio self-supervised learning: A survey

S Liu, A Mallol-Ragolta, E Parada-Cabaleiro, K Qian… - Patterns, 2022 - cell.com
Similar to humans' cognitive ability to generalize knowledge and skills, self-supervised
learning (SSL) targets discovering general representations from large-scale data. This …

WavLM: Large-scale self-supervised pre-training for full stack speech processing

S Chen, C Wang, Z Chen, Y Wu, S Liu… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
Self-supervised learning (SSL) achieves great success in speech recognition, while limited
exploration has been attempted for other speech processing tasks. As speech signal …

End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …

UniSpeech: Unified speech representation learning with labeled and unlabeled data

C Wang, Y Wu, Y Qian, K Kumatani… - International …, 2021 - proceedings.mlr.press
In this paper, we propose a unified pre-training approach called UniSpeech to learn speech
representations with both labeled and unlabeled data, in which supervised phonetic CTC …

Towards end-to-end unsupervised speech recognition

AH Liu, WN Hsu, M Auli… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
Unsupervised speech recognition has shown great potential to make Automatic Speech
Recognition (ASR) systems accessible to every language. However, existing methods still …

CDPAM: Contrastive learning for perceptual audio similarity

P Manocha, Z Jin, R Zhang… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Many speech processing methods based on deep learning require an automatic and
differentiable audio metric for the loss function. The DPAM approach of Manocha et al. [1] …

A comprehensive survey on multi-modal conversational emotion recognition with deep learning

Y Shou, T Meng, W Ai, N Yin, K Li - arXiv preprint arXiv:2312.05735, 2023 - arxiv.org
Multi-modal conversation emotion recognition (MCER) aims to recognize and track the
speaker's emotional state using text, speech, and visual information in the conversation …

A deep-learning based citation count prediction model with paper metadata semantic features

A Ma, Y Liu, X Xu, T Dong - Scientometrics, 2021 - Springer
Predicting the impact of academic papers can help scholars quickly identify the high-quality
papers in the field. How to develop an efficient predictive model for evaluating potential papers …

MT4SSL: Boosting self-supervised speech representation learning by integrating multiple targets

Z Ma, Z Zheng, C Tang, Y Wang, X Chen - arXiv preprint arXiv:2211.07321, 2022 - arxiv.org
In this paper, we provide a new perspective on self-supervised speech models from how the
self-training targets are obtained. We generalize the targets extractor into Offline Targets …