Audio word2vec: Sequence-to-sequence autoencoding for unsupervised learning of audio segmentation...

[HTML] Impact of word embedding models on text analytics in deep learning environment: a review

DS Asudani, NK Nagwani, P Singh - Artificial Intelligence Review, 2023 - Springer

The selection of word embedding and deep learning models for better outcomes is vital.
Word embeddings are an n-dimensional distributed representation of a text that attempts to …

Cited by 65 · Related articles · All 7 versions

[HTML] cell.com

[HTML] Audio self-supervised learning: A survey

S Liu, A Mallol-Ragolta, E Parada-Cabaleiro, K Qian… - Patterns, 2022 - cell.com

Similar to humans' cognitive ability to generalize knowledge and skills, self-supervised
learning (SSL) targets discovering general representations from large-scale data. This …

Cited by 93 · Related articles · All 12 versions

[PDF] arxiv.org

WavLM: Large-scale self-supervised pre-training for full stack speech processing

S Chen, C Wang, Z Chen, Y Wu, S Liu… - IEEE Journal of …, 2022 - ieeexplore.ieee.org

Self-supervised learning (SSL) achieves great success in speech recognition, while limited
exploration has been attempted for other speech processing tasks. As speech signal …

Cited by 1194 · Related articles · All 5 versions

[PDF] ieee.org

End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org

In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …

Cited by 80 · Related articles · All 6 versions

[PDF] mlr.press

UniSpeech: Unified speech representation learning with labeled and unlabeled data

C Wang, Y Wu, Y Qian, K Kumatani… - International …, 2021 - proceedings.mlr.press

In this paper, we propose a unified pre-training approach called UniSpeech to learn speech
representations with both labeled and unlabeled data, in which supervised phonetic CTC …

Cited by 117 · Related articles · All 4 versions

[PDF] arxiv.org

Towards end-to-end unsupervised speech recognition

AH Liu, WN Hsu, M Auli… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org

Unsupervised speech recognition has shown great potential to make Automatic Speech
Recognition (ASR) systems accessible to every language. However, existing methods still …

Cited by 65 · Related articles · All 3 versions

[PDF] arxiv.org

CDPAM: Contrastive learning for perceptual audio similarity

P Manocha, Z Jin, R Zhang… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

Many speech processing methods based on deep learning require an automatic and
differentiable audio metric for the loss function. The DPAM approach of Manocha et al. [1] …

Cited by 66 · Related articles · All 5 versions

[PDF] arxiv.org

A comprehensive survey on multi-modal conversational emotion recognition with deep learning

Y Shou, T Meng, W Ai, N Yin, K Li - arXiv preprint arXiv:2312.05735, 2023 - arxiv.org

Multi-modal conversation emotion recognition (MCER) aims to recognize and track the
speaker's emotional state using text, speech, and visual information in the conversation …

Cited by 10 · Related articles · All 2 versions

A deep-learning based citation count prediction model with paper metadata semantic features

A Ma, Y Liu, X Xu, T Dong - Scientometrics, 2021 - Springer

Predicting the impact of academic papers can help scholars quickly identify the high-quality
papers in the field. How to develop efficient predictive model for evaluating potential papers …

Cited by 36 · Related articles · All 5 versions

[PDF] arxiv.org

MT4SSL: Boosting self-supervised speech representation learning by integrating multiple targets

Z Ma, Z Zheng, C Tang, Y Wang, X Chen - arXiv preprint arXiv:2211.07321, 2022 - arxiv.org

In this paper, we provide a new perspective on self-supervised speech models from how the
self-training targets are obtained. We generalize the targets extractor into Offline Targets …

Cited by 18 · Related articles · All 4 versions