A comparative study on transformer vs rnn in speech applications

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com

Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

被引用次数：311 相关文章所有 6 个版本

[PDF] arxiv.org

A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

被引用次数：86 相关文章所有 6 个版本

[PDF] thecvf.com

Images speak in images: A generalist painter for in-context visual learning

X Wang, W Wang, Y Cao, C Shen… - Proceedings of the …, 2023 - openaccess.thecvf.com

In-context learning, as a new paradigm in NLP, allows the model to rapidly adapt to various
tasks with only a handful of prompts and examples. But in computer vision, the difficulties for …

被引用次数：141 相关文章所有 5 个版本

[PDF] arxiv.org

SpeechBrain: A general-purpose speech toolkit

M Ravanelli, T Parcollet, P Plantinga, A Rouhe… - arXiv preprint arXiv …, 2021 - arxiv.org

SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the
research and development of neural speech processing technologies by being simple …

被引用次数：552 相关文章所有 5 个版本

[PDF] thecvf.com

Going deeper with image transformers

H Touvron, M Cord, A Sablayrolles… - Proceedings of the …, 2021 - openaccess.thecvf.com

Transformers have been recently adapted for large scale image classification, achieving
high scores shaking up the long supremacy of convolutional neural networks. However the …

被引用次数：970 相关文章所有 6 个版本

[PDF] thecvf.com

Ai choreographer: Music conditioned 3d dance generation with aist++

R Li, S Yang, DA Ross… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com

We present AIST++, a new multi-modal dataset of 3D dance motion and music, along with
FACT, a Full-Attention Cross-modal Transformer network for generating 3D dance motion …

被引用次数：373 相关文章所有 7 个版本

[PDF] arxiv.org

Conformer: Convolution-augmented transformer for speech recognition

A Gulati, J Qin, CC Chiu, N Parmar, Y Zhang… - arXiv preprint arXiv …, 2020 - arxiv.org

Recently Transformer and Convolution neural network (CNN) based models have shown
promising results in Automatic Speech Recognition (ASR), outperforming Recurrent neural …

被引用次数：2724 相关文章所有 12 个版本

[PDF] arxiv.org

Attention is all you need in speech separation

C Subakan, M Ravanelli, S Cornell… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

Recurrent Neural Networks (RNNs) have long been the dominant architecture in sequence-
to-sequence learning. RNNs, however, are inherently sequential models that do not allow …

被引用次数：457 相关文章所有 8 个版本

[PDF] mlr.press

Branchformer: Parallel mlp-attention architectures to capture local and global context for speech recognition and understanding

Y Peng, S Dalmia, I Lane… - … Conference on Machine …, 2022 - proceedings.mlr.press

Conformer has proven to be effective in many speech processing tasks. It combines the
benefits of extracting local dependencies using convolutions and global dependencies …

被引用次数：100 相关文章所有 8 个版本

[PDF] arxiv.org

Recent developments on espnet toolkit boosted by conformer

P Guo, F Boyer, X Chang, T Hayashi… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

In this study, we present recent developments on ESPnet: End-to-End Speech Processing
toolkit, which mainly involves a recently proposed architecture called Conformer …

被引用次数：271 相关文章所有 9 个版本