A survey of techniques for optimizing transformer inference

KT Chitty-Venkata, S Mittal, M Emani… - Journal of Systems …, 2023 - Elsevier
Recent years have seen a phenomenal rise in the performance and applications of
transformer neural networks. The family of transformer networks, including Bidirectional …

Neural architecture search for transformers: A survey

KT Chitty-Venkata, M Emani, V Vishwanath… - IEEE …, 2022 - ieeexplore.ieee.org
Transformer-based Deep Neural Network architectures have gained tremendous interest
due to their effectiveness in various applications across Natural Language Processing (NLP) …

DPHuBERT: Joint distillation and pruning of self-supervised speech models

Y Peng, Y Sudo, S Muhammad, S Watanabe - arXiv preprint arXiv …, 2023 - arxiv.org
Self-supervised learning (SSL) has achieved notable success in many speech processing
tasks, but the large model size and heavy computational cost hinder the deployment …

DinoSR: Self-distillation and online clustering for self-supervised speech representation learning

AH Liu, HJ Chang, M Auli, WN Hsu… - Advances in Neural …, 2023 - proceedings.neurips.cc
In this paper, we introduce self-distillation and online clustering for self-supervised speech
representation learning (DinoSR) which combines masked language modeling, self …

FitHuBERT: Going thinner and deeper for knowledge distillation of speech self-supervised learning

Y Lee, K Jang, J Goo, Y Jung, H Kim - arXiv preprint arXiv:2207.00555, 2022 - arxiv.org
Large-scale speech self-supervised learning (SSL) has emerged as a major field of speech
processing; however, the computational cost arising from its vast model size makes a …

SUPERB @ SLT 2022: Challenge on generalization and efficiency of self-supervised speech representation learning

T Feng, A Dong, CF Yeh, S Yang, TQ Lin… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
We present the SUPERB challenge at SLT 2022, which aims at learning self-supervised
speech representation for better performance, generalization, and efficiency. The challenge …

PreNAS: Preferred one-shot learning towards efficient neural architecture search

H Wang, C Ge, H Chen, X Sun - International Conference on …, 2023 - proceedings.mlr.press
The wide application of pre-trained models is driving the trend of once-for-all training in
one-shot neural architecture search (NAS). However, training within a huge sample space …

Reducing barriers to self-supervised learning: HuBERT pre-training with academic compute

W Chen, X Chang, Y Peng, Z Ni, S Maiti… - arXiv preprint arXiv …, 2023 - arxiv.org
Self-supervised learning (SSL) has led to great strides in speech processing. However, the
resources needed to train these models have become prohibitively large as they continue to …

SpeechCLIP: Integrating speech with pre-trained vision and language model

YJ Shih, HF Wang, HJ Chang, L Berry… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
Data-driven speech processing models usually perform well with a large amount of text
supervision, but collecting transcribed speech data is costly. Therefore, we propose Speech …

Structured pruning of self-supervised pre-trained models for speech recognition and understanding

Y Peng, K Kim, F Wu, P Sridhar… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Self-supervised speech representation learning (SSL) has been shown to be effective in various
downstream tasks, but SSL models are usually large and slow. Model compression …