Sustainable AI: Environmental implications, challenges and opportunities

CJ Wu, R Raghavendra, U Gupta… - Proceedings of …, 2022 - proceedings.mlsys.org
This paper explores the environmental impact of the super-linear growth trends for AI from a
holistic perspective, spanning Data, Algorithms, and System Hardware. We characterize the …

Transformers in vision: A survey

S Khan, M Naseer, M Hayat, SW Zamir… - ACM computing …, 2022 - dl.acm.org
Astounding results from Transformer models on natural language tasks have intrigued the
vision community to study their application to computer vision problems. Among their salient …

GPT3.int8(): 8-bit matrix multiplication for transformers at scale

T Dettmers, M Lewis, Y Belkada… - Advances in Neural …, 2022 - proceedings.neurips.cc
Large language models have been widely adopted but require significant GPU memory for
inference. We develop a procedure for Int8 matrix multiplication for feed-forward and …
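The entry above concerns Int8 matrix multiplication for transformer inference. As an illustrative sketch only (not the paper's actual procedure, which additionally routes outlier feature dimensions through a higher-precision path), absmax quantization maps floating-point values into the signed int8 range:

```python
def absmax_quantize(xs):
    """Map a vector of floats into the signed int8 range [-127, 127].

    Illustrative sketch: scale so the largest magnitude maps to 127,
    then round each value to the nearest integer code.
    """
    scale = 127.0 / max(abs(v) for v in xs)
    quantized = [round(v * scale) for v in xs]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from int8 codes."""
    return [q / scale for q in quantized]

codes, scale = absmax_quantize([0.5, -1.0, 0.25])
# codes == [64, -127, 32], scale == 127.0
```

Matrix multiplication can then be carried out on the integer codes, with a single rescaling by the two operands' scales at the end.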

Tip-Adapter: Training-free CLIP-Adapter for better vision-language modeling

R Zhang, R Fang, W Zhang, P Gao, K Li, J Dai… - arXiv preprint arXiv …, 2021 - arxiv.org
Contrastive Vision-Language Pre-training, known as CLIP, has provided a new paradigm for
learning visual representations by using large-scale contrastive image-text pairs. It shows …

ResMLP: Feedforward networks for image classification with data-efficient training

H Touvron, P Bojanowski, M Caron… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image
classification. It is a simple residual network that alternates (i) a linear layer in which image …

Medical transformer: Gated axial-attention for medical image segmentation

JMJ Valanarasu, P Oza, I Hacihaliloglu… - Medical image computing …, 2021 - Springer
Over the past decade, deep convolutional neural networks have been widely adopted for
medical image segmentation and shown to achieve adequate performance. However, due …

Is space-time attention all you need for video understanding?

G Bertasius, H Wang, L Torresani - ICML, 2021 - proceedings.mlr.press
Training. We train our model for 15 epochs with an initial learning rate of 0.005, which is
divided by 10 at epochs 11 and 14. During training, we first resize the shorter side of the …
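The training recipe quoted in this snippet (initial learning rate 0.005, divided by 10 at epochs 11 and 14) is a step-decay schedule. A minimal sketch, with an illustrative function name not taken from the paper's code:

```python
def step_decay_lr(epoch, base_lr=0.005, milestones=(11, 14), gamma=0.1):
    """Step decay: multiply the base learning rate by gamma once per
    milestone epoch that has been reached."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# Learning rate over the 15 training epochs (0-indexed)
schedule = [step_decay_lr(e) for e in range(15)]
```

Epochs 0-10 train at 0.005, epochs 11-13 at 0.0005, and epoch 14 at 0.00005.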

Differentially private fine-tuning of language models

D Yu, S Naik, A Backurs, S Gopi, HA Inan… - arXiv preprint arXiv …, 2021 - arxiv.org
We give simpler, sparser, and faster algorithms for differentially private fine-tuning of large-
scale pre-trained language models, which achieve the state-of-the-art privacy versus utility …

wav2vec 2.0: A framework for self-supervised learning of speech representations

A Baevski, Y Zhou, A Mohamed… - Advances in neural …, 2020 - proceedings.neurips.cc
We show for the first time that learning powerful representations from speech audio alone
followed by fine-tuning on transcribed speech can outperform the best semi-supervised …

Linformer: Self-attention with linear complexity

S Wang, BZ Li, M Khabsa, H Fang, H Ma - arXiv preprint arXiv:2006.04768, 2020 - arxiv.org
Large transformer models have shown extraordinary success in achieving state-of-the-art
results in many natural language processing applications. However, training and deploying …