Sustainable AI: Environmental implications, challenges and opportunities

CJ Wu, R Raghavendra, U Gupta… - Proceedings of …, 2022 - proceedings.mlsys.org
This paper explores the environmental impact of the super-linear growth trends for AI from a
holistic perspective, spanning Data, Algorithms, and System Hardware. We characterize the …

Transformers in vision: A survey

S Khan, M Naseer, M Hayat, SW Zamir… - ACM computing …, 2022 - dl.acm.org
Astounding results from Transformer models on natural language tasks have intrigued the
vision community to study their application to computer vision problems. Among their salient …

GPT3.int8(): 8-bit matrix multiplication for transformers at scale

T Dettmers, M Lewis, Y Belkada… - Advances in Neural …, 2022 - proceedings.neurips.cc
Large language models have been widely adopted but require significant GPU memory for
inference. We develop a procedure for Int8 matrix multiplication for feed-forward and …
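The entry above concerns Int8 matrix multiplication for transformer inference. As an illustrative sketch only (not the paper's actual procedure, which additionally routes outlier feature dimensions through a higher-precision path), absmax quantization maps floating-point values into the signed int8 range:

```python
def absmax_quantize(xs):
    """Map a vector of floats into the signed int8 range [-127, 127].

    Illustrative sketch: scale so the largest magnitude maps to 127,
    then round each value to the nearest integer code.
    """
    scale = 127.0 / max(abs(v) for v in xs)
    quantized = [round(v * scale) for v in xs]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from int8 codes."""
    return [q / scale for q in quantized]

codes, scale = absmax_quantize([0.5, -1.0, 0.25])
# codes == [64, -127, 32], scale == 127.0
```

Matrix multiplication can then be carried out on the integer codes, with a single rescaling by the two operands' scales at the end.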

Tip-Adapter: Training-free CLIP-Adapter for better vision-language modeling

R Zhang, R Fang, W Zhang, P Gao, K Li, J Dai… - arXiv preprint arXiv …, 2021 - arxiv.org
Contrastive Vision-Language Pre-training, known as CLIP, has provided a new paradigm for
learning visual representations by using large-scale contrastive image-text pairs. It shows …

ResMLP: Feedforward networks for image classification with data-efficient training

H Touvron, P Bojanowski, M Caron… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image
classification. It is a simple residual network that alternates (i) a linear layer in which image …

Medical transformer: Gated axial-attention for medical image segmentation

JMJ Valanarasu, P Oza, I Hacihaliloglu… - Medical image computing …, 2021 - Springer
Over the past decade, deep convolutional neural networks have been widely adopted for
medical image segmentation and shown to achieve adequate performance. However, due …

Is space-time attention all you need for video understanding?

G Bertasius, H Wang, L Torresani - ICML, 2021 - proceedings.mlr.press
Training. We train our model for 15 epochs with an initial learning rate of 0.005, which is
divided by 10 at epochs 11 and 14. During training, we first resize the shorter side of the …
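The training recipe quoted in this snippet (initial learning rate 0.005, divided by 10 at epochs 11 and 14) is a step-decay schedule. A minimal sketch, with an illustrative function name not taken from the paper's code:

```python
def step_decay_lr(epoch, base_lr=0.005, milestones=(11, 14), gamma=0.1):
    """Step decay: multiply the base learning rate by gamma once per
    milestone epoch that has been reached."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# Learning rate over the 15 training epochs (0-indexed)
schedule = [step_decay_lr(e) for e in range(15)]
```

Epochs 0-10 train at 0.005, epochs 11-13 at 0.0005, and epoch 14 at 0.00005.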

Differentially private fine-tuning of language models

D Yu, S Naik, A Backurs, S Gopi, HA Inan… - arXiv preprint arXiv …, 2021 - arxiv.org
We give simpler, sparser, and faster algorithms for differentially private fine-tuning of large-
scale pre-trained language models, which achieve the state-of-the-art privacy versus utility …

wav2vec 2.0: A framework for self-supervised learning of speech representations

A Baevski, Y Zhou, A Mohamed… - Advances in neural …, 2020 - proceedings.neurips.cc
We show for the first time that learning powerful representations from speech audio alone
followed by fine-tuning on transcribed speech can outperform the best semi-supervised …

Linformer: Self-attention with linear complexity

S Wang, BZ Li, M Khabsa, H Fang, H Ma - arXiv preprint arXiv:2006.04768, 2020 - arxiv.org
Large transformer models have shown extraordinary success in achieving state-of-the-art
results in many natural language processing applications. However, training and deploying …