A manifesto for future generation cloud computing: Research directions for the next decade

R Buyya, SN Srirama, G Casale, R Calheiros… - ACM computing …, 2018 - dl.acm.org
The Cloud computing paradigm has revolutionised the computer science horizon during the
past decade and has enabled the emergence of computing as the fifth utility. It has captured …

ProtTrans: Toward understanding the language of life through self-supervised learning

A Elnaggar, M Heinzinger, C Dallago… - IEEE transactions on …, 2021 - ieeexplore.ieee.org
Computational biology and bioinformatics provide vast data gold-mines from protein
sequences, ideal for Language Models (LMs) taken from Natural Language Processing …

Large batch training of convolutional networks

Y You, I Gitman, B Ginsburg - arXiv preprint arXiv:1708.03888, 2017 - arxiv.org
The most natural way to speed up the training of large networks is to use data-parallelism on
multiple GPUs. To scale Stochastic Gradient (SG) based methods to more processors, one …

Drawing early-bird tickets: Towards more efficient training of deep networks

H You, C Li, P Xu, Y Fu, Y Wang, X Chen… - arXiv preprint arXiv …, 2019 - arxiv.org
(Frankle & Carbin, 2019) shows that there exist winning tickets (small but critical
subnetworks) for dense, randomly initialized networks that can be trained alone to achieve …

The next generation of deep learning hardware: Analog computing

W Haensch, T Gokmen, R Puri - Proceedings of the IEEE, 2018 - ieeexplore.ieee.org
Initially developed for gaming and 3-D rendering, graphics processing units (GPUs) were
recognized to be a good fit to accelerate deep learning training. Its simple mathematical …

TicTac: Accelerating distributed deep learning with communication scheduling

SH Hashemi, S Abdu Jyothi… - … of Machine Learning …, 2019 - proceedings.mlsys.org
State-of-the-art deep learning systems rely on iterative distributed training to tackle the
increasing complexity of models and input data. In this work, we identify an opportunity for …

AdaComp: Adaptive residual gradient compression for data-parallel distributed training

CY Chen, J Choi, D Brand, A Agrawal… - Proceedings of the …, 2018 - ojs.aaai.org
Highly distributed training of Deep Neural Networks (DNNs) on future compute platforms
(offering 100s of TeraOps/s of computational capacity) is expected to be severely …

Anatomy of high-performance deep learning convolutions on SIMD architectures

E Georganas, S Avancha, K Banerjee… - … Conference for High …, 2018 - ieeexplore.ieee.org
Convolution layers are prevalent in many classes of deep neural networks, including
Convolutional Neural Networks (CNNs) which provide state-of-the-art results for tasks like …

E2-Train: Training state-of-the-art CNNs with over 80% energy savings

Y Wang, Z Jiang, X Chen, P Xu… - Advances in Neural …, 2019 - proceedings.neurips.cc
Convolutional neural networks (CNNs) have been increasingly deployed to edge devices.
Hence, many efforts have been made towards efficient CNN inference on resource …