Demystifying parallel and distributed deep learning: An in-depth concurrency analysis

T Ben-Nun, T Hoefler - ACM Computing Surveys (CSUR), 2019 - dl.acm.org
Deep Neural Networks (DNNs) are becoming an important tool in modern computing
applications. Accelerating their training is a major challenge and techniques range from …

A survey of techniques for optimizing deep learning on GPUs

S Mittal, S Vaishay - Journal of Systems Architecture, 2019 - Elsevier
The rise of deep-learning (DL) has been fuelled by the improvements in accelerators. Due to
its unique features, the GPU continues to remain the most widely used accelerator for DL …

HarDNet: A low memory traffic network

P Chao, CY Kao, YS Ruan… - Proceedings of the …, 2019 - openaccess.thecvf.com
State-of-the-art neural network architectures such as ResNet, MobileNet, and DenseNet
have achieved outstanding accuracy over low MACs and small model size counterparts …

TASO: optimizing deep learning computation with automatic generation of graph substitutions

Z Jia, O Padon, J Thomas, T Warszawski… - Proceedings of the 27th …, 2019 - dl.acm.org
Existing deep neural network (DNN) frameworks optimize the computation graph of a DNN
by applying graph transformations manually designed by human experts. This approach …

CMSIS-NN: Efficient neural network kernels for Arm Cortex-M CPUs

L Lai, N Suda, V Chandra - arXiv preprint arXiv:1801.06601, 2018 - arxiv.org
Deep Neural Networks are becoming increasingly popular in always-on IoT edge devices
performing data analytics right at the source, reducing latency as well as energy …

Data movement is all you need: A case study on optimizing transformers

A Ivanov, N Dryden, T Ben-Nun, S Li… - … of Machine Learning …, 2021 - proceedings.mlsys.org
Transformers are one of the most important machine learning workloads today. Training one
is a very compute-intensive task, often taking days or weeks, and significant attention has …

Ultrafast dynamic machine vision with spatiotemporal photonic computing

T Zhou, W Wu, J Zhang, S Yu, L Fang - Science Advances, 2023 - science.org
Ultrafast dynamic machine vision in the optical domain can provide unprecedented
perspectives for high-performance computing. However, owing to the limited degrees of …

fpgaConvNet: Mapping regular and irregular convolutional neural networks on FPGAs

SI Venieris, CS Bouganis - IEEE transactions on neural …, 2018 - ieeexplore.ieee.org
Since neural networks renaissance, convolutional neural networks (ConvNets) have
demonstrated a state-of-the-art performance in several emerging artificial intelligence tasks …

Transfer learning for sEMG hand gestures recognition using convolutional neural networks

U Côté-Allard, CL Fall… - … on Systems, Man …, 2017 - ieeexplore.ieee.org
In the realm of surface electromyography (sEMG) gesture recognition, deep learning
algorithms are seldom employed. This is due in part to the large quantity of data required for …

PET: Optimizing tensor programs with partially equivalent transformations and automated corrections

H Wang, J Zhai, M Gao, Z Ma, S Tang, L Zheng… - … USENIX Symposium on …, 2021 - usenix.org
High-performance tensor programs are critical for efficiently deploying deep neural network
(DNN) models in real-world tasks. Existing frameworks optimize tensor programs by …