Demystifying parallel and distributed deep learning: An in-depth concurrency analysis
Deep Neural Networks (DNNs) are becoming an important tool in modern computing
applications. Accelerating their training is a major challenge and techniques range from …
A survey of techniques for optimizing deep learning on GPUs
The rise of deep-learning (DL) has been fuelled by the improvements in accelerators. Due to
its unique features, the GPU continues to remain the most widely used accelerator for DL …
HarDNet: A low memory traffic network
P Chao, CY Kao, YS Ruan… - Proceedings of the …, 2019 - openaccess.thecvf.com
State-of-the-art neural network architectures such as ResNet, MobileNet, and DenseNet
have achieved outstanding accuracy over low MACs and small model size counterparts …
TASO: optimizing deep learning computation with automatic generation of graph substitutions
Existing deep neural network (DNN) frameworks optimize the computation graph of a DNN
by applying graph transformations manually designed by human experts. This approach …
CMSIS-NN: Efficient neural network kernels for Arm Cortex-M CPUs
Deep Neural Networks are becoming increasingly popular in always-on IoT edge devices
performing data analytics right at the source, reducing latency as well as energy …
Data movement is all you need: A case study on optimizing transformers
Transformers are one of the most important machine learning workloads today. Training one
is a very compute-intensive task, often taking days or weeks, and significant attention has …
Ultrafast dynamic machine vision with spatiotemporal photonic computing
Ultrafast dynamic machine vision in the optical domain can provide unprecedented
perspectives for high-performance computing. However, owing to the limited degrees of …
fpgaConvNet: Mapping regular and irregular convolutional neural networks on FPGAs
SI Venieris, CS Bouganis - IEEE transactions on neural …, 2018 - ieeexplore.ieee.org
Since neural networks renaissance, convolutional neural networks (ConvNets) have
demonstrated a state-of-the-art performance in several emerging artificial intelligence tasks …
Transfer learning for sEMG hand gestures recognition using convolutional neural networks
U Côté-Allard, CL Fall… - … on Systems, Man …, 2017 - ieeexplore.ieee.org
In the realm of surface electromyography (sEMG) gesture recognition, deep learning
algorithms are seldom employed. This is due in part to the large quantity of data required for …
PET: Optimizing tensor programs with partially equivalent transformations and automated corrections
High-performance tensor programs are critical for efficiently deploying deep neural network
(DNN) models in real-world tasks. Existing frameworks optimize tensor programs by …