A survey of techniques for optimizing deep learning on GPUs

S Mittal, S Vaishay - Journal of Systems Architecture, 2019 - Elsevier
The rise of deep learning (DL) has been fuelled by improvements in accelerators. Due to
its unique features, the GPU remains the most widely used accelerator for DL …

Edge learning: The enabling technology for distributed big data analytics in the edge

J Zhang, Z Qu, C Chen, H Wang, Y Zhan, B Ye… - ACM Computing …, 2021 - dl.acm.org
Machine Learning (ML) has demonstrated great promise in various fields, e.g., self-driving
and smart cities, which are fundamentally altering the way individuals and organizations live, work …

Rammer: Enabling holistic deep learning compiler optimizations with rTasks

L Ma, Z Xie, Z Yang, J Xue, Y Miao, W Cui… - … USENIX Symposium on …, 2020 - usenix.org
Performing Deep Neural Network (DNN) computation efficiently on hardware accelerators is
challenging. Existing DNN frameworks and compilers often treat the DNN operators in a …

Flexible high-resolution object detection on edge devices with tunable latency

S Jiang, Z Lin, Y Li, Y Shu, Y Liu - Proceedings of the 27th Annual …, 2021 - dl.acm.org
Object detection is a fundamental building block of video analytics applications. While
Neural Network (NN)-based object detection models have shown excellent accuracy on …

Optimizing CNN model inference on CPUs

Y Liu, Y Wang, R Yu, M Li, V Sharma… - 2019 USENIX Annual …, 2019 - usenix.org
The popularity of Convolutional Neural Network (CNN) models and the ubiquity of CPUs
imply that better performance of CNN model inference on CPUs can deliver significant gain …

Enabling edge-cloud video analytics for robotics applications

Y Wang, W Wang, D Liu, X Jin, J Jiang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Emerging deep learning-based video analytics tasks demand computation-intensive neural
networks and powerful computing resources on the cloud to achieve high accuracy. Due to …

DVABatch: Diversity-aware Multi-Entry Multi-Exit batching for efficient processing of DNN services on GPUs

W Cui, H Zhao, Q Chen, H Wei, Z Li, D Zeng… - 2022 USENIX Annual …, 2022 - usenix.org
DNN inferences are often batched to better utilize the hardware in existing DNN
serving systems. However, DNN serving exhibits diversity in many aspects, such as input …

AsyMo: scalable and efficient deep-learning inference on asymmetric mobile CPUs

M Wang, S Ding, T Cao, Y Liu, F Xu - Proceedings of the 27th Annual …, 2021 - dl.acm.org
On-device deep learning (DL) inference has attracted widespread interest. Mobile CPUs are the
most common hardware for on-device inference, and many inference frameworks have been …

DynaGraph: dynamic graph neural networks at scale

M Guan, AP Iyer, T Kim - Proceedings of the 5th ACM SIGMOD Joint …, 2022 - dl.acm.org
In this paper, we present DynaGraph, a system that supports dynamic Graph Neural
Networks (GNNs) efficiently. Based on the observation that existing proposals for dynamic …
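
As a rough illustration of what "dynamic GNN" means in this entry: a graph layer encodes each graph snapshot, and a recurrent unit encodes how node embeddings evolve across snapshots. The layer, sizes, and random snapshots below are hypothetical and only sketch the general pattern, not DynaGraph's design.

import torch
import torch.nn as nn

class SimpleGraphLayer(nn.Module):
    """One round of neighbor aggregation: H' = ReLU(A_hat @ H @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, adj, feats):
        return torch.relu(adj @ self.lin(feats))

num_nodes, feat_dim, hid_dim, timesteps = 16, 8, 32, 5
gnn = SimpleGraphLayer(feat_dim, hid_dim)
rnn = nn.GRU(hid_dim, hid_dim, batch_first=True)

# Random snapshots standing in for a graph whose edges/features change over time.
snapshots = []
for _ in range(timesteps):
    adj = (torch.rand(num_nodes, num_nodes) < 0.2).float()
    adj = adj / adj.sum(dim=1, keepdim=True).clamp(min=1)  # row-normalize
    feats = torch.randn(num_nodes, feat_dim)
    snapshots.append((adj, feats))

# Spatial encoding per snapshot, then temporal encoding across snapshots.
spatial = torch.stack([gnn(adj, x) for adj, x in snapshots], dim=1)  # (N, T, H)
temporal, _ = rnn(spatial)  # per-node embeddings reflecting graph evolution
print(temporal.shape)       # torch.Size([16, 5, 32])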

A survey of deep learning on CPUs: opportunities and co-optimizations

S Mittal, P Rajput, S Subramoney - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
The CPU is a powerful, pervasive, and indispensable platform for running deep learning (DL)
workloads in systems ranging from mobile to extreme-end servers. In this article, we present …