A survey of techniques for optimizing deep learning on GPUs
The rise of deep-learning (DL) has been fuelled by the improvements in accelerators. Due to
its unique features, the GPU continues to remain the most widely used accelerator for DL …
Edge learning: The enabling technology for distributed big data analytics in the edge
Machine Learning (ML) has demonstrated great promise in various fields, e.g., self-driving,
smart city, which are fundamentally altering the way individuals and organizations live, work …
Rammer: Enabling holistic deep learning compiler optimizations with rTasks
Performing Deep Neural Network (DNN) computation on hardware accelerators efficiently is
challenging. Existing DNN frameworks and compilers often treat the DNN operators in a …
Flexible high-resolution object detection on edge devices with tunable latency
Object detection is a fundamental building block of video analytics applications. While
Neural Networks (NNs)-based object detection models have shown excellent accuracy on …
Optimizing CNN model inference on CPUs
The popularity of Convolutional Neural Network (CNN) models and the ubiquity of CPUs
imply that better performance of CNN model inference on CPUs can deliver significant gain …
Enabling edge-cloud video analytics for robotics applications
Emerging deep learning-based video analytics tasks demand computation-intensive neural
networks and powerful computing resources on the cloud to achieve high accuracy. Due to …
DVABatch: Diversity-aware Multi-Entry Multi-Exit batching for efficient processing of DNN services on GPUs
The DNN inferences are often batched for better utilizing the hardware in existing DNN
serving systems. However, DNN serving exhibits diversity in many aspects, such as input …
AsyMo: scalable and efficient deep-learning inference on asymmetric mobile CPUs
On-device deep learning (DL) inference has attracted vast interest. Mobile CPUs are the
most common hardware for on-device inference and many inference frameworks have been …
DynaGraph: dynamic graph neural networks at scale
In this paper, we present DynaGraph, a system that supports dynamic Graph Neural
Networks (GNNs) efficiently. Based on the observation that existing proposals for dynamic …
A survey of deep learning on CPUs: opportunities and co-optimizations
CPU is a powerful, pervasive, and indispensable platform for running deep learning (DL)
workloads in systems ranging from mobile to extreme-end servers. In this article, we present …