Deadline-based scheduling for GPU with preemption support

N Capodieci, R Cavicchioli, M Bertogna… - 2018 IEEE Real …, 2018 - ieeexplore.ieee.org
Modern automotive-grade embedded computing platforms feature high-performance
Graphics Processing Units (GPUs) to support the massively parallel processing power …

Graft: Efficient inference serving for hybrid deep learning with SLO guarantees via DNN re-alignment

J Wu, L Wang, Q Jin, F Liu - IEEE Transactions on Parallel and …, 2023 - ieeexplore.ieee.org
Deep neural networks (DNNs) have been widely adopted for various mobile inference tasks,
yet their ever-increasing computational demands are hindering their deployment on …

FLEP: Enabling flexible and efficient preemption on GPUs

B Wu, X Liu, X Zhou, C Jiang - ACM SIGPLAN Notices, 2017 - dl.acm.org
GPUs are widely adopted in HPC and cloud computing platforms to accelerate general-
purpose workloads. However, modern GPUs do not support flexible preemption, leading to …

The HPC-DAG task model for heterogeneous real-time systems

Z Houssam-Eddine, N Capodieci… - IEEE Transactions …, 2020 - ieeexplore.ieee.org
Recent commercial hardware platforms for embedded real-time systems feature
heterogeneous processing units and computing accelerators on the same System-on-Chip …

Secure and timely GPU execution in cyber-physical systems

J Wang, Y Wang, N Zhang - Proceedings of the 2023 ACM SIGSAC …, 2023 - dl.acm.org
Graphics Processing Units (GPUs) are increasingly deployed on Cyber-physical Systems
(CPSs), frequently used to perform real-time safety-critical functions, such as object …

Adaptive optimization for OpenCL programs on embedded heterogeneous systems

B Taylor, VS Marco, Z Wang - ACM SIGPLAN Notices, 2017 - dl.acm.org
Heterogeneous multi-core architectures consisting of CPUs and GPUs are commonplace in
today's embedded systems. These architectures offer potential for energy efficient computing …

Astraea: towards QoS-aware and resource-efficient multi-stage GPU services

W Zhang, Q Chen, K Fu, N Zheng, Z Huang… - Proceedings of the 27th …, 2022 - dl.acm.org
Multi-stage user-facing applications on GPUs are widely used nowadays, and are often
implemented to be microservices. Prior research works are not applicable to ensuring QoS …

Miriam: Exploiting elastic kernels for real-time multi-DNN inference on edge GPU

Z Zhao, N Ling, N Guan, G Xing - … of the 21st ACM Conference on …, 2023 - dl.acm.org
Many applications, such as autonomous driving and augmented reality, require the
concurrent running of multiple deep neural networks (DNNs) that pose different levels of real …

Kalmia: A heterogeneous QoS-aware scheduling framework for DNN tasks on edge servers

Z Fu, J Ren, D Zhang, Y Zhou… - IEEE INFOCOM 2022 …, 2022 - ieeexplore.ieee.org
Motivated by the popularity of edge intelligence, DNN services have been widely deployed
at the edge, posing significant performance pressure on edge servers. How to improve the …

NURA: A framework for supporting non-uniform resource accesses in GPUs

S Darabi, N Mahani, H Baxishi… - Proceedings of the …, 2022 - dl.acm.org
Multi-application execution in Graphics Processing Units (GPUs), a promising way to utilize
GPU resources, is still challenging. Some pieces of prior work (e.g., spatial multitasking) have …