Real-time multiple object visual tracking for embedded GPU systems

M Fernández-Sanjurjo, M Mucientes… - IEEE Internet of Things …, 2021 - ieeexplore.ieee.org
Real-time visual object tracking provides every object of interest with a unique identity and a
trajectory across video frames. This is a fundamental task of many video analytics …

SLAPP: Subgraph-level attention-based performance prediction for deep learning models

Z Wang, P Yang, L Hu, B Zhang, C Lin, W Lv, Q Wang - Neural Networks, 2024 - Elsevier
The intricacy of the Deep Learning (DL) landscape, brimming with a variety of models,
applications, and platforms, poses considerable challenges for the optimal design …

AI-driven performance modeling for AI inference workloads

M Sponner, B Waschneck, A Kumar - Electronics, 2022 - mdpi.com
Deep Learning (DL) is moving towards deploying workloads not only in cloud datacenters,
but also to the local devices. Although these are mostly limited to inference tasks, it still …

Performance modeling of computer vision-based CNN on edge GPUs

H Bouzidi, H Ouarnoughi, S Niar… - ACM Transactions on …, 2022 - dl.acm.org
Convolutional Neural Networks (CNNs) are currently widely used in various fields,
particularly for computer vision applications. Edge platforms have drawn tremendous …

Blackthorn: latency estimation framework for CNNs on embedded Nvidia platforms

M Lechner, A Jantsch - IEEE Access, 2021 - ieeexplore.ieee.org
With more powerful yet efficient embedded devices and accelerators being available for
Deep Neural Networks (DNN), machine learning is becoming an integral part of edge …

Automatic generation of fast and accurate performance models for deep neural network accelerators

K Lübeck, ALF Jung, F Wedlich, MM Müller… - arXiv preprint arXiv …, 2024 - arxiv.org
Implementing Deep Neural Networks (DNNs) on resource-constrained edge devices is a
challenging task that requires tailored hardware accelerator architectures and a clear …

Flexi-BOPI: Flexible Granularity Pipeline Inference with Bayesian Optimization for Deep Learning Models on HMPSoC

Z Wang, P Yang, B Zhang, L Hu, W Lv, C Lin… - Information Sciences, 2024 - Elsevier
To achieve high-throughput deep learning (DL) model inference on heterogeneous
multiprocessor systems-on-chip (HMPSoC) platforms, the use of pipelining for the …

Performance Prediction for Deep Learning Models with Pipeline Inference Strategy

Z Wang, P Yang, B Zhang, L Hu, W Lv… - IEEE Internet of …, 2023 - ieeexplore.ieee.org
For heterogeneous multiprocessor systems-on-chip (HMPSoCs), a reasonable pipeline
design can significantly improve the inference performance of deep learning (DL) models …

ANNETTE: Accurate neural network execution time estimation with stacked models

M Wess, M Ivanov, C Unger, A Nookala, A Wendt… - IEEE …, 2020 - ieeexplore.ieee.org
With new accelerator hardware for Deep Neural Networks (DNNs), the computing power for
Artificial Intelligence (AI) applications has increased rapidly. However, as DNN algorithms …

Accurate estimation of the CNN inference cost for TinyML devices

T Garbay, K Hachicha, P Dobias… - 2022 IEEE 35th …, 2022 - ieeexplore.ieee.org
Our society will be deeply impacted by neural network inference on embedded devices.
Many of them are based on the use of microcontroller units (MCUs) which are extremely …