Real-time multiple object visual tracking for embedded GPU systems
M Fernández-Sanjurjo, M Mucientes… - IEEE Internet of Things …, 2021 - ieeexplore.ieee.org
Real-time visual object tracking provides every object of interest with a unique identity and a
trajectory across video frames. This is a fundamental task of many video analytics …
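As a minimal, hypothetical illustration of the task described in this abstract (not the paper's tracker), the sketch below keeps a per-object identity and trajectory by greedily matching each frame's detections to existing tracks via IoU. All names (`Track`, `iou`, `update_tracks`) and the threshold value are illustrative assumptions.

```python
# Minimal sketch: identities and trajectories maintained by greedy IoU matching.
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

@dataclass
class Track:
    track_id: int
    trajectory: List[Box] = field(default_factory=list)  # one box per frame

def iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def update_tracks(tracks: List[Track], detections: List[Box],
                  next_id: int, iou_thr: float = 0.3) -> int:
    """Extend matched tracks with new detections; start new identities otherwise."""
    unmatched = list(detections)
    for track in tracks:
        if not track.trajectory or not unmatched:
            continue
        best = max(unmatched, key=lambda d: iou(track.trajectory[-1], d))
        if iou(track.trajectory[-1], best) >= iou_thr:
            track.trajectory.append(best)
            unmatched.remove(best)
    for det in unmatched:  # unmatched detections become new identities
        tracks.append(Track(track_id=next_id, trajectory=[det]))
        next_id += 1
    return next_id
```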
SLAPP: Subgraph-level attention-based performance prediction for deep learning models
The intricacy of the Deep Learning (DL) landscape, brimming with a variety of models,
applications, and platforms, poses considerable challenges for the optimal design …
AI-driven performance modeling for AI inference workloads
Deep Learning (DL) is moving towards deploying workloads not only in cloud datacenters,
but also on local devices. Although these are mostly limited to inference tasks, it still …
Performance modeling of computer vision-based CNN on edge GPUs
Convolutional Neural Networks (CNNs) are currently widely used in various fields,
particularly for computer vision applications. Edge platforms have drawn tremendous …
Blackthorn: latency estimation framework for CNNs on embedded Nvidia platforms
With more powerful yet efficient embedded devices and accelerators being available for
Deep Neural Networks (DNN), machine learning is becoming an integral part of edge …
Automatic generation of fast and accurate performance models for deep neural network accelerators
Implementing Deep Neural Networks (DNNs) on resource-constrained edge devices is a
challenging task that requires tailored hardware accelerator architectures and a clear …
Flexi-BOPI: Flexible Granularity Pipeline Inference with Bayesian Optimization for Deep Learning Models on HMPSoC
To achieve high-throughput deep learning (DL) model inference on heterogeneous
multiprocessor systems-on-chip (HMPSoC) platforms, the use of pipelining for the …
Performance Prediction for Deep Learning Models with Pipeline Inference Strategy
For heterogeneous multiprocessor system-on-chips (HMPSoCs), a reasonable pipeline
design can significantly improve the inference performance of deep learning (DL) models …
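To illustrate why pipeline partitioning matters for inference performance, the sketch below (not Flexi-BOPI's or this paper's predictor) shows the usual first-order estimates: single-frame latency is roughly the sum of stage latencies, while steady-state throughput is bounded by the slowest stage. The stage times are made-up numbers, not measurements.

```python
# Minimal sketch: first-order latency/throughput estimates for a pipelined model.
def pipeline_estimates(stage_latencies_ms):
    latency_ms = sum(stage_latencies_ms)      # one frame traverses every stage
    bottleneck_ms = max(stage_latencies_ms)   # slowest stage paces the pipeline
    throughput_fps = 1000.0 / bottleneck_ms
    return latency_ms, throughput_fps

# Example: a hypothetical 3-stage split (e.g., big CPU cluster / little cluster / GPU).
lat, fps = pipeline_estimates([12.0, 8.0, 10.0])
print(f"latency ~ {lat:.0f} ms, throughput ~ {fps:.1f} FPS")
```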
ANNETTE: Accurate neural network execution time estimation with stacked models
M Wess, M Ivanov, C Unger, A Nookala, A Wendt… - IEEE …, 2020 - ieeexplore.ieee.org
With new accelerator hardware for Deep Neural Networks (DNNs), the computing power for
Artificial Intelligence (AI) applications has increased rapidly. However, as DNN algorithms …
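For context on what layer-wise execution-time estimation looks like in general (this is not ANNETTE's stacked-model method), the sketch below predicts per-layer latency from simple features with a linear cost model and sums over the network. The coefficients and layer descriptions are hypothetical placeholders, not values from any paper.

```python
# Minimal sketch: per-layer latency model summed over a network.
def conv_features(layer):
    # MACs and output size as rough predictors of conv-layer latency
    macs = (layer["out_h"] * layer["out_w"] * layer["out_c"]
            * layer["in_c"] * layer["k"] * layer["k"])
    mem = layer["out_h"] * layer["out_w"] * layer["out_c"]
    return macs, mem

def predict_layer_latency_ms(layer, coef_macs=1.2e-8, coef_mem=4.0e-7, bias=0.05):
    macs, mem = conv_features(layer)
    return coef_macs * macs + coef_mem * mem + bias

def predict_network_latency_ms(layers):
    return sum(predict_layer_latency_ms(l) for l in layers)

# Example with two hypothetical conv layers
layers = [
    {"out_h": 112, "out_w": 112, "out_c": 32, "in_c": 3, "k": 3},
    {"out_h": 56, "out_w": 56, "out_c": 64, "in_c": 32, "k": 3},
]
print(f"estimated latency ~ {predict_network_latency_ms(layers):.2f} ms")
```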
Accurate estimation of the CNN inference cost for TinyML devices
T Garbay, K Hachicha, P Dobias… - 2022 IEEE 35th …, 2022 - ieeexplore.ieee.org
Our society will be deeply impacted by neural network inference on embedded devices.
Many of them are based on microcontroller units (MCUs), which are extremely …