Deep learning based object detection for resource constrained devices: Systematic review, future trends and challenges ahead

V Kamath, A Renuka - Neurocomputing, 2023 - Elsevier
Deep learning models are widely being employed for object detection due to their high
performance. However, the majority of applications that require object detection are …

A comprehensive survey of deep learning-based lightweight object detection models for edge devices

P Mittal - Artificial Intelligence Review, 2024 - Springer
This study concentrates on deep learning-based lightweight object detection models on
edge devices. Designing such lightweight object recognition models is more difficult than …

AsyMo: scalable and efficient deep-learning inference on asymmetric mobile CPUs

M Wang, S Ding, T Cao, Y Liu, F Xu - Proceedings of the 27th Annual …, 2021 - dl.acm.org
On-device deep learning (DL) inference has attracted vast interest. Mobile CPUs are the
most common hardware for on-device inference and many inference frameworks have been …

Optimizing depthwise separable convolution operations on GPUs

G Lu, W Zhang, Z Wang - IEEE Transactions on Parallel and …, 2021 - ieeexplore.ieee.org
The depthwise separable convolution is commonly seen in convolutional neural networks
(CNNs), and is widely used to reduce the computation overhead of a standard multi-channel …
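The computation savings this entry refers to follow directly from the factorization's arithmetic: a standard K×K convolution costs K·K·Cin·Cout·H·W multiply-accumulates, while the depthwise-plus-pointwise factorization costs K·K·Cin·H·W + Cin·Cout·H·W. A minimal sketch in pure Python (dimensions are illustrative, not taken from the paper):

```python
# Multiply-accumulate (MAC) counts for standard vs. depthwise separable
# convolution on an H x W feature map. Dimensions below are hypothetical.

def standard_conv_macs(k, c_in, c_out, h, w):
    # Each output pixel of each output channel sums over a k*k*c_in window.
    return k * k * c_in * c_out * h * w

def separable_conv_macs(k, c_in, c_out, h, w):
    depthwise = k * k * c_in * h * w   # one k*k filter per input channel
    pointwise = c_in * c_out * h * w   # 1x1 convolution mixes channels
    return depthwise + pointwise

k, c_in, c_out, h, w = 3, 64, 128, 56, 56
ratio = separable_conv_macs(k, c_in, c_out, h, w) / standard_conv_macs(k, c_in, c_out, h, w)
# ratio simplifies to 1/c_out + 1/k**2, roughly 0.119 here (about 8.4x fewer MACs)
print(f"{ratio:.3f}")
```

The simplified ratio 1/Cout + 1/K² explains why the savings are dominated by the kernel size for typical channel counts.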

A survey of deep learning on CPUs: opportunities and co-optimizations

S Mittal, P Rajput, S Subramoney - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
CPU is a powerful, pervasive, and indispensable platform for running deep learning (DL)
workloads in systems ranging from mobile to extreme-end servers. In this article, we present …

LIBSHALOM: Optimizing small and irregular-shaped matrix multiplications on ARMv8 multi-cores

W Yang, J Fang, D Dong, X Su, Z Wang - Proceedings of the …, 2021 - dl.acm.org
General Matrix Multiplication (GEMM) is a key subroutine in high-performance computing.
While the mainstream linear algebra libraries can deliver high performance on large and …
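The cache-blocking idea behind high-performance GEMM libraries can be sketched in a few lines; this is an illustration of loop tiling only, not LIBSHALOM's actual kernel, and the block size `bs` is a hypothetical tuning knob (real libraries choose it to fit cache levels):

```python
# Cache-blocked GEMM, C = A @ B, over plain Python lists. Loop tiling keeps
# bs-by-bs sub-blocks of A, B, and C hot in cache while they are reused.

def gemm_blocked(a, b, bs=4):
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, bs):
        for j0 in range(0, m, bs):
            for p0 in range(0, k, bs):
                for i in range(i0, min(i0 + bs, n)):
                    for p in range(p0, min(p0 + bs, k)):
                        aip = a[i][p]
                        for j in range(j0, min(j0 + bs, m)):
                            c[i][j] += aip * b[p][j]
    return c
```

When matrices are small or irregularly shaped, many blocks are only partially filled, so the tiling pays overhead without the reuse that justifies it; this mismatch is the inefficiency the entry above targets.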

Model parallelism optimization for distributed inference via decoupled CNN structure

J Du, X Zhu, M Shen, Y Du, Y Lu… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
It is promising to deploy CNN inference on local end-user devices for high-accuracy and
time-sensitive applications. Model parallelism has the potential to provide high throughput …
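The basic model-parallel pattern the entry builds on can be sketched with a fully connected layer: the weight matrix is partitioned by output rows across workers, each worker computes its slice of the output independently, and the slices are concatenated. This is a generic illustration, not the paper's decoupled-CNN scheme:

```python
# Model parallelism sketch: y = W x split by output rows across "workers".
# Each worker holds a shard of W and produces a contiguous slice of y.

def matvec(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def split_rows(w, n_workers):
    # Divide rows into n_workers contiguous shards (last may be smaller).
    step = (len(w) + n_workers - 1) // n_workers
    return [w[i:i + step] for i in range(0, len(w), step)]

def model_parallel_matvec(w, x, n_workers=2):
    partials = [matvec(shard, x) for shard in split_rows(w, n_workers)]
    return [v for part in partials for v in part]  # concatenate slices
```

Because the shards are row-disjoint, no communication is needed until the final concatenation; convolutional layers complicate this picture because spatial partitions share halo regions, which is what makes decoupling the structure non-trivial.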

High performance and portable convolution operators for multicore processors

P San Juan, A Castelló, MF Dolz… - 2020 IEEE 32nd …, 2020 - ieeexplore.ieee.org
The considerable impact of Convolutional Neural Networks on many Artificial Intelligence
tasks has led to the development of various high performance algorithms for the convolution …

Automatic generation of high-performance convolution kernels on ARM CPUs for deep learning

J Meng, C Zhuang, P Chen, M Wahib… - … on Parallel and …, 2022 - ieeexplore.ieee.org
We present FastConv, a template-based code auto-generation open-source library that can
automatically generate high-performance deep learning convolution kernels of arbitrary …

Optimizing Full-Spectrum Matrix Multiplications on ARMv8 Multi-Core CPUs

W Yang, J Fang, D Dong, X Su… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
General Matrix Multiplication (GEMM) is a key subroutine in high-performance computing.
While the mainstream Basic Linear Algebra Subprograms (BLAS) libraries can deliver good …