Deep learning based object detection for resource constrained devices: Systematic review, future trends and challenges ahead
Deep learning models are widely being employed for object detection due to their high
performance. However, the majority of applications that require object detection are …
A comprehensive survey of deep learning-based lightweight object detection models for edge devices
P Mittal - Artificial Intelligence Review, 2024 - Springer
This study concentrates on deep learning-based lightweight object detection models on
edge devices. Designing such lightweight object recognition models is more difficult than …
Asymo: scalable and efficient deep-learning inference on asymmetric mobile cpus
On-device deep learning (DL) inference has attracted vast interest. Mobile CPUs are the
most common hardware for on-device inference and many inference frameworks have been …
Optimizing depthwise separable convolution operations on gpus
The depthwise separable convolution is commonly seen in convolutional neural networks
(CNNs), and is widely used to reduce the computation overhead of a standard multi-channel …
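As a rough illustration of the overhead reduction this snippet refers to (not taken from the paper; the shapes below are illustrative), a depthwise separable convolution replaces one k×k multi-channel convolution with a per-channel k×k depthwise pass plus a 1×1 pointwise pass:

```python
# Rough multiply-accumulate (MAC) comparison: standard vs. depthwise
# separable convolution. Shapes are illustrative, not from the paper.

def conv_macs(h, w, cin, cout, k):
    """MACs for a standard k x k convolution (stride 1, same padding)."""
    return h * w * cin * cout * k * k

def dw_separable_macs(h, w, cin, cout, k):
    """Depthwise k x k conv per input channel, then 1 x 1 pointwise conv."""
    depthwise = h * w * cin * k * k
    pointwise = h * w * cin * cout
    return depthwise + pointwise

std = conv_macs(56, 56, 128, 128, 3)
sep = dw_separable_macs(56, 56, 128, 128, 3)
print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.1f}x")
```

For this 3×3, 128-to-128-channel example the separable form needs roughly 8× fewer MACs, which is the computation overhead reduction the snippet alludes to.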
A survey of deep learning on cpus: opportunities and co-optimizations
CPU is a powerful, pervasive, and indispensable platform for running deep learning (DL)
workloads in systems ranging from mobile to extreme-end servers. In this article, we present …
LIBSHALOM: Optimizing small and irregular-shaped matrix multiplications on ARMv8 multi-cores
General Matrix Multiplication (GEMM) is a key subroutine in high-performance computing.
While the mainstream linear algebra libraries can deliver high performance on large and …
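As a point of reference for what such libraries optimize (a textbook sketch, not code from the paper), GEMM is the triple-loop computation C = A·B; tuned BLAS kernels block, vectorize, and parallelize this same O(m·n·k) loop nest, and small or irregular shapes are hard precisely because those blocking strategies assume large dimensions:

```python
# Naive GEMM sketch: C = A @ B for an (m x k) times (k x n) multiply.
# Pure Python, illustrative only.

def gemm(A, B):
    m, k = len(A), len(A[0])
    k2, n = len(B), len(B[0])
    assert k == k2, "inner dimensions must match"
    C = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for p in range(k):        # i-p-j order keeps row-major accesses contiguous
            a = A[i][p]
            for j in range(n):
                C[i][j] += a * B[p][j]
    return C

print(gemm([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19.0, 22.0], [43.0, 50.0]]
```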
Model parallelism optimization for distributed inference via decoupled CNN structure
It is promising to deploy CNN inference on local end-user devices for high-accuracy and
time-sensitive applications. Model parallelism has the potential to provide high throughput …
High performance and portable convolution operators for multicore processors
P San Juan, A Castelló, MF Dolz… - 2020 IEEE 32nd …, 2020 - ieeexplore.ieee.org
The considerable impact of Convolutional Neural Networks on many Artificial Intelligence
tasks has led to the development of various high performance algorithms for the convolution …
Automatic generation of high-performance convolution kernels on ARM CPUs for deep learning
We present FastConv, a template-based code auto-generation open-source library that can
automatically generate high-performance deep learning convolution kernels of arbitrary …
Optimizing Full-Spectrum Matrix Multiplications on ARMv8 Multi-Core CPUs
General Matrix Multiplication (GEMM) is a key subroutine in high-performance computing.
While the mainstream Basic Linear Algebra Subprograms (BLAS) libraries can deliver good …