Deep learning based object detection for resource constrained devices: Systematic review, future trends and challenges ahead

V Kamath, A Renuka - Neurocomputing, 2023 - Elsevier
Deep learning models are widely being employed for object detection due to their high
performance. However, the majority of applications that require object detection are …

A comprehensive survey of deep learning-based lightweight object detection models for edge devices

P Mittal - Artificial Intelligence Review, 2024 - Springer
This study concentrates on deep learning-based lightweight object detection models on
edge devices. Designing such lightweight object recognition models is more difficult than …

AsyMo: scalable and efficient deep-learning inference on asymmetric mobile CPUs

M Wang, S Ding, T Cao, Y Liu, F Xu - Proceedings of the 27th Annual …, 2021 - dl.acm.org
On-device deep learning (DL) inference has attracted vast interest. Mobile CPUs are the
most common hardware for on-device inference and many inference frameworks have been …

Optimizing depthwise separable convolution operations on GPUs

G Lu, W Zhang, Z Wang - IEEE Transactions on Parallel and …, 2021 - ieeexplore.ieee.org
The depthwise separable convolution is commonly seen in convolutional neural networks
(CNNs), and is widely used to reduce the computation overhead of a standard multi-channel …
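The computation savings this entry refers to follow directly from the factorization's arithmetic: a standard K×K convolution costs K·K·Cin·Cout·H·W multiply-accumulates, while the depthwise-plus-pointwise factorization costs K·K·Cin·H·W + Cin·Cout·H·W. A minimal sketch in pure Python (dimensions are illustrative, not taken from the paper):

```python
# Multiply-accumulate (MAC) counts for standard vs. depthwise separable
# convolution on an H x W feature map. Dimensions below are hypothetical.

def standard_conv_macs(k, c_in, c_out, h, w):
    # Each output pixel of each output channel sums over a k*k*c_in window.
    return k * k * c_in * c_out * h * w

def separable_conv_macs(k, c_in, c_out, h, w):
    depthwise = k * k * c_in * h * w   # one k*k filter per input channel
    pointwise = c_in * c_out * h * w   # 1x1 convolution mixes channels
    return depthwise + pointwise

k, c_in, c_out, h, w = 3, 64, 128, 56, 56
ratio = separable_conv_macs(k, c_in, c_out, h, w) / standard_conv_macs(k, c_in, c_out, h, w)
# ratio simplifies to 1/c_out + 1/k**2, roughly 0.119 here (about 8.4x fewer MACs)
print(f"{ratio:.3f}")
```

The simplified ratio 1/Cout + 1/K² explains why the savings are dominated by the kernel size for typical channel counts.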

A survey of deep learning on CPUs: opportunities and co-optimizations

S Mittal, P Rajput, S Subramoney - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
CPU is a powerful, pervasive, and indispensable platform for running deep learning (DL)
workloads in systems ranging from mobile to extreme-end servers. In this article, we present …

LIBSHALOM: Optimizing small and irregular-shaped matrix multiplications on ARMv8 multi-cores

W Yang, J Fang, D Dong, X Su, Z Wang - Proceedings of the …, 2021 - dl.acm.org
General Matrix Multiplication (GEMM) is a key subroutine in high-performance computing.
While the mainstream linear algebra libraries can deliver high performance on large and …
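The cache-blocking idea behind high-performance GEMM libraries can be sketched in a few lines; this is an illustration of loop tiling only, not LIBSHALOM's actual kernel, and the block size `bs` is a hypothetical tuning knob (real libraries choose it to fit cache levels):

```python
# Cache-blocked GEMM, C = A @ B, over plain Python lists. Loop tiling keeps
# bs-by-bs sub-blocks of A, B, and C hot in cache while they are reused.

def gemm_blocked(a, b, bs=4):
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, bs):
        for j0 in range(0, m, bs):
            for p0 in range(0, k, bs):
                for i in range(i0, min(i0 + bs, n)):
                    for p in range(p0, min(p0 + bs, k)):
                        aip = a[i][p]
                        for j in range(j0, min(j0 + bs, m)):
                            c[i][j] += aip * b[p][j]
    return c
```

When matrices are small or irregularly shaped, many blocks are only partially filled, so the tiling pays overhead without the reuse that justifies it; this mismatch is the inefficiency the entry above targets.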

Model parallelism optimization for distributed inference via decoupled CNN structure

J Du, X Zhu, M Shen, Y Du, Y Lu… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
It is promising to deploy CNN inference on local end-user devices for high-accuracy and
time-sensitive applications. Model parallelism has the potential to provide high throughput …
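The basic model-parallel pattern the entry builds on can be sketched with a fully connected layer: the weight matrix is partitioned by output rows across workers, each worker computes its slice of the output independently, and the slices are concatenated. This is a generic illustration, not the paper's decoupled-CNN scheme:

```python
# Model parallelism sketch: y = W x split by output rows across "workers".
# Each worker holds a shard of W and produces a contiguous slice of y.

def matvec(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def split_rows(w, n_workers):
    # Divide rows into n_workers contiguous shards (last may be smaller).
    step = (len(w) + n_workers - 1) // n_workers
    return [w[i:i + step] for i in range(0, len(w), step)]

def model_parallel_matvec(w, x, n_workers=2):
    partials = [matvec(shard, x) for shard in split_rows(w, n_workers)]
    return [v for part in partials for v in part]  # concatenate slices
```

Because the shards are row-disjoint, no communication is needed until the final concatenation; convolutional layers complicate this picture because spatial partitions share halo regions, which is what makes decoupling the structure non-trivial.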

High performance and portable convolution operators for multicore processors

P San Juan, A Castelló, MF Dolz… - 2020 IEEE 32nd …, 2020 - ieeexplore.ieee.org
The considerable impact of Convolutional Neural Networks on many Artificial Intelligence
tasks has led to the development of various high performance algorithms for the convolution …

Automatic generation of high-performance convolution kernels on ARM CPUs for deep learning

J Meng, C Zhuang, P Chen, M Wahib… - … on Parallel and …, 2022 - ieeexplore.ieee.org
We present FastConv, a template-based code auto-generation open-source library that can
automatically generate high-performance deep learning convolution kernels of arbitrary …

Optimizing Full-Spectrum Matrix Multiplications on ARMv8 Multi-Core CPUs

W Yang, J Fang, D Dong, X Su… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
General Matrix Multiplication (GEMM) is a key subroutine in high-performance computing.
While the mainstream Basic Linear Algebra Subprograms (BLAS) libraries can deliver good …