A survey on optimized implementation of deep learning models on the NVIDIA Jetson platform

S Mittal - Journal of Systems Architecture, 2019 - Elsevier
Design of hardware accelerators for neural network (NN) applications involves
walking a tight rope amidst the constraints of low-power, high accuracy and throughput …

Retrospective on VLSI value scaling and lithography

ML Rieger - Journal of Micro/Nanolithography, MEMS, and …, 2019 - spiedigitallibrary.org
In recent decades, the rate of shrinking integrated-circuit components has slowed as
challenges accumulate. Yet, in part by virtue of an accelerating rate of cleverness, the end …

A survey of deep learning on CPUs: opportunities and co-optimizations

S Mittal, P Rajput, S Subramoney - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
CPU is a powerful, pervasive, and indispensable platform for running deep learning (DL)
workloads in systems ranging from mobile to extreme-end servers. In this article, we present …

Quarantine: Mitigating transient execution attacks with physical domain isolation

M Hertogh, M Wiesinger, S Österlund… - Proceedings of the 26th …, 2023 - dl.acm.org
Since the Spectre and Meltdown disclosure in 2018, the list of new transient execution
vulnerabilities that abuse the shared nature of microarchitectural resources on CPU cores …

Energy-efficient multicore scheduling for hard real-time systems: A survey

SZ Sheikh, MA Pasha - ACM Transactions on Embedded Computing …, 2018 - dl.acm.org
As real-time embedded systems are evolving in scale and complexity, the demand for a
higher performance at a minimum energy consumption has become a necessity …

ReGraph: Scaling graph processing on HBM-enabled FPGAs with heterogeneous pipelines

X Chen, Y Chen, F Cheng, H Tan, B He… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
The use of FPGAs for efficient graph processing has attracted significant interest. Recent
memory subsystem upgrades including the introduction of HBM in FPGAs promise to further …

Network coding in heterogeneous multicore IoT nodes with DAG scheduling of parallel matrix block operations

S Wunderlich, JA Cabrera, FHP Fitzek… - IEEE Internet of …, 2017 - ieeexplore.ieee.org
Random linear network coding (RLNC) has the potential to improve the performance of
current and future Internet of Things (IoT) communication systems, but is computationally …

Towards energy-efficient heterogeneous multicore architectures for edge computing

A Gamatie, G Devic, G Sassatelli, S Bernabovi… - IEEE …, 2019 - ieeexplore.ieee.org
In recent years, the edge computing paradigm has been attracting much attention in the
Internet-of-Things domain. It aims to push the frontier of computing applications, data, and …

It's time to think about an operating system for near data processing architectures

A Barbalace, A Iliopoulos, H Rauchfuss… - Proceedings of the 16th …, 2017 - dl.acm.org
Near Data Processing, in form of processing in-memory (PIM) and in-storage computing
(ISC) was a very active area of research in computer architectures in the '90s [1, 25]. About 3 …

Collaborative heterogeneity-aware OS scheduler for asymmetric multicore processors

T Yu, R Zhong, V Janjic, P Petoumenos… - … on Parallel and …, 2020 - ieeexplore.ieee.org
Asymmetric multicore processors (AMP) offer multiple types of cores under the same
programming interface. Extracting the full potential of AMPs requires intelligent scheduling …