A survey of performance tuning techniques and tools for parallel applications

D Mustafa - IEEE Access, 2022 - ieeexplore.ieee.org
Automatic parallelization of sequential programs combined with auto-tuning is an alternative
to manual parallelization. With wider research directions and the increased number of …

The Landscape of Compute-near-memory and Compute-in-memory: A Research and Commercial Overview

AA Khan, JPC De Lima, H Farzaneh… - arXiv preprint arXiv …, 2024 - arxiv.org
In today's data-centric world, where data fuels numerous application domains, with machine
learning at the forefront, handling the enormous volume of data efficiently in terms of time …

Bring memristive in-memory computing into general-purpose machine learning: A perspective

H Zhou, J Chen, J Li, L Yang, Y Li, X Miao - APL Machine Learning, 2023 - pubs.aip.org
In-memory computing (IMC) using emerging nonvolatile devices has received considerable
attention due to its great potential for accelerating artificial neural networks and machine …

C4CAM: A Compiler for CAM-based In-memory Accelerators

H Farzaneh, JPC De Lima, M Li, AA Khan… - Proceedings of the 29th …, 2024 - dl.acm.org
Machine learning and data analytics applications increasingly suffer from the high latency
and energy consumption of conventional von Neumann architectures. Recently, several in …

Special Session - Non-Volatile Memories: Challenges and Opportunities for Embedded System Architectures with Focus on Machine Learning Applications

J Henkel, L Siddhu, L Bauer, J Teich… - Proceedings of the …, 2023 - dl.acm.org
This paper explores the challenges and opportunities of integrating non-volatile memories
(NVMs) into embedded systems for machine learning. NVMs offer advantages such as …

CIM-MLC: A multi-level compilation stack for computing-in-memory accelerators

S Qu, S Zhao, B Li, Y He, X Cai, L Zhang… - Proceedings of the 29th …, 2024 - dl.acm.org
In recent years, various computing-in-memory (CIM) processors have been presented,
showing superior performance over traditional architectures. To unleash the potential of …

SongC: A compiler for hybrid near-memory and in-memory many-core architecture

J Lin, H Qu, S Ma, X Ji, H Li, X Li… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Building hybrid systems that incorporate various processing-in-memory (PIM) devices and
processing-near-memory (PNM) technologies can offer complementary advantages in both …

XLA-NDP: Efficient Scheduling and Code Generation for Deep Learning Model Training on Near-Data Processing Memory

J Park, H Sung - IEEE Computer Architecture Letters, 2023 - ieeexplore.ieee.org
Deep learning (DL) model training must address the memory bottleneck to continue scaling.
Processing-in-memory approaches can be a viable solution as they move computations …

CINM (Cinnamon): A compilation infrastructure for heterogeneous compute in-memory and compute near-memory paradigms

AA Khan, H Farzaneh, KFA Friebel, C Fournier… - arXiv preprint arXiv …, 2022 - arxiv.org
The rise of data-intensive applications exposed the limitations of conventional processor-
centric von-Neumann architectures that struggle to meet the off-chip memory bandwidth …

Smoothing Disruption Across the Stack: Tales of Memory, Heterogeneity, & Compilers

M Niemier, Z Enciso, M Sharifi, XS Hu… - … , Automation & Test …, 2024 - ieeexplore.ieee.org
Multiple research vectors represent possible paths to improved energy and performance
metrics at the application-level. There are active efforts with respect to emerging logic …