Hardware architecture and software stack for PIM based on commercial DRAM technology: Industrial product

S Lee, S Kang, J Lee, H Kim, E Lee… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
Emerging applications such as deep neural networks demand high off-chip memory
bandwidth. However, under stringent physical constraints of chip packages and system …

Evaluating machine learning workloads on memory-centric computing systems

J Gómez-Luna, Y Guo, S Brocard… - … Analysis of Systems …, 2023 - ieeexplore.ieee.org
Training machine learning (ML) algorithms is a computationally intensive process, which is
frequently memory-bound due to repeatedly accessing large training datasets. As a result …

GradPIM: A practical processing-in-DRAM architecture for gradient descent

H Kim, H Park, T Kim, K Cho, E Lee… - … Symposium on High …, 2021 - ieeexplore.ieee.org
In this paper, we present GradPIM, a processing-in-memory architecture which accelerates
parameter updates of deep neural networks training. As one of processing-in-memory …

Sparse attention acceleration with synergistic in-memory pruning and on-chip recomputation

A Yazdanbakhsh, A Moradifirouzabadi… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
As its core computation, a self-attention mechanism gauges pairwise correlations across the
entire input sequence. Despite favorable performance, calculating pairwise correlations is …

MeNDA: A near-memory multi-way merge solution for sparse transposition and dataflows

S Feng, X He, KY Chen, L Ke, X Zhang… - Proceedings of the 49th …, 2022 - dl.acm.org
Near-memory processing has been extensively studied to optimize memory intensive
workloads. However, none of the proposed designs address sparse matrix transposition, an …

SimplePIM: A software framework for productive and efficient processing-in-memory

J Chen, J Gómez-Luna, I El Hajj… - 2023 32nd …, 2023 - ieeexplore.ieee.org
Data movement between memory and processors is a major bottleneck in modern
computing systems. The processing-in-memory (PIM) paradigm aims to alleviate this …

Accelerating bandwidth-bound deep learning inference with main-memory accelerators

BY Cho, J Jung, M Erez - … of the International Conference for High …, 2021 - dl.acm.org
Matrix-matrix multiplication operations (GEMMs) are important in many HPC and
machine-learning applications. They are often mapped to discrete accelerators (e.g., GPUs) to improve …

An Experimental Evaluation of Machine Learning Training on a Real Processing-in-Memory System

J Gómez-Luna, Y Guo, S Brocard, J Legriel… - arXiv preprint arXiv …, 2022 - arxiv.org
Training machine learning (ML) algorithms is a computationally intensive process, which is
frequently memory-bound due to repeatedly accessing large training datasets. As a result …

The Landscape of Compute-near-memory and Compute-in-memory: A Research and Commercial Overview

AA Khan, JPC De Lima, H Farzaneh… - arXiv preprint arXiv …, 2024 - arxiv.org
In today's data-centric world, where data fuels numerous application domains, with machine
learning at the forefront, handling the enormous volume of data efficiently in terms of time …

PIM-Opt: Demystifying Distributed Optimization Algorithms on a Real-World Processing-In-Memory System

S Rhyner, H Luo, J Gómez-Luna… - Proceedings of the …, 2024 - dl.acm.org
Modern Machine Learning (ML) training on large-scale datasets is a very time-consuming
workload. It relies on the optimization algorithm Stochastic Gradient Descent (SGD) due to …