Hardware architecture and software stack for PIM based on commercial DRAM technology: Industrial product
Emerging applications such as deep neural networks demand high off-chip memory
bandwidth. However, under stringent physical constraints of chip packages and system …
Evaluating machine learning workloads on memory-centric computing systems
Training machine learning (ML) algorithms is a computationally intensive process, which is
frequently memory-bound due to repeatedly accessing large training datasets. As a result …
GradPIM: A practical processing-in-DRAM architecture for gradient descent
In this paper, we present GradPIM, a processing-in-memory architecture which accelerates
parameter updates of deep neural networks training. As one of processing-in-memory …
Sparse attention acceleration with synergistic in-memory pruning and on-chip recomputation
A Yazdanbakhsh, A Moradifirouzabadi… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
As its core computation, a self-attention mechanism gauges pairwise correlations across the
entire input sequence. Despite favorable performance, calculating pairwise correlations is …
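To make the pairwise-correlation computation referenced above concrete, here is a minimal NumPy sketch of a single-head attention score matrix (not taken from the paper; array names and shapes are assumed purely for illustration):

```python
import numpy as np

def attention_scores(Q, K):
    """Pairwise correlations between all query/key positions.

    Q, K: (seq_len, d) arrays. The result is a (seq_len, seq_len)
    matrix, which is why the cost grows quadratically with the
    sequence length -- the bottleneck that pruning approaches target.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # all-pairs dot products
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax

# Example: 128-token sequence, 64-dimensional head (assumed sizes)
rng = np.random.default_rng(0)
Q = rng.standard_normal((128, 64))
K = rng.standard_normal((128, 64))
A = attention_scores(Q, K)                        # shape (128, 128)
```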
MeNDA: A near-memory multi-way merge solution for sparse transposition and dataflows
Near-memory processing has been extensively studied to optimize memory-intensive
workloads. However, none of the proposed designs address sparse matrix transposition, an …
SimplePIM: A software framework for productive and efficient processing-in-memory
Data movement between memory and processors is a major bottleneck in modern
computing systems. The processing-in-memory (PIM) paradigm aims to alleviate this …
Accelerating bandwidth-bound deep learning inference with main-memory accelerators
Matrix-matrix multiplication operations (GEMMs) are important in many HPC and machine-
learning applications. They are often mapped to discrete accelerators (e.g., GPUs) to improve …
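To see why the paper's title calls these GEMMs bandwidth-bound, a rough back-of-the-envelope sketch of arithmetic intensity helps (matrix sizes and fp16 operands are assumed here, not taken from the paper):

```python
def gemm_arithmetic_intensity(m, n, k, bytes_per_elem=2):
    """FLOPs per byte for C[m,n] = A[m,k] @ B[k,n], assuming each
    operand is read once and C is written once (idealized caching)."""
    flops = 2 * m * n * k
    traffic = bytes_per_elem * (m * k + k * n + m * n)
    return flops / traffic

# Large square GEMM: ample data reuse, typically compute-bound on a GPU.
print(gemm_arithmetic_intensity(4096, 4096, 4096))   # ~1365 FLOPs/byte

# Small-batch inference GEMM (m = 1): almost no reuse, so throughput is
# limited by memory bandwidth rather than compute.
print(gemm_arithmetic_intensity(1, 4096, 4096))      # ~1 FLOP/byte
```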
An Experimental Evaluation of Machine Learning Training on a Real Processing-in-Memory System
Training machine learning (ML) algorithms is a computationally intensive process, which is
frequently memory-bound due to repeatedly accessing large training datasets. As a result …
The Landscape of Compute-near-memory and Compute-in-memory: A Research and Commercial Overview
In today's data-centric world, where data fuels numerous application domains, with machine
learning at the forefront, handling the enormous volume of data efficiently in terms of time …
PIM-Opt: Demystifying Distributed Optimization Algorithms on a Real-World Processing-In-Memory System
S Rhyner, H Luo, J Gómez-Luna… - Proceedings of the …, 2024 - dl.acm.org
Modern Machine Learning (ML) training on large-scale datasets is a very time-consuming
workload. It relies on the optimization algorithm Stochastic Gradient Descent (SGD) due to …
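For reference, the SGD update the snippet mentions, in a minimal NumPy sketch (a linear model with squared loss is assumed purely for illustration; the paper itself studies distributed optimization variants on real PIM hardware):

```python
import numpy as np

def sgd(X, y, lr=0.01, epochs=10, batch_size=32, seed=0):
    """Minibatch SGD for linear regression with squared loss.

    Each step touches only one minibatch, but every epoch streams the
    whole training set from memory -- the access pattern that makes
    large-scale training memory-bound.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        idx = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            b = idx[start:start + batch_size]
            grad = 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)
            w -= lr * grad                        # parameter update
    return w

# Example on synthetic data (assumed sizes)
rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 8))
w_true = rng.standard_normal(8)
y = X @ w_true + 0.01 * rng.standard_normal(1000)
w_hat = sgd(X, y)
```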