Towards efficient sparse matrix vector multiplication on real processing-in-memory architectures

C Giannoula, I Fernandez, J Gómez-Luna… - ACM SIGMETRICS …, 2022 - dl.acm.org
Several manufacturers have already started to commercialize near-bank Processing-In-
Memory (PIM) architectures, after decades of research efforts. Near-bank PIM architectures …

SmartSAGE: training large-scale graph neural networks using in-storage processing architectures

Y Lee, J Chung, M Rhu - Proceedings of the 49th Annual International …, 2022 - dl.acm.org
Graph neural networks (GNNs) can extract features by learning both the representation of
each object (i.e., graph nodes) and the relationship across different objects (i.e., the edges …

Evaluating machine learning workloads on memory-centric computing systems

J Gómez-Luna, Y Guo, S Brocard… - … Analysis of Systems …, 2023 - ieeexplore.ieee.org
Training machine learning (ML) algorithms is a computationally intensive process, which is
frequently memory-bound due to repeatedly accessing large training datasets. As a result …

RAMBDA: RDMA-driven Acceleration Framework for Memory-intensive µs-scale Datacenter Applications

Y Yuan, J Huang, Y Sun, T Wang… - … Symposium on High …, 2023 - ieeexplore.ieee.org
Responding to the "datacenter tax" and "killer microseconds" problems for memory-intensive
datacenter applications, diverse solutions including Smart NIC-based ones have been …

DIMM-Link: Enabling efficient inter-DIMM communication for near-memory processing

Z Zhou, C Li, F Yang, G Sun - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
DIMM-based near-memory processing architectures (DIMM-NMP) have received growing
interest from both academia and industry. They have the advantages of large memory …

Pathfinding Future PIM Architectures by Demystifying a Commercial PIM Technology

B Hyun, T Kim, D Lee, M Rhu - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
Processing-in-memory (PIM) has been explored for decades by computer architects, yet it
has never seen the light of day in real-world products due to its high design overheads and …

Training personalized recommendation systems from (GPU) scratch: Look forward not backwards

Y Kwon, M Rhu - Proceedings of the 49th Annual International …, 2022 - dl.acm.org
Personalized recommendation models (RecSys) are one of the most popular machine
learning workloads serviced by hyperscalers. A critical challenge of training RecSys is its …

GROW: A row-stationary sparse-dense GEMM accelerator for memory-efficient graph convolutional neural networks

R Hwang, M Kang, J Lee, D Kam… - … Symposium on High …, 2023 - ieeexplore.ieee.org
Graph convolutional neural networks (GCNs) have emerged as a key technology in various
application domains where the input data is relational. A unique property of GCNs is that their …

Accelerating weather prediction using near-memory reconfigurable fabric

G Singh, D Diamantopoulos, J Gómez-Luna… - ACM Transactions on …, 2022 - dl.acm.org
Ongoing climate change calls for fast and accurate weather and climate modeling. However,
when solving large-scale weather prediction simulations, state-of-the-art CPU and GPU …

MP-Rec: Hardware-software co-design to enable multi-path recommendation

S Hsia, U Gupta, B Acun, N Ardalani, P Zhong… - Proceedings of the 28th …, 2023 - dl.acm.org
Deep learning recommendation systems serve personalized content under diverse tail-
latency targets and input-query loads. In order to do so, state-of-the-art recommendation …