Towards efficient sparse matrix vector multiplication on real processing-in-memory architectures
Several manufacturers have already started to commercialize near-bank
Processing-In-Memory (PIM) architectures, after decades of research efforts. Near-bank PIM architectures …
SmartSAGE: training large-scale graph neural networks using in-storage processing architectures
Graph neural networks (GNNs) can extract features by learning both the representation of
each object (i.e., graph nodes) and the relationship across different objects (i.e., the edges …
Evaluating machine learning workloads on memory-centric computing systems
Training machine learning (ML) algorithms is a computationally intensive process, which is
frequently memory-bound due to repeatedly accessing large training datasets. As a result …
RAMBDA: RDMA-driven Acceleration Framework for Memory-intensive µs-scale Datacenter Applications
Responding to the "datacenter tax" and "killer microseconds" problems for memory-intensive
datacenter applications, diverse solutions including Smart NIC-based ones have been …
DIMM-Link: Enabling efficient inter-DIMM communication for near-memory processing
DIMM-based near-memory processing architectures (DIMM-NMP) have received growing
interest from both academia and industry. They have the advantages of large memory …
Pathfinding Future PIM Architectures by Demystifying a Commercial PIM Technology
Processing-in-memory (PIM) has been explored for decades by computer architects, yet it
has never seen the light of day in real-world products due to its high design overheads and …
Training personalized recommendation systems from (GPU) scratch: Look forward not backwards
Personalized recommendation models (RecSys) are one of the most popular machine
learning workloads serviced by hyperscalers. A critical challenge of training RecSys is its …
GROW: A row-stationary sparse-dense GEMM accelerator for memory-efficient graph convolutional neural networks
Graph convolutional neural networks (GCNs) have emerged as a key technology in various
application domains where the input data is relational. A unique property of GCNs is that its …
Accelerating weather prediction using near-memory reconfigurable fabric
Ongoing climate change calls for fast and accurate weather and climate modeling. However,
when solving large-scale weather prediction simulations, state-of-the-art CPU and GPU …
MP-Rec: Hardware-software co-design to enable multi-path recommendation
Deep learning recommendation systems serve personalized content under diverse
tail-latency targets and input-query loads. In order to do so, state-of-the-art recommendation …