DAMOV: A new methodology and benchmark suite for evaluating data movement bottlenecks
Data movement between the CPU and main memory is a first-order obstacle against
improving performance, scalability, and energy efficiency in modern systems. Computer systems …
Towards efficient sparse matrix vector multiplication on real processing-in-memory architectures
Several manufacturers have already started to commercialize near-bank Processing-In-
Memory (PIM) architectures, after decades of research efforts. Near-bank PIM architectures …
Advancements in accelerating deep neural network inference on aiot devices: A survey
The amalgamation of artificial intelligence with Internet of Things (AIoT) devices has seen a
rapid surge in growth, largely due to the effective implementation of deep neural network …
RAMBDA: RDMA-driven Acceleration Framework for Memory-intensive µs-scale Datacenter Applications
Responding to the "datacenter tax" and "killer microseconds" problems for memory-intensive
datacenter applications, diverse solutions including Smart NIC-based ones have been …
A survey of resource management for processing-in-memory and near-memory processing architectures
Due to the amount of data involved in emerging deep learning and big data applications,
operations related to data movement have quickly become a bottleneck. Data-centric …
Casper: Accelerating stencil computations using near-cache processing
Stencil computations are commonly used in a wide variety of scientific applications, ranging
from large-scale weather prediction to solving partial differential equations. Stencil …
Decoupled vector runahead
We present Decoupled Vector Runahead (DVR), an in-core prefetching technique,
executing separately to the main application thread, that exploits massive amounts of …
Dalorex: A data-local program execution and architecture for memory-bound applications
Applications with low data reuse and frequent irregular memory accesses, such as graph or
sparse linear algebra workloads, fail to scale well due to memory bottlenecks and poor core …
Infinity stream: Portable and programmer-friendly in-/near-memory fusion
In-memory computing with large last-level caches is promising to dramatically alleviate data
movement bottlenecks and expose massive bitline-level parallelization opportunities …
Täkō: A polymorphic cache hierarchy for general-purpose optimization of data movement
BC Schwedock, P Yoovidhya, J Seibert… - Proceedings of the 49th …, 2022 - dl.acm.org
Current systems hide data movement from software behind the load-store interface.
Software's inability to observe and respond to data movement is the root cause of many …