DAMOV: A new methodology and benchmark suite for evaluating data movement bottlenecks

GF Oliveira, J Gómez-Luna, L Orosa, S Ghose… - IEEE …, 2021 - ieeexplore.ieee.org
Data movement between the CPU and main memory is a first-order obstacle against improv
ing performance, scalability, and energy efficiency in modern systems. Computer systems …

Towards efficient sparse matrix vector multiplication on real processing-in-memory architectures

C Giannoula, I Fernandez, J Gómez-Luna… - ACM SIGMETRICS …, 2022 - dl.acm.org
Several manufacturers have already started to commercialize near-bank Processing-In-
Memory (PIM) architectures, after decades of research efforts. Near-bank PIM architectures …

Advancements in accelerating deep neural network inference on aiot devices: A survey

L Cheng, Y Gu, Q Liu, L Yang, C Liu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
The amalgamation of artificial intelligence with Internet of Things (AIoT) devices have seen a
rapid surge in growth, largely due to the effective implementation of deep neural network …

RAMBDA: RDMA-driven Acceleration Framework for Memory-intensive µs-scale Datacenter Applications

Y Yuan, J Huang, Y Sun, T Wang… - … Symposium on High …, 2023 - ieeexplore.ieee.org
Responding to the" datacenter tax" and" killer microseconds" problems for memory-intensive
datacenter applications, diverse solutions including Smart NIC-based ones have been …

A survey of resource management for processing-in-memory and near-memory processing architectures

K Khan, S Pasricha, RG Kim - Journal of Low Power Electronics and …, 2020 - mdpi.com
Due to the amount of data involved in emerging deep learning and big data applications,
operations related to data movement have quickly become a bottleneck. Data-centric …

Casper: Accelerating stencil computations using near-cache processing

A Denzler, GF Oliveira, N Hajinazar, R Bera… - IEEE …, 2023 - ieeexplore.ieee.org
Stencil computations are commonly used in a wide variety of scientific applications, ranging
from large-scale weather prediction to solving partial differential equations. Stencil …

Decoupled vector runahead

A Naithani, J Roelandts, S Ainsworth… - Proceedings of the 56th …, 2023 - dl.acm.org
We present Decoupled Vector Runahead (DVR), an in-core prefetching technique,
executing separately to the main application thread, that exploits massive amounts of …

Dalorex: A data-local program execution and architecture for memory-bound applications

M Orenes-Vera, E Tureci, D Wentzlaff… - … Symposium on High …, 2023 - ieeexplore.ieee.org
Applications with low data reuse and frequent irregular memory accesses, such as graph or
sparse linear algebra workloads, fail to scale well due to memory bottlenecks and poor core …

Infinity stream: Portable and programmer-friendly in-/near-memory fusion

Z Wang, C Liu, A Arora, L John… - Proceedings of the 28th …, 2023 - dl.acm.org
In-memory computing with large last-level caches is promising to dramatically alleviate data
movement bottlenecks and expose massive bitline-level parallelization opportunities …

Täkō: A polymorphic cache hierarchy for general-purpose optimization of data movement

BC Schwedock, P Yoovidhya, J Seibert… - Proceedings of the 49th …, 2022 - dl.acm.org
Current systems hide data movement from software behind the load-store interface.
Software's inability to observe and respond to data movement is the root cause of many …