Livia: Data-centric computing throughout the memory hierarchy

GF Oliveira, J Gómez-Luna, L Orosa, S Ghose… - IEEE …, 2021 - ieeexplore.ieee.org

Data movement between the CPU and main memory is a first-order obstacle against improving
performance, scalability, and energy efficiency in modern systems. Computer systems …

Cited by 98 Related articles All 10 versions

[PDF] arxiv.org

Towards efficient sparse matrix vector multiplication on real processing-in-memory architectures

C Giannoula, I Fernandez, J Gómez-Luna… - ACM SIGMETRICS …, 2022 - dl.acm.org

Several manufacturers have already started to commercialize near-bank Processing-In-Memory
(PIM) architectures, after decades of research efforts. Near-bank PIM architectures …

Cited by 43 Related articles All 10 versions

Advancements in accelerating deep neural network inference on aiot devices: A survey

L Cheng, Y Gu, Q Liu, L Yang, C Liu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

The amalgamation of artificial intelligence with Internet of Things (AIoT) devices has seen a
rapid surge in growth, largely due to the effective implementation of deep neural network …

Cited by 14 Related articles All 4 versions

[PDF] washington.edu

RAMBDA: RDMA-driven Acceleration Framework for Memory-intensive µs-scale Datacenter Applications

Y Yuan, J Huang, Y Sun, T Wang… - … Symposium on High …, 2023 - ieeexplore.ieee.org

Responding to the "datacenter tax" and "killer microseconds" problems for memory-intensive
datacenter applications, diverse solutions including Smart NIC-based ones have been …

Cited by 23 Related articles All 6 versions

[PDF] mdpi.com

A survey of resource management for processing-in-memory and near-memory processing architectures

K Khan, S Pasricha, RG Kim - Journal of Low Power Electronics and …, 2020 - mdpi.com

Due to the amount of data involved in emerging deep learning and big data applications,
operations related to data movement have quickly become a bottleneck. Data-centric …

Cited by 23 Related articles All 9 versions

[PDF] ieee.org

Casper: Accelerating stencil computations using near-cache processing

A Denzler, GF Oliveira, N Hajinazar, R Bera… - IEEE …, 2023 - ieeexplore.ieee.org

Stencil computations are commonly used in a wide variety of scientific applications, ranging
from large-scale weather prediction to solving partial differential equations. Stencil …

Cited by 37 Related articles All 5 versions

[PDF] cam.ac.uk

Decoupled vector runahead

A Naithani, J Roelandts, S Ainsworth… - Proceedings of the 56th …, 2023 - dl.acm.org

We present Decoupled Vector Runahead (DVR), an in-core prefetching technique,
executing separately to the main application thread, that exploits massive amounts of …

Cited by 8 Related articles All 9 versions

[PDF] arxiv.org

Dalorex: A data-local program execution and architecture for memory-bound applications

M Orenes-Vera, E Tureci, D Wentzlaff… - … Symposium on High …, 2023 - ieeexplore.ieee.org

Applications with low data reuse and frequent irregular memory accesses, such as graph or
sparse linear algebra workloads, fail to scale well due to memory bottlenecks and poor core …

Cited by 22 Related articles All 5 versions

[PDF] acm.org

Infinity stream: Portable and programmer-friendly in-/near-memory fusion

Z Wang, C Liu, A Arora, L John… - Proceedings of the 28th …, 2023 - dl.acm.org

In-memory computing with large last-level caches is promising to dramatically alleviate data
movement bottlenecks and expose massive bitline-level parallelization opportunities …

Cited by 11 Related articles All 6 versions

[PDF] acm.org

Täkō: A polymorphic cache hierarchy for general-purpose optimization of data movement

BC Schwedock, P Yoovidhya, J Seibert… - Proceedings of the 49th …, 2022 - dl.acm.org

Current systems hide data movement from software behind the load-store interface.
Software's inability to observe and respond to data movement is the root cause of many …

Cited by 18 Related articles All 8 versions