A modern primer on processing in memory

O Mutlu, S Ghose, J Gómez-Luna… - … computing: from devices …, 2022 - Springer
Modern computing systems are overwhelmingly designed to move data to computation. This
design choice goes directly against at least three key trends in computing that cause …

Processing data where it makes sense: Enabling in-memory computation

O Mutlu, S Ghose, J Gómez-Luna… - Microprocessors and …, 2019 - Elsevier
Today's systems are overwhelmingly designed to move data to computation. This design
choice goes directly against at least three key trends in systems that cause performance …

HiveD: Sharing a GPU cluster for deep learning with guarantees

H Zhao, Z Han, Z Yang, Q Zhang, F Yang… - … USENIX symposium on …, 2020 - usenix.org
Deep learning training on a shared GPU cluster is becoming a common practice. However,
we observe severe sharing anomaly in production multi-tenant clusters where jobs in some …

CASSINI: Network-Aware Job Scheduling in Machine Learning Clusters

S Rajasekaran, M Ghobadi, A Akella - 21st USENIX Symposium on …, 2024 - usenix.org
We present CASSINI, a network-aware job scheduler for machine learning (ML) clusters.
CASSINI introduces a novel geometric abstraction to consider the communication pattern of …

Heimdall: mobile GPU coordination platform for augmented reality applications

J Yi, Y Lee - Proceedings of the 26th Annual International …, 2020 - dl.acm.org
We present Heimdall, a mobile GPU coordination platform for emerging Augmented Reality
(AR) applications. Future AR apps impose an unexplored challenging workload: i) concurrent …

MGPUSim: Enabling multi-GPU performance modeling and optimization

Y Sun, T Baruah, SA Mojumder, S Dong… - Proceedings of the 46th …, 2019 - dl.acm.org
The rapidly growing popularity and scale of data-parallel workloads demand a
corresponding increase in raw computational power of Graphics Processing Units (GPUs) …

Batch-aware unified memory management in GPUs for irregular workloads

H Kim, J Sim, P Gera, R Hadidi, H Kim - Proceedings of the Twenty-Fifth …, 2020 - dl.acm.org
While unified virtual memory and demand paging in modern GPUs provide convenient
abstractions to programmers for working with large-scale applications, they come at a …

A framework for memory oversubscription management in graphics processing units

C Li, R Ausavarungnirun, CJ Rossbach… - Proceedings of the …, 2019 - dl.acm.org
Modern discrete GPUs support unified memory and demand paging. Automatic
management of data movement between CPU memory and GPU memory dramatically …

Congestion control in machine learning clusters

S Rajasekaran, M Ghobadi, G Kumar… - Proceedings of the 21st …, 2022 - dl.acm.org
This paper argues that fair-sharing, the holy grail of congestion control algorithms for
decades, is not necessarily a desirable property in Machine Learning (ML) training clusters …

Every walk's a hit: making page walks single-access cache hits

CH Park, I Vougioukas, A Sandberg… - Proceedings of the 27th …, 2022 - dl.acm.org
As memory capacity has outstripped TLB coverage, large data applications suffer from
frequent page table walks. We investigate two complementary techniques for addressing …