A modern primer on processing in memory
Modern computing systems are overwhelmingly designed to move data to computation. This
design choice goes directly against at least three key trends in computing that cause …
Processing data where it makes sense: Enabling in-memory computation
Today's systems are overwhelmingly designed to move data to computation. This design
choice goes directly against at least three key trends in systems that cause performance …
HiveD: Sharing a GPU cluster for deep learning with guarantees
Deep learning training on a shared GPU cluster is becoming a common practice. However,
we observe severe sharing anomaly in production multi-tenant clusters where jobs in some …
CASSINI: Network-Aware Job Scheduling in Machine Learning Clusters
We present CASSINI, a network-aware job scheduler for machine learning (ML) clusters.
CASSINI introduces a novel geometric abstraction to consider the communication pattern of …
Heimdall: mobile GPU coordination platform for augmented reality applications
We present Heimdall, a mobile GPU coordination platform for emerging Augmented Reality
(AR) applications. Future AR apps impose an unexplored challenging workload: i) concurrent …
MGPUSim: Enabling multi-GPU performance modeling and optimization
The rapidly growing popularity and scale of data-parallel workloads demand a
corresponding increase in raw computational power of Graphics Processing Units (GPUs) …
Batch-aware unified memory management in GPUs for irregular workloads
While unified virtual memory and demand paging in modern GPUs provide convenient
abstractions to programmers for working with large-scale applications, they come at a …
A framework for memory oversubscription management in graphics processing units
C Li, R Ausavarungnirun, CJ Rossbach… - Proceedings of the …, 2019 - dl.acm.org
Modern discrete GPUs support unified memory and demand paging. Automatic
management of data movement between CPU memory and GPU memory dramatically …
Congestion control in machine learning clusters
This paper argues that fair-sharing, the holy grail of congestion control algorithms for
decades, is not necessarily a desirable property in Machine Learning (ML) training clusters …
Every walk's a hit: making page walks single-access cache hits
As memory capacity has outstripped TLB coverage, large data applications suffer from
frequent page table walks. We investigate two complementary techniques for addressing …