MASK: Redesigning the GPU memory hierarchy to support multi-application concurrency

O Mutlu, S Ghose, J Gómez-Luna… - … computing: from devices …, 2022 - Springer

Modern computing systems are overwhelmingly designed to move data to computation. This
design choice goes directly against at least three key trends in computing that cause …

Cited by 235 · Related articles · All 6 versions

Processing data where it makes sense: Enabling in-memory computation

O Mutlu, S Ghose, J Gómez-Luna… - Microprocessors and …, 2019 - Elsevier

Today's systems are overwhelmingly designed to move data to computation. This design
choice goes directly against at least three key trends in systems that cause performance …

Cited by 298 · Related articles · All 9 versions

HiveD: Sharing a GPU cluster for deep learning with guarantees

H Zhao, Z Han, Z Yang, Q Zhang, F Yang… - … USENIX symposium on …, 2020 - usenix.org

Deep learning training on a shared GPU cluster is becoming a common practice. However,
we observe severe sharing anomaly in production multi-tenant clusters where jobs in some …

Cited by 91 · Related articles · All 7 versions

CASSINI: Network-Aware Job Scheduling in Machine Learning Clusters

S Rajasekaran, M Ghobadi, A Akella - 21st USENIX Symposium on …, 2024 - usenix.org

We present CASSINI, a network-aware job scheduler for machine learning (ML) clusters.
CASSINI introduces a novel geometric abstraction to consider the communication pattern of …

Cited by 16 · Related articles · All 5 versions

Heimdall: mobile GPU coordination platform for augmented reality applications

J Yi, Y Lee - Proceedings of the 26th Annual International …, 2020 - dl.acm.org

We present Heimdall, a mobile GPU coordination platform for emerging Augmented Reality
(AR) applications. Future AR apps impose an explored challenging workload: i) concurrent …

Cited by 76 · Related articles · All 2 versions

MGPUSim: Enabling multi-GPU performance modeling and optimization

Y Sun, T Baruah, SA Mojumder, S Dong… - Proceedings of the 46th …, 2019 - dl.acm.org

The rapidly growing popularity and scale of data-parallel workloads demand a
corresponding increase in raw computational power of Graphics Processing Units (GPUs) …

Cited by 114 · Related articles · All 7 versions

Batch-aware unified memory management in GPUs for irregular workloads

H Kim, J Sim, P Gera, R Hadidi, H Kim - Proceedings of the Twenty-Fifth …, 2020 - dl.acm.org

While unified virtual memory and demand paging in modern GPUs provide convenient
abstractions to programmers for working with large-scale applications, they come at a …

Cited by 81 · Related articles · All 3 versions

A framework for memory oversubscription management in graphics processing units

C Li, R Ausavarungnirun, CJ Rossbach… - Proceedings of the …, 2019 - dl.acm.org

Modern discrete GPUs support unified memory and demand paging. Automatic
management of data movement between CPU memory and GPU memory dramatically …

Cited by 99 · Related articles · All 10 versions

Congestion control in machine learning clusters

S Rajasekaran, M Ghobadi, G Kumar… - Proceedings of the 21st …, 2022 - dl.acm.org

This paper argues that fair-sharing, the holy grail of congestion control algorithms for
decades, is not necessarily a desirable property in Machine Learning (ML) training clusters …

Cited by 27 · Related articles · All 4 versions

Every walk's a hit: making page walks single-access cache hits

CH Park, I Vougioukas, A Sandberg… - Proceedings of the 27th …, 2022 - dl.acm.org

As memory capacity has outstripped TLB coverage, large data applications suffer from
frequent page table walks. We investigate two complementary techniques for addressing …

Cited by 36 · Related articles · All 6 versions