Divergence-aware warp scheduling

S Mittal, JS Vetter - ACM Computing Surveys (CSUR), 2014 - dl.acm.org

Recent years have witnessed phenomenal growth in the computational capabilities and
applications of GPUs. However, this trend has also led to a dramatic increase in their power …

Cited by 291 · Related articles · All 13 versions

[PDF] tu-darmstadt.de

Cloud computing landscape and research challenges regarding trust and reputation

SM Habib, S Ries, M Muhlhauser - 2010 7th International …, 2010 - ieeexplore.ieee.org

Cloud Computing is an emerging computing paradigm. It shares massively scalable, elastic
resources (e.g., data, calculations, and services) transparently among the users over a …

Cited by 193 · Related articles · All 10 versions

[PDF] arxiv.org

DeepRecSys: A system for optimizing end-to-end at-scale neural recommendation inference

U Gupta, S Hsia, V Saraph, X Wang… - 2020 ACM/IEEE 47th …, 2020 - ieeexplore.ieee.org

Neural personalized recommendation is the cornerstone of a wide collection of cloud
services and products, constituting significant compute demand of cloud infrastructure. Thus …

Cited by 208 · Related articles · All 12 versions

[PDF] acm.org

Scheduling techniques for GPU architectures with processing-in-memory capabilities

A Pattnaik, X Tang, A Jog, O Kayiran… - Proceedings of the …, 2016 - dl.acm.org

Processing data in or near memory (PIM), as opposed to in conventional computational units
in a processor, can greatly alleviate the performance and energy penalties of data transfers …

Cited by 245 · Related articles · All 15 versions

[PDF] acm.org

Prophet: Precise QoS prediction on non-preemptive accelerators to improve utilization in warehouse-scale computers

Q Chen, H Yang, M Guo, RS Kannan, J Mars… - Proceedings of the …, 2017 - dl.acm.org

Guaranteeing Quality-of-Service (QoS) of latency-sensitive applications while improving
server utilization through application co-location is important yet challenging in modern …

Cited by 180 · Related articles · All 5 versions

[PDF] github.io

Adaptive cache management for energy-efficient GPU computing

X Chen, LW Chang, CI Rodrigues, J Lv… - 2014 47th Annual …, 2014 - ieeexplore.ieee.org

With the SIMT execution model, GPUs can hide memory latency through massive
multithreading for many applications that have regular memory access patterns. To support …

Cited by 210 · Related articles · All 16 versions

[PDF] psu.edu

Coordinated static and dynamic cache bypassing for GPUs

X Xie, Y Liang, Y Wang, G Sun… - 2015 IEEE 21st …, 2015 - ieeexplore.ieee.org

The massive parallel architecture enables graphics processing units (GPUs) to boost
performance for a wide range of applications. Initially, GPUs only employ scratchpad …

Cited by 167 · Related articles · All 10 versions

[PDF] utexas.edu

Flexible software profiling of GPU architectures

M Stephenson, SK Sastry Hari, Y Lee… - Proceedings of the …, 2015 - dl.acm.org

To aid application characterization and architecture design space exploration, researchers
and engineers have developed a wide range of tools for CPUs, including simulators …

Cited by 135 · Related articles · All 8 versions

[PDF] utexas.edu

Anatomy of GPU memory system for multi-application execution

A Jog, O Kayiran, T Kesten, A Pattnaik… - Proceedings of the …, 2015 - dl.acm.org

As GPUs make headway in the computing landscape spanning mobile platforms,
supercomputers, cloud and virtual desktop platforms, supporting concurrent execution of …

Cited by 118 · Related articles · All 12 versions

[PDF] utexas.edu

Priority-based cache allocation in throughput processors

D Li, M Rhu, DR Johnson, M O'Connor… - 2015 IEEE 21st …, 2015 - ieeexplore.ieee.org

GPUs employ massive multithreading and fast context switching to provide high throughput
and hide memory latency. Multithreading can increase contention for various system …

Cited by 118 · Related articles · All 11 versions