Neither more nor less: Optimizing thread-level parallelism for GPGPUs

SM Habib, S Ries, M Muhlhauser - 2010 7th International …, 2010 - ieeexplore.ieee.org

Cloud Computing is an emerging computing paradigm. It shares massively scalable, elastic
resources (e.g., data, calculations, and services) transparently among the users over a …

Cited by 188 · Related articles · All 10 versions

Scheduling techniques for GPU architectures with processing-in-memory capabilities

A Pattnaik, X Tang, A Jog, O Kayiran… - Proceedings of the …, 2016 - dl.acm.org

Processing data in or near memory (PIM), as opposed to in conventional computational units
in a processor, can greatly alleviate the performance and energy penalties of data transfers …

Cited by 230 · Related articles · All 15 versions

GPGPU performance and power estimation using machine learning

G Wu, JL Greathouse, A Lyashevsky… - 2015 IEEE 21st …, 2015 - ieeexplore.ieee.org

Graphics Processing Units (GPUs) have numerous configuration and design options,
including core frequency, number of parallel compute units (CUs), and available memory …

Cited by 260 · Related articles · All 9 versions

OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance

A Jog, O Kayiran, N Chidambaram Nachiappan… - ACM SIGPLAN …, 2013 - dl.acm.org

Emerging GPGPU architectures, along with programming models like CUDA and OpenCL,
offer a cost-effective platform for many applications by providing high thread level …

Improving GPGPU resource utilization through alternative thread block scheduling

M Lee, S Song, J Moon, J Kim, W Seo… - 2014 IEEE 20th …, 2014 - ieeexplore.ieee.org

High performance in GPGPU workloads is obtained by maximizing parallelism and fully
utilizing the available resources. The thousands of threads are assigned to each core in …

Cited by 219 · Related articles · All 8 versions

Orchestrated scheduling and prefetching for GPGPUs

A Jog, O Kayiran, AK Mishra, MT Kandemir… - Proceedings of the 40th …, 2013 - dl.acm.org

In this paper, we present techniques that coordinate the thread scheduling and prefetching
decisions in a General Purpose Graphics Processing Unit (GPGPU) architecture to better …

Cited by 249 · Related articles · All 19 versions

Divergence-aware warp scheduling

TG Rogers, M O'Connor, TM Aamodt - … of the 46th Annual IEEE/ACM …, 2013 - dl.acm.org

This paper uses hardware thread scheduling to improve the performance and energy
efficiency of divergent applications on GPUs. We propose Divergence-Aware Warp …

Cited by 195 · Related articles · All 11 versions

Coordinated static and dynamic cache bypassing for GPUs

X Xie, Y Liang, Y Wang, G Sun… - 2015 IEEE 21st …, 2015 - ieeexplore.ieee.org

The massive parallel architecture enables graphics processing units (GPUs) to boost
performance for a wide range of applications. Initially, GPUs only employ scratchpad …

Cited by 166 · Related articles · All 10 versions

Managing GPU concurrency in heterogeneous architectures

O Kayiran, NC Nachiappan, A Jog… - 2014 47th annual …, 2014 - ieeexplore.ieee.org

Heterogeneous architectures consisting of general-purpose CPUs and throughput-
optimized GPUs are projected to be the dominant computing platforms for many classes of …

Cited by 174 · Related articles · All 29 versions

Warped-slicer: Efficient intra-SM slicing through dynamic resource partitioning for GPU multiprogramming

Q Xu, H Jeon, K Kim, WW Ro… - ACM SIGARCH Computer …, 2016 - dl.acm.org

As technology scales, GPUs are forecasted to incorporate an ever-increasing amount of
computing resources to support thread-level parallelism. But even with the best effort …

Cited by 138 · Related articles · All 9 versions