Hetero-mark, a benchmark suite for CPU-GPU collaborative computing

Y Sun, X Gong, AK Ziabari, L Yu, X Li… - 2016 IEEE …, 2016 - ieeexplore.ieee.org
Graphics Processing Units (GPUs) can easily outperform CPUs in processing large-scale
data parallel workloads, but are considered weak in processing serialized tasks and …

Fluidic kernels: Cooperative execution of OpenCL programs on multiple heterogeneous devices

P Pandit, R Govindarajan - … IEEE/ACM International Symposium on Code …, 2014 - dl.acm.org
Programming heterogeneous computing systems with Graphics Processing Units (GPU) and
multi-core CPUs in them is complex and time-consuming. OpenCL has emerged as an …

Design space exploration of on-chip ring interconnection for a CPU–GPU heterogeneous architecture

J Lee, S Li, H Kim, S Yalamanchili - Journal of Parallel and Distributed …, 2013 - Elsevier
Incorporating a GPU architecture into CMP, which is more efficient with certain types of
applications, is a popular architecture trend in recent processors. This heterogeneous mix of …

Die-stacked memory device providing data translation

GH Loh, BM Beckmann, JM O'Connor… - US Patent …, 2015 - Google Patents
(57) ABSTRACT A die-stacked memory device incorporates a data translation controller at
one or more logic dies of the device to provide data translation services for data to be stored …

A survey of techniques for managing and leveraging caches in GPUs

S Mittal - Journal of Circuits, Systems, and Computers, 2014 - World Scientific
Initially introduced as special-purpose accelerators for graphics applications, graphics
processing units (GPUs) have now emerged as general purpose computing platforms for a …

In-cache query co-processing on coupled CPU-GPU architectures

J He, S Zhang, B He - 2014 - dr.ntu.edu.sg
Recently, there have been some emerging processor designs that the CPU and the GPU
(Graphics Processing Unit) are integrated in a single chip and share Last Level Cache …

Stacked memory device with metadata management

GH Loh, JM O'Connor, BM Beckmann… - US Patent …, 2017 - Google Patents
Prior Publication Data: US 2014/0040698 A1, Feb. 6, 2014. Primary Examiner: Sam Rizk.
(57) ABSTRACT A processing system comprises one or more processor …

Die-stacked memory device with reconfigurable logic

NS Jayasena, MJ Schulte, GH Loh… - US Patent …, 2014 - Google Patents
A die-stacked memory device incorporates a reconfigurable logic device to provide
implementation flexibility in performing various data manipulation operations and other …

Warped-preexecution: A GPU pre-execution approach for improving latency hiding

K Kim, S Lee, MK Yoon, G Koo, WW Ro… - … Symposium on High …, 2016 - ieeexplore.ieee.org
This paper presents a pre-execution approach for improving GPU performance, called
P-mode (pre-execution mode). GPUs utilize a number of concurrent threads for hiding …

Spare register aware prefetching for graph algorithms on GPUs

NB Lakshminarayana, H Kim - 2014 IEEE 20th international …, 2014 - ieeexplore.ieee.org
More and more graph algorithms are being GPU enabled. Graph algorithm implementations
on GPUs have irregular control flow and are memory-intensive with many irregular/data …