Hetero-mark, a benchmark suite for CPU-GPU collaborative computing
Graphics Processing Units (GPUs) can easily outperform CPUs in processing large-scale
data parallel workloads, but are considered weak in processing serialized tasks and …
Fluidic kernels: Cooperative execution of opencl programs on multiple heterogeneous devices
P Pandit, R Govindarajan - … IEEE/ACM International Symposium on Code …, 2014 - dl.acm.org
Programming heterogeneous computing systems with Graphics Processing Units (GPU) and
multi-core CPUs in them is complex and time-consuming. OpenCL has emerged as an …
Design space exploration of on-chip ring interconnection for a CPU–GPU heterogeneous architecture
Incorporating a GPU architecture into CMP, which is more efficient with certain types of
applications, is a popular architecture trend in recent processors. This heterogeneous mix of …
Die-stacked memory device providing data translation
GH Loh, BM Beckmann, JM O'Connor… - US Patent …, 2015 - Google Patents
A die-stacked memory device incorporates a data translation controller at
one or more logic dies of the device to provide data translation services for data to be stored …
A survey of techniques for managing and leveraging caches in GPUs
S Mittal - Journal of Circuits, Systems, and Computers, 2014 - World Scientific
Initially introduced as special-purpose accelerators for graphics applications, graphics
processing units (GPUs) have now emerged as general purpose computing platforms for a …
Stacked memory device with metadata management
GH Loh, JM O'Connor, BM Beckmann… - US Patent …, 2017 - Google Patents
Prior Publication Data: US 2014/0040698 A1, Feb. 6, 2014. Primary Examiner: Sam Rizk.
Abstract: A processing system comprises one or more processor …
Die-stacked memory device with reconfigurable logic
NS Jayasena, MJ Schulte, GH Loh… - US Patent …, 2014 - Google Patents
A die-stacked memory device incorporates a reconfigurable logic device to provide
implementation flexibility in performing various data manipulation operations and other …
Warped-preexecution: A GPU pre-execution approach for improving latency hiding
This paper presents a pre-execution approach for improving GPU performance, called P-
mode (pre-execution mode). GPUs utilize a number of concurrent threads for hiding …
Spare register aware prefetching for graph algorithms on GPUs
NB Lakshminarayana, H Kim - 2014 IEEE 20th international …, 2014 - ieeexplore.ieee.org
More and more graph algorithms are being GPU enabled. Graph algorithm implementations
on GPUs have irregular control flow and are memory-intensive with many irregular/data …