Lossless and lossy memory I/O link compression for improving performance of GPGPU workloads

S Mittal, JS Vetter - IEEE Transactions on Parallel and …, 2015 - ieeexplore.ieee.org

As the number of cores on a chip increases and key applications become even more
data-intensive, memory systems in modern processors have to deal with increasingly large …

Cited by 134 - Related articles - All 5 versions

Approximate computing: A survey

Q Xu, T Mytkowicz, NS Kim - IEEE Design & Test, 2015 - ieeexplore.ieee.org

As one of the most promising energy-efficient computing paradigms, approximate computing
has gained a lot of research attention in the past few years. This paper presents a survey of …

Cited by 627 - Related articles - All 3 versions

Approximate communication: Techniques for reducing communication bottlenecks in large-scale parallel systems

F Betzel, K Khatamifard, H Suresh, DJ Lilja… - ACM Computing …, 2018 - dl.acm.org

Approximate computing has gained research attention recently as a way to increase energy
efficiency and/or performance by exploiting some applications' intrinsic error resiliency …

Cited by 69 - Related articles - All 10 versions

GPUWattch: Enabling energy optimizations in GPGPUs

J Leng, T Hetherington, A ElTantawy, S Gilani… - ACM SIGARCH …, 2013 - dl.acm.org

General-purpose GPUs (GPGPUs) are becoming prevalent in mainstream computing, and
performance per watt has emerged as a more crucial evaluation metric than peak …

Cited by 765 - Related articles - All 21 versions

Compressing DMA engine: Leveraging activation sparsity for training deep neural networks

M Rhu, M O'Connor, N Chatterjee… - … Symposium on High …, 2018 - ieeexplore.ieee.org

Popular deep learning frameworks require users to fine-tune their memory usage so that the
training data of a deep neural network (DNN) fits within the GPU physical memory. Prior …

Cited by 227 - Related articles - All 12 versions

Linearly compressed pages: A low-complexity, low-latency main memory compression framework

G Pekhimenko, V Seshadri, Y Kim, H Xin… - Proceedings of the 46th …, 2013 - dl.acm.org

Data compression is a promising approach for meeting the increasing memory capacity
demands expected in future systems. Unfortunately, existing compression algorithms do not …

Cited by 191 - Related articles - All 21 versions

Warped-compression: Enabling power efficient GPUs through register compression

S Lee, K Kim, G Koo, H Jeon, WW Ro… - ACM SIGARCH …, 2015 - dl.acm.org

This paper presents Warped-Compression, a warp-level register compression scheme for
reducing GPU power consumption. This work is motivated by the observation that the …

Cited by 146 - Related articles - All 11 versions

People, penguins and petri dishes: Adapting object counting models to new visual domains and object types without forgetting

M Marsden, K McGuinness, S Little… - Proceedings of the …, 2018 - openaccess.thecvf.com

In this paper we propose a technique to adapt a convolutional neural network (CNN) based
object counter to additional visual domains and object types while still preserving the …

Cited by 103 - Related articles - All 10 versions

A framework for memory oversubscription management in graphics processing units

C Li, R Ausavarungnirun, CJ Rossbach… - Proceedings of the …, 2019 - dl.acm.org

Modern discrete GPUs support unified memory and demand paging. Automatic
management of data movement between CPU memory and GPU memory dramatically …

Cited by 87 - Related articles - All 10 versions

A case for core-assisted bottleneck acceleration in GPUs: enabling flexible data compression with assist warps

N Vijaykumar, G Pekhimenko, A Jog… - ACM SIGARCH …, 2015 - dl.acm.org

Modern Graphics Processing Units (GPUs) are well provisioned to support the concurrent
execution of thousands of threads. Unfortunately, different bottlenecks during execution and …

Cited by 129 - Related articles - All 6 versions