A survey of methods for analyzing and improving GPU energy efficiency

S Mittal, JS Vetter - ACM Computing Surveys (CSUR), 2014 - dl.acm.org
Recent years have witnessed phenomenal growth in the computational capabilities and
applications of GPUs. However, this trend has also led to a dramatic increase in their power …

Plasticine: A reconfigurable architecture for parallel patterns

R Prabhakar, Y Zhang, D Koeplinger… - ACM SIGARCH …, 2017 - dl.acm.org
Reconfigurable architectures have gained popularity in recent years as they allow the
design of energy-efficient accelerators. Fine-grain fabrics (e.g., FPGAs) have traditionally …

Neural acceleration for general-purpose approximate programs

H Esmaeilzadeh, A Sampson, L Ceze… - 2012 45th annual …, 2012 - ieeexplore.ieee.org
This paper describes a learning-based approach to the acceleration of approximate
programs. We describe the Parrot transformation, a program transformation that selects and …

BLIS: A framework for rapidly instantiating BLAS functionality

FG Van Zee, RA Van De Geijn - ACM Transactions on Mathematical …, 2015 - dl.acm.org
The BLAS-like Library Instantiation Software (BLIS) framework is a new infrastructure for
rapidly instantiating Basic Linear Algebra Subprograms (BLAS) functionality. Its fundamental …

Accelerating deep convolutional networks using low-precision and sparsity

G Venkatesh, E Nurvitadhi… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
We explore techniques to significantly improve the compute efficiency and performance of
Deep Convolution Networks without impacting their accuracy. To improve the compute …

Analytical modeling is enough for high-performance BLIS

TM Low, FD Igual, TM Smith… - ACM Transactions on …, 2016 - dl.acm.org
We show how the BLAS-like Library Instantiation Software (BLIS) framework, which provides
a more detailed layering of the GotoBLAS (now maintained as OpenBLAS) implementation …

Dark memory and accelerator-rich system optimization in the dark silicon era

A Pedram, S Richardson, M Horowitz… - IEEE Design & …, 2016 - ieeexplore.ieee.org
Unlike traditional dark silicon works that attack the computing logic, this article puts a focus
on the memory part, which dissipates most of the energy for memory-bound CPU …

Improved parallel matrix multiplication using Strassen and Urdhvatiryagbhyam method

YRA Bessant, JG Jency, KM Sagayam… - CCF Transactions on …, 2023 - Springer
The current milieu encourages rapid growth of wireless communication, multimedia
applications, robotics and graphics to have efficient utilization of resources with high …

Performance optimization using partitioned SpMV on GPUs and multicore CPUs

W Yang, K Li, Z Mo, K Li - IEEE Transactions on Computers, 2014 - ieeexplore.ieee.org
This paper presents a sparse matrix partitioning strategy to improve the performance of
SpMV on GPUs and multicore CPUs. This method has wide adaptability for different types of …

The BLIS framework: Experiments in portability

FG Van Zee, TM Smith, B Marker, TM Low… - ACM Transactions on …, 2016 - dl.acm.org
BLIS is a new software framework for instantiating high-performance BLAS-like dense linear
algebra libraries. We demonstrate how BLIS acts as a productivity multiplier by using it to …