A survey of methods for analyzing and improving GPU energy efficiency
Recent years have witnessed phenomenal growth in the computational capabilities and
applications of GPUs. However, this trend has also led to a dramatic increase in their power …
Plasticine: A reconfigurable architecture for parallel patterns
Reconfigurable architectures have gained popularity in recent years as they allow the
design of energy-efficient accelerators. Fine-grain fabrics (e.g., FPGAs) have traditionally …
Neural acceleration for general-purpose approximate programs
This paper describes a learning-based approach to the acceleration of approximate
programs. We describe the Parrot transformation, a program transformation that selects and …
BLIS: A framework for rapidly instantiating BLAS functionality
FG Van Zee, RA Van De Geijn - ACM Transactions on Mathematical …, 2015 - dl.acm.org
The BLAS-like Library Instantiation Software (BLIS) framework is a new infrastructure for
rapidly instantiating Basic Linear Algebra Subprograms (BLAS) functionality. Its fundamental …
Accelerating deep convolutional networks using low-precision and sparsity
G Venkatesh, E Nurvitadhi… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
We explore techniques to significantly improve the compute efficiency and performance of
Deep Convolution Networks without impacting their accuracy. To improve the compute …
Analytical modeling is enough for high-performance BLIS
We show how the BLAS-like Library Instantiation Software (BLIS) framework, which provides
a more detailed layering of the GotoBLAS (now maintained as OpenBLAS) implementation …
Dark memory and accelerator-rich system optimization in the dark silicon era
Unlike traditional dark silicon works that attack the computing logic, this article puts a focus
on the memory part, which dissipates most of the energy for memory-bound CPU …
Improved parallel matrix multiplication using Strassen and Urdhvatiryagbhyam method
The current milieu encourages rapid growth of wireless communication, multimedia
applications, robotics and graphics to have efficient utilization of resources with high …
Performance optimization using partitioned SpMV on GPUs and multicore CPUs
This paper presents a sparse matrix partitioning strategy to improve the performance of
SpMV on GPUs and multicore CPUs. This method has wide adaptability for different types of …
The BLIS framework: Experiments in portability
BLIS is a new software framework for instantiating high-performance BLAS-like dense linear
algebra libraries. We demonstrate how BLIS acts as a productivity multiplier by using it to …