Codesign tradeoffs for high-performance, low-power linear algebra architectures

S Mittal, JS Vetter - ACM Computing Surveys (CSUR), 2014 - dl.acm.org

Recent years have witnessed phenomenal growth in the computational capabilities and
applications of GPUs. However, this trend has also led to a dramatic increase in their power …

Cited by 291 Related articles All 13 versions

[PDF] acm.org

Plasticine: A reconfigurable architecture for parallel patterns

R Prabhakar, Y Zhang, D Koeplinger… - ACM SIGARCH …, 2017 - dl.acm.org

Reconfigurable architectures have gained popularity in recent years as they allow the
design of energy-efficient accelerators. Fine-grain fabrics (e.g., FPGAs) have traditionally …

Cited by 340 Related articles All 17 versions

[PDF] washington.edu

Neural acceleration for general-purpose approximate programs

H Esmaeilzadeh, A Sampson, L Ceze… - 2012 45th annual …, 2012 - ieeexplore.ieee.org

This paper describes a learning-based approach to the acceleration of approximate
programs. We describe the Parrot transformation, a program transformation that selects and …

Cited by 884 Related articles All 30 versions

[PDF] osti.gov

BLIS: A framework for rapidly instantiating BLAS functionality

FG Van Zee, RA Van De Geijn - ACM Transactions on Mathematical …, 2015 - dl.acm.org

The BLAS-like Library Instantiation Software (BLIS) framework is a new infrastructure for
rapidly instantiating Basic Linear Algebra Subprograms (BLAS) functionality. Its fundamental …

Cited by 449 Related articles All 7 versions

[PDF] arxiv.org

Accelerating deep convolutional networks using low-precision and sparsity

G Venkatesh, E Nurvitadhi… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org

We explore techniques to significantly improve the compute efficiency and performance of
Deep Convolution Networks without impacting their accuracy. To improve the compute …

Cited by 173 Related articles All 8 versions

[PDF] acm.org

Analytical modeling is enough for high-performance BLIS

TM Low, FD Igual, TM Smith… - ACM Transactions on …, 2016 - dl.acm.org

We show how the BLAS-like Library Instantiation Software (BLIS) framework, which provides
a more detailed layering of the GotoBLAS (now maintained as OpenBLAS) implementation …

Cited by 188 Related articles All 7 versions

[PDF] arxiv.org

Dark memory and accelerator-rich system optimization in the dark silicon era

A Pedram, S Richardson, M Horowitz… - IEEE Design & …, 2016 - ieeexplore.ieee.org

Unlike traditional dark silicon works that attack the computing logic, this article puts a focus
on the memory part, which dissipates most of the energy for memory-bound CPU …

Cited by 168 Related articles All 4 versions

Improved parallel matrix multiplication using Strassen and Urdhvatiryagbhyam method

YRA Bessant, JG Jency, KM Sagayam… - CCF Transactions on …, 2023 - Springer

The current milieu encourages rapid growth of wireless communication, multimedia
applications, robotics and graphics to have efficient utilization of resources with high …

Cited by 54 Related articles

Performance optimization using partitioned SpMV on GPUs and multicore CPUs

W Yang, K Li, Z Mo, K Li - IEEE Transactions on Computers, 2014 - ieeexplore.ieee.org

This paper presents a sparse matrix partitioning strategy to improve the performance of
SpMV on GPUs and multicore CPUs. This method has wide adaptability for different types of …

Cited by 144 Related articles All 7 versions

[PDF] acm.org

The BLIS framework: Experiments in portability

FG Van Zee, TM Smith, B Marker, TM Low… - ACM Transactions on …, 2016 - dl.acm.org

BLIS is a new software framework for instantiating high-performance BLAS-like dense linear
algebra libraries. We demonstrate how BLIS acts as a productivity multiplier by using it to …

Cited by 133 Related articles All 9 versions