LAPACKrc: Fast linear algebra kernels/solvers for FPGA accelerators

P Sundararajan - 2010 - Citeseer

Until the early 2000s, general purpose single-core CPU-based systems were the processing
systems of choice for HPC applications. They replaced exotic supercomputing architectures …

Cited by 95 Related articles All 2 versions

A high performance and memory efficient LU decomposer on FPGAs

G Wu, Y Dou, J Sun… - IEEE transactions on …, 2010 - ieeexplore.ieee.org

LU decomposition for dense matrices is an important linear algebra kernel that is widely
used in both scientific and engineering applications. To efficiently perform large matrix LU …

Cited by 50 Related articles All 8 versions

[PDF] arxiv.org

Efficient realization of Householder transform through algorithm-architecture co-design for acceleration of QR factorization

F Merchant, T Vatwani, A Chattopadhyay… - … on Parallel and …, 2018 - ieeexplore.ieee.org

QR factorization is a ubiquitous operation in many engineering and scientific applications. In
this paper, we present efficient realization of Householder Transform (HT) based QR …

Cited by 22 Related articles All 9 versions

[PDF] openreview.net

Energy-efficient Gaussian processes using low-precision arithmetic

N Alder, R Herbrich - Forty-first International Conference on Machine …, 2024 - openreview.net

The widespread use of artificial intelligence requires finding energy-efficient paradigms for
the field. We propose to reduce the energy consumption of Gaussian process regression …

Comparison of high level FPGA hardware design for solving tri-diagonal linear systems

DJ Warne, NA Kelson, RF Hayward - Procedia Computer Science, 2014 - Elsevier

Reconfigurable computing devices can increase the performance of compute intensive
algorithms by implementing application specific co-processor architectures. The power cost …

Cited by 24 Related articles All 9 versions

[PDF] utexas.edu

Algorithm, architecture, and floating-point unit codesign of a matrix factorization accelerator

A Pedram, A Gerstlauer… - IEEE Transactions on …, 2014 - ieeexplore.ieee.org

This paper examines the mapping of algorithms encountered when solving dense linear
systems and linear least-squares problems to a custom Linear Algebra Processor …

Cited by 23 Related articles All 9 versions

Micro-architectural enhancements in distributed memory CGRAs for LU and QR factorizations

F Merchant, A Maity, M Mahadurkar… - … Conference on VLSI …, 2015 - ieeexplore.ieee.org

LU and QR factorizations are the computationally dear part of many applications ranging
from large scale simulations (e.g., computational fluid dynamics) to augmented reality. These …

Cited by 17 Related articles All 6 versions

[PDF] researchgate.net

Scalable and Efficient Linear Algebra Kernel Mapping for Low Energy Consumption on the Layers CGRA

ZE Rákossy, D Stengele, A Acosta-Aponte… - … Symposium on Applied …, 2015 - Springer

A scalable mapping is proposed for 3 important kernels from the Numerical Linear Algebra
domain, to exploit architectural features to reach asymptotically optimal efficiency and a low …

Cited by 12 Related articles All 2 versions

Scalable matrix decompositions with multiple cores on FPGAs

YG Tai, CTD Lo, K Psarris - Microprocessors and Microsystems, 2013 - Elsevier

Hardware accelerators are getting increasingly important in heterogeneous systems for
many applications, including those that employ matrix decompositions. In recent years, a …

Cited by 12 Related articles All 4 versions

[PDF] utexas.edu

Floating point architecture extensions for optimized matrix factorization

A Pedram, A Gerstlauer… - 2013 IEEE 21st …, 2013 - ieeexplore.ieee.org

This paper examines the mapping of algorithms encountered when solving dense linear
systems and linear least-squares problems to a custom Linear Algebra Processor …

Cited by 12 Related articles All 9 versions