High performance computing using FPGAs

P Sundararajan - 2010 - Citeseer
Until the early 2000s, general purpose single-core CPU-based systems were the processing
systems of choice for HPC applications. They replaced exotic supercomputing architectures …

A high performance and memory efficient LU decomposer on FPGAs

G Wu, Y Dou, J Sun… - IEEE transactions on …, 2010 - ieeexplore.ieee.org
LU decomposition for dense matrices is an important linear algebra kernel that is widely
used in both scientific and engineering applications. To efficiently perform large matrix LU …
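
For context on the kernel this entry accelerates, here is a minimal pure-Python sketch of dense LU decomposition with partial pivoting. It uses the textbook right-looking variant and is only an illustration under stated assumptions, not the FPGA design of Wu et al. The function names `lu_decompose` and `lu_solve`, and the packed storage layout, are hypothetical choices made for this sketch.

```python
# Hypothetical helpers for illustration only; not taken from the cited paper.

def lu_decompose(a):
    """Factor a square matrix as P*A = L*U using partial pivoting.

    Returns (perm, lu): perm[i] is the original row now at row i; lu holds U on
    and above the diagonal and L's multipliers below it (unit diagonal implied).
    """
    n = len(a)
    lu = [[float(x) for x in row] for row in a]  # work on a copy
    perm = list(range(n))
    for k in range(n):
        # Pick the row with the largest |pivot| to limit round-off growth.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]  # multiplier l_ik
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]  # rank-1 update of trailing block
    return perm, lu


def lu_solve(perm, lu, b):
    """Solve A x = b from the packed factorization returned by lu_decompose."""
    n = len(lu)
    y = [float(b[perm[i]]) for i in range(n)]  # apply the row permutation P
    for i in range(n):  # forward substitution: L y = P b
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    x = y[:]
    for i in reversed(range(n)):  # back substitution: U x = y
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x
```

For example, `lu_solve(*lu_decompose([[2, 1, 1], [4, -6, 0], [-2, 7, 2]]), [5, -2, 9])` returns `[1.0, 1.0, 2.0]`. The triple loop is the O(n^3) trailing-matrix update that hardware LU designs typically target.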

Efficient realization of Householder transform through algorithm-architecture co-design for acceleration of QR factorization

F Merchant, T Vatwani, A Chattopadhyay… - … on Parallel and …, 2018 - ieeexplore.ieee.org
QR factorization is a ubiquitous operation in many engineering and scientific applications. In
this paper, we present efficient realization of Householder Transform (HT) based QR …
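
As a rough illustration of Householder-based QR, the sketch below applies one reflector per column in plain Python. It assumes a dense m x n input with m >= n and uses the common sign choice for numerical stability. `householder_qr` is a hypothetical name, and nothing here reflects the co-designed hardware described in the paper.

```python
import math

# Hypothetical helper for illustration only; not the paper's implementation.

def householder_qr(a):
    """Factor an m x n matrix (m >= n) as A = Q R using Householder reflections.

    Step k builds H_k = I - 2 v v^T / (v^T v), mapping the sub-column A[k:, k]
    onto a multiple of e_1; R = H_{n-1} ... H_0 A and Q = H_0 ... H_{n-1}.
    """
    m, n = len(a), len(a[0])
    r = [[float(x) for x in row] for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for k in range(min(n, m - 1)):
        x = [r[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # nothing to eliminate in this column
        # Reflect x onto alpha*e_1. Choosing alpha with the opposite sign of
        # x[0] avoids cancellation in v[0] = x[0] - alpha.
        alpha = -norm_x if x[0] >= 0 else norm_x
        v = x[:]
        v[0] -= alpha
        vtv = sum(t * t for t in v)
        # R <- H_k R; columns before k are already zero in rows k..m-1.
        for j in range(k, n):
            f = 2.0 * sum(v[i - k] * r[i][j] for i in range(k, m)) / vtv
            for i in range(k, m):
                r[i][j] -= f * v[i - k]
        for i in range(k + 1, m):
            r[i][k] = 0.0  # store exact zeros instead of round-off residue
        # Q <- Q H_k (H_k is symmetric); only columns k..m-1 change.
        for i in range(m):
            f = 2.0 * sum(q[i][l] * v[l - k] for l in range(k, m)) / vtv
            for l in range(k, m):
                q[i][l] -= f * v[l - k]
    return q, r
```

Each reflector costs two matrix-vector style passes (a dot product, then a scaled update). That dot-product/update structure is what accelerator designs for Householder QR usually pipeline.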

Energy-efficient Gaussian processes using low-precision arithmetic

N Alder, R Herbrich - Forty-first International Conference on Machine …, 2024 - openreview.net
The widespread use of artificial intelligence requires finding energy-efficient paradigms for
the field. We propose to reduce the energy consumption of Gaussian process regression …

Comparison of high level FPGA hardware design for solving tri-diagonal linear systems

DJ Warne, NA Kelson, RF Hayward - Procedia Computer Science, 2014 - Elsevier
Reconfigurable computing devices can increase the performance of compute intensive
algorithms by implementing application specific co-processor architectures. The power cost …
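
Tri-diagonal systems admit an O(n) direct solve. The sketch below uses the Thomas algorithm, which is Gaussian elimination specialised to three diagonals. The snippet does not say which algorithm Warne et al. implemented (cyclic reduction is a common alternative), so treat this choice as an assumption. `thomas_solve` and its argument layout are hypothetical. Because the algorithm does no pivoting, the sketch assumes a well-behaved matrix, e.g. one that is diagonally dominant.

```python
# Hypothetical helper for illustration only; not the paper's design.

def thomas_solve(lower, diag, upper, rhs):
    """Solve a tri-diagonal system in O(n) with the Thomas algorithm.

    Row i reads: lower[i]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1] = rhs[i],
    with lower[0] and upper[n-1] ignored. No pivoting is performed.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward sweep eliminates the sub-diagonal
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

For the 4 x 4 system with diagonal 2 and off-diagonals -1, `thomas_solve([0, -1, -1, -1], [2, 2, 2, 2], [-1, -1, -1, 0], [0, 0, 0, 5])` returns approximately `[1, 2, 3, 4]`. Both sweeps carry a serial dependence on the previous row. This is one reason parallel hardware designs often consider other formulations.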

Algorithm, architecture, and floating-point unit codesign of a matrix factorization accelerator

A Pedram, A Gerstlauer… - IEEE Transactions on …, 2014 - ieeexplore.ieee.org
This paper examines the mapping of algorithms encountered when solving dense linear
systems and linear least-squares problems to a custom Linear Algebra Processor …

Micro-architectural enhancements in distributed memory CGRAs for LU and QR factorizations

F Merchant, A Maity, M Mahadurkar… - … Conference on VLSI …, 2015 - ieeexplore.ieee.org
LU and QR factorizations are the computationally dear part of many applications ranging
from large scale simulations (e.g., computational fluid dynamics) to augmented reality. These …

Scalable and Efficient Linear Algebra Kernel Mapping for Low Energy Consumption on the Layers CGRA

ZE Rákossy, D Stengele, A Acosta-Aponte… - … Symposium on Applied …, 2015 - Springer
A scalable mapping is proposed for 3 important kernels from the Numerical Linear Algebra
domain, to exploit architectural features to reach asymptotically optimal efficiency and a low …

Scalable matrix decompositions with multiple cores on FPGAs

YG Tai, CTD Lo, K Psarris - Microprocessors and Microsystems, 2013 - Elsevier
Hardware accelerators are getting increasingly important in heterogeneous systems for
many applications, including those that employ matrix decompositions. In recent years, a …

Floating point architecture extensions for optimized matrix factorization

A Pedram, A Gerstlauer… - 2013 IEEE 21st …, 2013 - ieeexplore.ieee.org
This paper examines the mapping of algorithms encountered when solving dense linear
systems and linear least-squares problems to a custom Linear Algebra Processor …