High performance computing using FPGAs
P Sundararajan - 2010 - Citeseer
Until the early 2000s, general purpose single-core CPU-based systems were the processing
systems of choice for HPC applications. They replaced exotic supercomputing architectures …
A high performance and memory efficient LU decomposer on FPGAs
G Wu, Y Dou, J Sun… - IEEE transactions on …, 2010 - ieeexplore.ieee.org
LU decomposition for dense matrices is an important linear algebra kernel that is widely
used in both scientific and engineering applications. To efficiently perform large matrix LU …
Efficient realization of Householder transform through algorithm-architecture co-design for acceleration of QR factorization
QR factorization is a ubiquitous operation in many engineering and scientific applications. In
this paper, we present efficient realization of Householder Transform (HT) based QR …
Energy-efficient Gaussian processes using low-precision arithmetic
N Alder, R Herbrich - Forty-first International Conference on Machine …, 2024 - openreview.net
The widespread use of artificial intelligence requires finding energy-efficient paradigms for
the field. We propose to reduce the energy consumption of Gaussian process regression …
Comparison of high level FPGA hardware design for solving tri-diagonal linear systems
Reconfigurable computing devices can increase the performance of compute intensive
algorithms by implementing application specific co-processor architectures. The power cost …
Algorithm, architecture, and floating-point unit codesign of a matrix factorization accelerator
A Pedram, A Gerstlauer… - IEEE Transactions on …, 2014 - ieeexplore.ieee.org
This paper examines the mapping of algorithms encountered when solving dense linear
systems and linear least-squares problems to a custom Linear Algebra Processor …
Micro-architectural enhancements in distributed memory CGRAs for LU and QR factorizations
F Merchant, A Maity, M Mahadurkar… - … Conference on VLSI …, 2015 - ieeexplore.ieee.org
LU and QR factorizations are the computationally dear part of many applications ranging
from large scale simulations (e.g. Computational fluid dynamics) to augmented reality. These …
Scalable and Efficient Linear Algebra Kernel Mapping for Low Energy Consumption on the Layers CGRA
ZE Rákossy, D Stengele, A Acosta-Aponte… - … Symposium on Applied …, 2015 - Springer
A scalable mapping is proposed for 3 important kernels from the Numerical Linear Algebra
domain, to exploit architectural features to reach asymptotically optimal efficiency and a low …
Scalable matrix decompositions with multiple cores on FPGAs
YG Tai, CTD Lo, K Psarris - Microprocessors and Microsystems, 2013 - Elsevier
Hardware accelerators are getting increasingly important in heterogeneous systems for
many applications, including those that employ matrix decompositions. In recent years, a …
Floating point architecture extensions for optimized matrix factorization
A Pedram, A Gerstlauer… - 2013 IEEE 21st …, 2013 - ieeexplore.ieee.org
This paper examines the mapping of algorithms encountered when solving dense linear
systems and linear least-squares problems to a custom Linear Algebra Processor …