AutoSA: A polyhedral compiler for high-performance systolic arrays on FPGA
While systolic array architectures have the potential to deliver tremendous performance, it is
notoriously challenging to customize an efficient systolic array processor for a target …
CHARM: Composing Heterogeneous AcceleRators for Matrix Multiply on Versal ACAP Architecture
Dense matrix multiply (MM) serves as one of the most heavily used kernels in deep learning
applications. To cope with the high computation demands of these applications …
Sextans: A streaming accelerator for general-purpose sparse-matrix dense-matrix multiplication
Sparse-Matrix Dense-Matrix multiplication (SpMM) is the key operator for a wide range of
applications including scientific computing, graph processing, and deep learning …
Streaming message interface: High-performance distributed memory programming on reconfigurable hardware
Distributed memory programming is the established paradigm used in high-performance
computing (HPC) systems, requiring explicit communication between nodes and devices …
High performance, low power matrix multiply design on ACAP: from architecture, design challenges and DSE perspectives
As the increasing complexity of Neural Network (NN) models leads to high demands for
computation, AMD introduces a heterogeneous programmable system-on-chip (SoC), i.e., …
Noctua2 Supercomputer
Noctua 2 is a supercomputer operated at the Paderborn Center for Parallel Computing
(PC2) at Paderborn University in Germany. Noctua 2 was inaugurated in 2022 and is an …
The strong scaling advantage of FPGAs in HPC for n-body simulations
N-body methods are one of the essential algorithmic building blocks of high-performance
and parallel computing. Previous research has shown promising performance for …
Hardware Acceleration for Knowledge Graph Processing: Challenges & Recent Developments
M Besta, R Gerstenberger, P Iff, P Sonawane… - arXiv preprint arXiv …, 2024 - arxiv.org
Knowledge graphs (KGs) have achieved significant attention in recent years, particularly in
the area of the Semantic Web as well as gaining popularity in other application domains …
Algorithm-hardware co-design of a discontinuous Galerkin shallow-water model for a dataflow architecture on FPGA
T Kenter, A Shambhu, S Faghih-Naini… - Proceedings of the …, 2021 - dl.acm.org
We present the first FPGA implementation of the full simulation pipeline of a shallow water
code based on the discontinuous Galerkin method. Using OpenCL and following an …
Computing and compressing electron repulsion integrals on FPGAs
The computation of electron repulsion integrals (ERIs) over Gaussian-type orbitals (GTOs) is
a challenging problem in quantum-mechanics-based atomistic simulations. In practical …