AutoSA: A polyhedral compiler for high-performance systolic arrays on FPGA

J Wang, L Guo, J Cong - The 2021 ACM/SIGDA International Symposium …, 2021 - dl.acm.org
While systolic array architectures have the potential to deliver tremendous performance, it is
notoriously challenging to customize an efficient systolic array processor for a target …

CHARM: Composing Heterogeneous AcceleRators for Matrix Multiply on Versal ACAP Architecture

J Zhuang, J Lau, H Ye, Z Yang, Y Du, J Lo… - Proceedings of the …, 2023 - dl.acm.org
Dense matrix multiply (MM) serves as one of the most heavily used kernels in deep learning
applications. To cope with the high computation demands of these applications …

Sextans: A streaming accelerator for general-purpose sparse-matrix dense-matrix multiplication

L Song, Y Chi, A Sohrabizadeh, Y Choi, J Lau… - Proceedings of the …, 2022 - dl.acm.org
Sparse-Matrix Dense-Matrix multiplication (SpMM) is the key operator for a wide range of
applications including scientific computing, graph processing, and deep learning …

Streaming message interface: High-performance distributed memory programming on reconfigurable hardware

T De Matteis, J de Fine Licht, J Beránek… - Proceedings of the …, 2019 - dl.acm.org
Distributed memory programming is the established paradigm used in high-performance
computing (HPC) systems, requiring explicit communication between nodes and devices …

High performance, low power matrix multiply design on ACAP: from architecture, design challenges and DSE perspectives

J Zhuang, Z Yang, P Zhou - 2023 60th ACM/IEEE Design …, 2023 - ieeexplore.ieee.org
As the increasing complexity of Neural Network (NN) models leads to high demands for
computation, AMD introduces a heterogeneous programmable system-on-chip (SoC), i.e. …

Noctua 2 Supercomputer

C Bauer, T Kenter, M Lass… - … of large-scale …, 2024 - … -of-large-scale-research-facilities.org
Noctua 2 is a supercomputer operated at the Paderborn Center for Parallel Computing
(PC2) at Paderborn University in Germany. Noctua 2 was inaugurated in 2022 and is an …

The strong scaling advantage of FPGAs in HPC for n-body simulations

J Menzel, C Plessl, T Kenter - ACM Transactions on Reconfigurable …, 2021 - dl.acm.org
N-body methods are one of the essential algorithmic building blocks of high-performance
and parallel computing. Previous research has shown promising performance for …

Hardware Acceleration for Knowledge Graph Processing: Challenges & Recent Developments

M Besta, R Gerstenberger, P Iff, P Sonawane… - arXiv preprint arXiv …, 2024 - arxiv.org
Knowledge graphs (KGs) have achieved significant attention in recent years, particularly in
the area of the Semantic Web as well as gaining popularity in other application domains …

Algorithm-hardware co-design of a discontinuous Galerkin shallow-water model for a dataflow architecture on FPGA

T Kenter, A Shambhu, S Faghih-Naini… - Proceedings of the …, 2021 - dl.acm.org
We present the first FPGA implementation of the full simulation pipeline of a shallow water
code based on the discontinuous Galerkin method. Using OpenCL and following an …

Computing and compressing electron repulsion integrals on FPGAs

X Wu, T Kenter, R Schade, TD Kühne… - 2023 IEEE 31st …, 2023 - ieeexplore.ieee.org
The computation of electron repulsion integrals (ERIs) over Gaussian-type orbitals (GTOs) is
a challenging problem in quantum-mechanics-based atomistic simulations. In practical …