Programming and synthesis for software-defined FPGA acceleration: status and future prospects

YH Lai, E Ustun, S Xiang, Z Fang, H Rong… - ACM Transactions on …, 2021 - dl.acm.org
FPGA-based accelerators are increasingly popular across a broad range of applications,
because they offer massive parallelism, high energy efficiency, and great flexibility for …

Understanding performance differences of FPGAs and GPUs

J Cong, Z Fang, M Lo, H Wang, J Xu… - 2018 IEEE 26th Annual …, 2018 - ieeexplore.ieee.org
This paper aims to better understand the performance differences between FPGAs and
GPUs. We intentionally begin with a widely used GPU-friendly benchmark suite, Rodinia …

SODA: Stencil with optimized dataflow architecture

Y Chi, J Cong, P Wei, P Zhou - 2018 IEEE/ACM International …, 2018 - ieeexplore.ieee.org
Stencil computation is one of the most important kernels in many application domains such
as image processing, solving partial differential equations, and cellular automata. Many of …

Automated accelerator generation and optimization with composable, parallel and pipeline architecture

J Cong, P Wei, CH Yu, P Zhang - Proceedings of the 55th Annual Design …, 2018 - dl.acm.org
CPU-FPGA heterogeneous architectures feature flexible acceleration of many workloads to
advance computational capabilities and energy efficiency in today's datacenters. This …

Optimally scheduling CNN convolutions for efficient memory access

A Stoutchinin, F Conti, L Benini - arXiv preprint arXiv:1902.01492, 2019 - arxiv.org
Embedded inference engines for convolutional networks must be parsimonious in memory
bandwidth and buffer sizing to meet power and cost constraints. We present an analytical …

COSMOS: Coordination of high-level synthesis and memory optimization for hardware accelerators

L Piccolboni, P Mantovani, GD Guglielmo… - ACM Transactions on …, 2017 - dl.acm.org
Hardware accelerators are key to the efficiency and performance of system-on-chip (SoC)
architectures. With high-level synthesis (HLS), designers can easily obtain several …

Manticore: Hardware-accelerated RTL simulation with static bulk-synchronous parallelism

M Emami, S Kashani, K Kamahori… - Proceedings of the 28th …, 2023 - dl.acm.org
The demise of Moore's Law and Dennard Scaling has revived interest in specialized
computer architectures and accelerators. Verification and testing of this hardware depend …

Multi-FPGA accelerator architecture for stencil computation exploiting spacial and temporal scalability

HM Waidyasooriya, M Hariyama - IEEE Access, 2019 - ieeexplore.ieee.org
After the introduction of the OpenCL-based FPGA accelerator design method, FPGAs are
getting very popular among high-performance computing. The key to achieving high …

Enhancing the scalability of multi-FPGA stencil computations via highly optimized HDL components

E Reggiani, E Del Sozzo, D Conficconi… - ACM Transactions on …, 2021 - dl.acm.org
Stencil-based algorithms are a relevant class of computational kernels in high-performance
systems, as they appear in a plethora of fields, from image processing to seismic …

On how to accelerate iterative stencil loops: a scalable streaming-based approach

R Cattaneo, G Natale, C Sicignano, D Sciuto… - ACM Transactions on …, 2015 - dl.acm.org
In high-performance systems, stencil computations play a crucial role as they appear in a
variety of different fields of application, ranging from partial differential equation solving, to …