Programming and synthesis for software-defined FPGA acceleration: status and future prospects
FPGA-based accelerators are increasingly popular across a broad range of applications,
because they offer massive parallelism, high energy efficiency, and great flexibility for …
Understanding performance differences of FPGAs and GPUs
This paper aims to better understand the performance differences between FPGAs and
GPUs. We intentionally begin with a widely used GPU-friendly benchmark suite, Rodinia …
SODA: Stencil with optimized dataflow architecture
Stencil computation is one of the most important kernels in many application domains such
as image processing, solving partial differential equations, and cellular automata. Many of …
Automated accelerator generation and optimization with composable, parallel and pipeline architecture
CPU-FPGA heterogeneous architectures feature flexible acceleration of many workloads to
advance computational capabilities and energy efficiency in today's datacenters. This …
Optimally scheduling CNN convolutions for efficient memory access
Embedded inference engines for convolutional networks must be parsimonious in memory
bandwidth and buffer sizing to meet power and cost constraints. We present an analytical …
COSMOS: Coordination of high-level synthesis and memory optimization for hardware accelerators
Hardware accelerators are key to the efficiency and performance of system-on-chip (SoC)
architectures. With high-level synthesis (HLS), designers can easily obtain several …
Manticore: Hardware-accelerated RTL simulation with static bulk-synchronous parallelism
The demise of Moore's Law and Dennard Scaling has revived interest in specialized
computer architectures and accelerators. Verification and testing of this hardware depend …
Multi-FPGA accelerator architecture for stencil computation exploiting spacial and temporal scalability
HM Waidyasooriya, M Hariyama - IEEE Access, 2019 - ieeexplore.ieee.org
After the introduction of the OpenCL-based FPGA accelerator design method, FPGAs are
getting very popular among high-performance computing. The key to achieving high …
Enhancing the scalability of multi-fpga stencil computations via highly optimized hdl components
Stencil-based algorithms are a relevant class of computational kernels in high-performance
systems, as they appear in a plethora of fields, from image processing to seismic …
On how to accelerate iterative stencil loops: a scalable streaming-based approach
R Cattaneo, G Natale, C Sicignano, D Sciuto… - ACM Transactions on …, 2015 - dl.acm.org
In high-performance systems, stencil computations play a crucial role as they appear in a
variety of different fields of application, ranging from partial differential equation solving, to …