An optimal microarchitecture for stencil computation acceleration based on non-uniform partitioni...

YH Lai, E Ustun, S Xiang, Z Fang, H Rong… - ACM Transactions on …, 2021 - dl.acm.org

FPGA-based accelerators are increasingly popular across a broad range of applications,
because they offer massive parallelism, high energy efficiency, and great flexibility for …

Cited by 42 · Related articles · All 3 versions

Understanding performance differences of FPGAs and GPUs

J Cong, Z Fang, M Lo, H Wang, J Xu… - 2018 IEEE 26th Annual …, 2018 - ieeexplore.ieee.org

This paper aims to better understand the performance differences between FPGAs and
GPUs. We intentionally begin with a widely used GPU-friendly benchmark suite, Rodinia …

Cited by 157 · Related articles · All 6 versions

SODA: Stencil with optimized dataflow architecture

Y Chi, J Cong, P Wei, P Zhou - 2018 IEEE/ACM International …, 2018 - ieeexplore.ieee.org

Stencil computation is one of the most important kernels in many application domains such
as image processing, solving partial differential equations, and cellular automata. Many of …

Cited by 130 · Related articles · All 9 versions

Automated accelerator generation and optimization with composable, parallel and pipeline architecture

J Cong, P Wei, CH Yu, P Zhang - Proceedings of the 55th Annual Design …, 2018 - dl.acm.org

CPU-FPGA heterogeneous architectures feature flexible acceleration of many workloads to
advance computational capabilities and energy efficiency in today's datacenters. This …

Cited by 83 · Related articles · All 7 versions

Optimally scheduling CNN convolutions for efficient memory access

A Stoutchinin, F Conti, L Benini - arXiv preprint arXiv:1902.01492, 2019 - arxiv.org

Embedded inference engines for convolutional networks must be parsimonious in memory
bandwidth and buffer sizing to meet power and cost constraints. We present an analytical …

Cited by 58 · Related articles · All 4 versions

COSMOS: Coordination of high-level synthesis and memory optimization for hardware accelerators

L Piccolboni, P Mantovani, GD Guglielmo… - ACM Transactions on …, 2017 - dl.acm.org

Hardware accelerators are key to the efficiency and performance of system-on-chip (SoC)
architectures. With high-level synthesis (HLS), designers can easily obtain several …

Cited by 59 · Related articles · All 15 versions

Manticore: Hardware-accelerated RTL simulation with static bulk-synchronous parallelism

M Emami, S Kashani, K Kamahori… - Proceedings of the 28th …, 2023 - dl.acm.org

The demise of Moore's Law and Dennard Scaling has revived interest in specialized
computer architectures and accelerators. Verification and testing of this hardware depend …

Cited by 15 · Related articles · All 6 versions

Multi-FPGA accelerator architecture for stencil computation exploiting spacial and temporal scalability

HM Waidyasooriya, M Hariyama - IEEE Access, 2019 - ieeexplore.ieee.org

After the introduction of the OpenCL-based FPGA accelerator design method, FPGAs are
getting very popular among high-performance computing. The key to achieving high …

Cited by 40 · Related articles · All 6 versions

Enhancing the scalability of multi-FPGA stencil computations via highly optimized HDL components

E Reggiani, E Del Sozzo, D Conficconi… - ACM Transactions on …, 2021 - dl.acm.org

Stencil-based algorithms are a relevant class of computational kernels in high-performance
systems, as they appear in a plethora of fields, from image processing to seismic …

Cited by 16 · Related articles · All 3 versions

On how to accelerate iterative stencil loops: a scalable streaming-based approach

R Cattaneo, G Natale, C Sicignano, D Sciuto… - ACM Transactions on …, 2015 - dl.acm.org

In high-performance systems, stencil computations play a crucial role as they appear in a
variety of different fields of application, ranging from partial differential equation solving, to …

Cited by 48 · Related articles · All 5 versions