A review of the optimal design of neural networks based on FPGA

C Wang, Z Luo - Applied Sciences, 2022 - mdpi.com
Deep learning based on neural networks has been widely used in image recognition,
speech recognition, natural language processing, automatic driving, and other fields and …

A survey and taxonomy of FPGA-based deep learning accelerators

AG Blaiech, KB Khalifa, C Valderrama… - Journal of Systems …, 2019 - Elsevier
Deep learning, the fastest growing segment of Artificial Neural Network (ANN), has led to the
emergence of many machine learning applications and their implementation across multiple …

Planaria: Dynamic architecture fission for spatial multi-tenant acceleration of deep neural networks

S Ghodrati, BH Ahn, JK Kim, S Kinzer… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org
Deep Neural Networks (DNNs) have reinvigorated real-world applications that rely on
learning patterns of data and are permeating into different industries and markets. Cloud …

FlexCNN: An end-to-end framework for composing CNN accelerators on FPGA

S Basalama, A Sohrabizadeh, J Wang, L Guo… - ACM Transactions on …, 2023 - dl.acm.org
With reduced data reuse and parallelism, recent convolutional neural networks (CNNs)
create new challenges for FPGA acceleration. Systolic arrays (SAs) are efficient, scalable …

Accelerating attention through gradient-based learned runtime pruning

Z Li, S Ghodrati, A Yazdanbakhsh… - Proceedings of the 49th …, 2022 - dl.acm.org
Self-attention is a key enabler of state-of-the-art accuracy for various transformer-based Natural
Language Processing models. This attention mechanism calculates a correlation score for …

Optimizing CNN-based segmentation with deeply customized convolutional and deconvolutional architectures on FPGA

S Liu, H Fan, X Niu, H Ng, Y Chu, W Luk - ACM Transactions on …, 2018 - dl.acm.org
Convolutional Neural Network (CNN)-based algorithms have been successful in solving
image recognition problems, showing very large accuracy improvement. In recent years …

Memristive GAN in analog

O Krestinskaya, B Choubey, AP James - Scientific reports, 2020 - nature.com
Generative Adversarial Networks (GANs) require extensive computing resources,
making their implementation in edge devices with conventional microprocessor hardware a …

Uni-OPU: An FPGA-Based Uniform Accelerator for Convolutional and Transposed Convolutional Networks

Y Yu, T Zhao, M Wang, K Wang… - IEEE transactions on very …, 2020 - ieeexplore.ieee.org
In this article, we design the first full software/hardware stack, called Uni-OPU, for an efficient
uniform hardware acceleration of different types of transposed convolutional (TCONV) …

Sparse attention acceleration with synergistic in-memory pruning and on-chip recomputation

A Yazdanbakhsh, A Moradifirouzabadi… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
As its core computation, a self-attention mechanism gauges pairwise correlations across the
entire input sequence. Despite favorable performance, calculating pairwise correlations is …

GANPU: An energy-efficient multi-DNN training processor for GANs with speculative dual-sparsity exploitation

S Kang, D Han, J Lee, D Im, S Kim… - IEEE Journal of Solid …, 2021 - ieeexplore.ieee.org
This article presents generative adversarial network processing unit (GANPU), an energy-
efficient multiple deep neural network (DNN) training processor for GANs. It enables on …