A review of the optimal design of neural networks based on FPGA

C Wang, Z Luo - Applied Sciences, 2022 - mdpi.com
Deep learning based on neural networks has been widely used in image recognition,
speech recognition, natural language processing, automatic driving, and other fields and …

MatRaptor: A sparse-sparse matrix multiplication accelerator based on row-wise product

N Srivastava, H Jin, J Liu, D Albonesi… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org
Sparse-sparse matrix multiplication (SpGEMM) is a computation kernel widely used in
numerous application domains such as data analytics, graph processing, and scientific …

EIE: Efficient inference engine on compressed deep neural network

S Han, X Liu, H Mao, J Pu, A Pedram… - ACM SIGARCH …, 2016 - dl.acm.org
State-of-the-art deep neural networks (DNNs) have hundreds of millions of connections and
are both computationally and memory intensive, making them difficult to deploy on …

ESE: Efficient speech recognition engine with sparse LSTM on FPGA

S Han, J Kang, H Mao, Y Hu, X Li, Y Li, D Xie… - Proceedings of the …, 2017 - dl.acm.org
Long Short-Term Memory (LSTM) is widely used in speech recognition. In order to achieve
higher prediction accuracy, machine learning scientists have built increasingly larger …

Efficient and effective sparse LSTM on FPGA with bank-balanced sparsity

S Cao, C Zhang, Z Yao, W Xiao, L Nie, D Zhan… - Proceedings of the …, 2019 - dl.acm.org
Neural networks based on Long Short-Term Memory (LSTM) are widely deployed in latency-
sensitive language and speech applications. To speed up LSTM inference, previous …

Tensaurus: A versatile accelerator for mixed sparse-dense tensor computations

N Srivastava, H Jin, S Smith, H Rong… - … Symposium on High …, 2020 - ieeexplore.ieee.org
Tensor factorizations are powerful tools in many machine learning and data analytics
applications. Tensors are often sparse, which makes sparse tensor factorizations memory …

DeltaRNN: A power-efficient recurrent neural network accelerator

C Gao, D Neil, E Ceolini, SC Liu… - Proceedings of the 2018 …, 2018 - dl.acm.org
Recurrent Neural Networks (RNNs) are widely used in speech recognition and natural
language processing applications because of their capability to process temporal …

TABLA: A unified template-based framework for accelerating statistical machine learning

D Mahajan, J Park, E Amaro, H Sharma… - … Symposium on High …, 2016 - ieeexplore.ieee.org
A growing number of commercial and enterprise systems increasingly rely on compute-
intensive Machine Learning (ML) algorithms. While the demand for these compute-intensive …

GraphLily: Accelerating graph linear algebra on HBM-equipped FPGAs

Y Hu, Y Du, E Ustun, Z Zhang - 2021 IEEE/ACM International …, 2021 - ieeexplore.ieee.org
Graph processing is typically memory bound due to low compute to memory access ratio
and irregular data access pattern. The emerging high-bandwidth memory (HBM) delivers …

SparseP: Towards efficient sparse matrix vector multiplication on real processing-in-memory architectures

C Giannoula, I Fernandez, JG Luna, N Koziris… - Proceedings of the …, 2022 - dl.acm.org
Several manufacturers have already started to commercialize near-bank Processing-In-
Memory (PIM) architectures, after decades of research efforts. Near-bank PIM architectures …