A high memory bandwidth FPGA accelerator for sparse matrix-vector multiplication

C Wang, Z Luo - Applied Sciences, 2022 - mdpi.com

Deep learning based on neural networks has been widely used in image recognition,
speech recognition, natural language processing, automatic driving, and other fields and …

Cited by 31 - Related articles - All 5 versions

MatRaptor: A sparse-sparse matrix multiplication accelerator based on row-wise product

N Srivastava, H Jin, J Liu, D Albonesi… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org

Sparse-sparse matrix multiplication (SpGEMM) is a computation kernel widely used in
numerous application domains such as data analytics, graph processing, and scientific …

Cited by 225 - Related articles - All 10 versions

EIE: Efficient inference engine on compressed deep neural network

S Han, X Liu, H Mao, J Pu, A Pedram… - ACM SIGARCH …, 2016 - dl.acm.org

State-of-the-art deep neural networks (DNNs) have hundreds of millions of connections and
are both computationally and memory intensive, making them difficult to deploy on …

Cited by 3301 - Related articles - All 29 versions

ESE: Efficient speech recognition engine with sparse LSTM on FPGA

S Han, J Kang, H Mao, Y Hu, X Li, Y Li, D Xie… - Proceedings of the …, 2017 - dl.acm.org

Long Short-Term Memory (LSTM) is widely used in speech recognition. In order to achieve
higher prediction accuracy, machine learning scientists have built increasingly larger …

Cited by 855 - Related articles - All 11 versions

Efficient and effective sparse LSTM on FPGA with bank-balanced sparsity

S Cao, C Zhang, Z Yao, W Xiao, L Nie, D Zhan… - Proceedings of the …, 2019 - dl.acm.org

Neural networks based on Long Short-Term Memory (LSTM) are widely deployed in latency-
sensitive language and speech applications. To speed up LSTM inference, previous …

Cited by 208 - Related articles - All 6 versions

Tensaurus: A versatile accelerator for mixed sparse-dense tensor computations

N Srivastava, H Jin, S Smith, H Rong… - … Symposium on High …, 2020 - ieeexplore.ieee.org

Tensor factorizations are powerful tools in many machine learning and data analytics
applications. Tensors are often sparse, which makes sparse tensor factorizations memory …

Cited by 126 - Related articles - All 10 versions

DeltaRNN: A power-efficient recurrent neural network accelerator

C Gao, D Neil, E Ceolini, SC Liu… - Proceedings of the 2018 …, 2018 - dl.acm.org

Recurrent Neural Networks (RNNs) are widely used in speech recognition and natural
language processing applications because of their capability to process temporal …

Cited by 184 - Related articles - All 9 versions

TABLA: A unified template-based framework for accelerating statistical machine learning

D Mahajan, J Park, E Amaro, H Sharma… - … Symposium on High …, 2016 - ieeexplore.ieee.org

A growing number of commercial and enterprise systems increasingly rely on compute-
intensive Machine Learning (ML) algorithms. While the demand for these compute-intensive …

Cited by 219 - Related articles - All 6 versions

GraphLily: Accelerating graph linear algebra on HBM-equipped FPGAs

Y Hu, Y Du, E Ustun, Z Zhang - 2021 IEEE/ACM International …, 2021 - ieeexplore.ieee.org

Graph processing is typically memory bound due to low compute to memory access ratio
and irregular data access pattern. The emerging high-bandwidth memory (HBM) delivers …

Cited by 76 - Related articles - All 6 versions

SparseP: Towards efficient sparse matrix vector multiplication on real processing-in-memory architectures

C Giannoula, I Fernandez, JG Luna, N Koziris… - Proceedings of the …, 2022 - dl.acm.org

Several manufacturers have already started to commercialize near-bank Processing-In-
Memory (PIM) architectures, after decades of research efforts. Near-bank PIM architectures …

Cited by 58 - Related articles - All 3 versions