An Efficient CNN Inference Accelerator Based on Intra- and Inter-Channel Feature Map Compression
C Xie, Z Shao, N Zhao, Y Du… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Deep convolutional neural networks (CNNs) generate large volumes of inter-layer data during
inference, which demands substantial on-chip memory capacity and off-chip bandwidth. To solve …
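The snippet does not describe the paper's actual compression scheme, so the following is only a hypothetical Python sketch of the general idea the title names: exploiting intra-channel (spatial) and inter-channel redundancy in a feature map before it leaves the chip. The function names and the zero-ratio proxy are illustrative assumptions, not the authors' method.

```python
# Toy illustration of intra-/inter-channel feature map redundancy removal.
# Not the paper's scheme; all names here are hypothetical.
import numpy as np

def intra_channel_delta(channel):
    """Delta-encode each row of one channel (exploits spatial smoothness)."""
    delta = channel.copy()
    delta[:, 1:] = channel[:, 1:] - channel[:, :-1]
    return delta

def inter_channel_delta(fmap):
    """Delta-encode each channel against the previous one (exploits
    correlation between adjacent channels)."""
    delta = fmap.copy()
    delta[1:] = fmap[1:] - fmap[:-1]
    return delta

def zero_ratio(x):
    """Fraction of zero elements -- a crude proxy for how well a simple
    zero-skipping or run-length back end could compress the data."""
    return float(np.count_nonzero(x == 0)) / x.size

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy post-ReLU feature map: 16 channels of 32x32, smooth within rows.
    fmap = np.maximum(rng.normal(0, 1, (16, 32, 32)).cumsum(axis=2), 0)
    fmap = np.round(fmap).astype(np.int32)

    encoded = inter_channel_delta(np.stack([intra_channel_delta(c) for c in fmap]))
    print("raw zero ratio     :", zero_ratio(fmap))
    print("encoded zero ratio :", zero_ratio(encoded))
```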
Minimizing Global Buffer Access in a Deep Learning Accelerator Using a Local Register File with a Rearranged Computational Sequence
We propose a method for minimizing global buffer accesses in a deep learning accelerator
for convolution operations by maximizing data reuse through a local register file, thereby …
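As a hedged illustration (not the paper's rearranged computational sequence, which the snippet does not detail), the sketch below contrasts global-buffer read counts for a 1-D convolution with and without a small local register file that holds the current input window; the read counters are illustrative bookkeeping, not a real accelerator API.

```python
# Sketch of register-file data reuse in convolution, under assumed naming.

def conv1d_no_reuse(x, w):
    """Every operand is fetched from the 'global buffer' (the x list)."""
    reads, out = 0, []
    for i in range(len(x) - len(w) + 1):
        acc = 0
        for k in range(len(w)):
            acc += x[i + k] * w[k]   # one global-buffer read per MAC
            reads += 1
        out.append(acc)
    return out, reads

def conv1d_register_reuse(x, w):
    """A K-entry local register file keeps the current input window; only one
    new element is fetched from the global buffer per output after warm-up."""
    K = len(w)
    regs = list(x[:K])               # warm-up: fill the register file
    reads = K
    out = [sum(r * c for r, c in zip(regs, w))]
    for i in range(K, len(x)):
        regs.pop(0)                  # shift window inside the register file
        regs.append(x[i])            # single new global-buffer read
        reads += 1
        out.append(sum(r * c for r, c in zip(regs, w)))
    return out, reads

if __name__ == "__main__":
    x, w = list(range(64)), [1, 2, 3]
    o1, r1 = conv1d_no_reuse(x, w)
    o2, r2 = conv1d_register_reuse(x, w)
    assert o1 == o2                  # same results, fewer global reads
    print("global-buffer reads without reuse:", r1)   # 62 outputs * 3 = 186
    print("global-buffer reads with reuse   :", r2)   # 64
```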
MOSDA: On-Chip Memory Optimized Sparse Deep Neural Network Accelerator with Efficient Index Matching
The irregular data access patterns caused by sparsity pose great challenges for efficient
processing in accelerators. Focusing on the index-matching property of DNNs, this article aims …
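A minimal sketch of the index-matching idea, assuming the common compressed (index, value) representation for both activations and weights: a multiply-accumulate is issued only when an activation index matches a weight index. This merge-join over sorted index lists is a generic technique, not necessarily MOSDA's specific hardware mechanism.

```python
# Index matching between sparse activations and sparse weights (illustrative).

def to_sparse(dense):
    """Compressed form: sorted list of (index, value) pairs for nonzeros."""
    return [(i, v) for i, v in enumerate(dense) if v != 0]

def sparse_dot(a_sparse, w_sparse):
    """Dot product by merging two sorted index streams; only matching
    indices trigger a useful multiply-accumulate."""
    acc, i, j = 0, 0, 0
    while i < len(a_sparse) and j < len(w_sparse):
        ai, av = a_sparse[i]
        wi, wv = w_sparse[j]
        if ai == wi:
            acc += av * wv           # indices match: useful MAC
            i, j = i + 1, j + 1
        elif ai < wi:
            i += 1                   # activation has no matching weight
        else:
            j += 1                   # weight has no matching activation
    return acc

if __name__ == "__main__":
    act = [0, 3, 0, 0, 5, 0, 2, 0]
    wgt = [1, 0, 0, 4, 5, 0, 6, 0]
    dense = sum(a * w for a, w in zip(act, wgt))
    assert sparse_dot(to_sparse(act), to_sparse(wgt)) == dense
    print("sparse dot product:", dense)
```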
Evaluation Metrics for the Cost of Data Movement in Deep Neural Network Acceleration
Hardware accelerators are designed to support a specialized processing dataflow for
ever-changing deep neural networks (DNNs) under various processing environments. This …
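Since the snippet does not state the paper's actual metrics, the sketch below only illustrates the usual form a data-movement cost metric takes for DNN accelerators: weighting the access count at each memory-hierarchy level by an assumed per-access energy. The energy values are placeholder assumptions, not measured numbers from the paper.

```python
# Hedged sketch of a data-movement cost model; all numbers are placeholders.

# Hypothetical per-access energy in arbitrary relative units.
ENERGY_PER_ACCESS = {
    "register_file": 1.0,
    "global_buffer": 6.0,
    "dram": 200.0,
}

def data_movement_energy(access_counts):
    """Total data-movement energy: sum over memory levels of
    (number of accesses) * (energy per access at that level)."""
    return sum(ENERGY_PER_ACCESS[level] * count
               for level, count in access_counts.items())

if __name__ == "__main__":
    # Toy access profile for one convolution layer under some dataflow.
    profile = {"register_file": 5_000_000, "global_buffer": 400_000, "dram": 50_000}
    print("estimated data-movement energy:", data_movement_energy(profile))
    # Comparing this total across candidate dataflows (e.g. weight- vs
    # output-stationary) is how such metrics are typically used.
```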