Authors
Charles Eckert, Xiaowei Wang, Jingcheng Wang, Arun Subramaniyan, Ravi Iyer, Dennis Sylvester, David Blaauw, Reetuparna Das
Publication date
2018/6/1
Conference paper
2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA)
Pages
383-396
Publisher
IEEE
Description
This paper presents the Neural Cache architecture, which re-purposes cache structures to transform them into massively parallel compute units capable of running inferences for Deep Neural Networks. Techniques to perform in-situ arithmetic in SRAM arrays, create efficient data mappings, and reduce data movement are proposed. The Neural Cache architecture can fully execute convolutional, fully connected, and pooling layers in-cache, and also supports in-cache quantization. Our experimental results show that the proposed architecture improves inference latency by 8.3× over a state-of-the-art multi-core CPU (Xeon E5) and 7.7× over a server-class GPU (Titan Xp) for the Inception v3 model. Neural Cache improves inference throughput by 12.4× over the CPU (2.2× over the GPU), while reducing power consumption by 50% over the CPU (53% over the GPU).
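The in-situ SRAM arithmetic described above relies on a bit-serial compute model over a transposed data layout: bit i of every operand word is stored in the same SRAM row, so one row-pair activation performs a bitwise operation across thousands of words simultaneously. A minimal sketch of that idea in NumPy (an illustration of the concept, not the authors' implementation; all function names here are hypothetical):

```python
import numpy as np

def transpose_bits(vals, width):
    """Pack integers into a (width, n) bit matrix, LSB in row 0.
    Mirrors the transposed layout where each SRAM row holds one
    bit position of many operands."""
    vals = np.asarray(vals, dtype=np.uint64)
    return np.stack([((vals >> np.uint64(i)) & np.uint64(1))
                     for i in range(width)]).astype(np.uint8)

def bit_serial_add(a_bits, b_bits):
    """Add two transposed operand vectors one bit-plane at a time.
    Each loop iteration models one row-wise AND/XOR/carry step;
    every element of the row is processed in parallel."""
    width, n = a_bits.shape
    carry = np.zeros(n, dtype=np.uint8)
    out = np.zeros((width, n), dtype=np.uint8)
    for i in range(width):                      # one bit position per step
        p = a_bits[i] ^ b_bits[i]               # propagate
        out[i] = p ^ carry                      # sum bit
        carry = (a_bits[i] & b_bits[i]) | (carry & p)
    return out

def from_bits(bits):
    """Convert a (width, n) bit matrix back to integers."""
    width = bits.shape[0]
    return sum(bits[i].astype(np.uint64) << np.uint64(i)
               for i in range(width))

a, b = [3, 10, 250], [4, 17, 5]
result = from_bits(bit_serial_add(transpose_bits(a, 8), transpose_bits(b, 8)))
print(result.tolist())  # [7, 27, 255]
```

An n-bit add takes n row-wise steps regardless of how many words sit in the row, which is where the massive parallelism of cache-based compute comes from.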
Total citations
(per-year citation chart, 2019–2024)