An On-the-Fly Feature Map Compression Engine for Background Memory Access Cost Reduction in DNN Inference

G Rutishauser, LAJ Cavigelli, L Benini - 2020 - research-collection.ethz.ch
Specialized hardware architectures and dedicated accelerators allow the application of Deep Learning directly within sensing nodes. With compute resources highly optimized for energy efficiency, a large part of the power consumption of such devices is caused by transfers of intermediate feature maps to and from large memories. Moreover, a significant share of the silicon area is dedicated to these memories to avoid highly expensive off-chip memory accesses. Extended Bit-Plane Compression (EBPC), a recently proposed compression scheme targeting DNN feature maps, offers an opportunity to increase energy efficiency by reducing both the data transfer volume and the size of large background memories. Besides exhibiting state-of-the-art compression ratios, it also has a small, simple hardware implementation. In post-layout power simulations, we show an energy cost between 0.27 pJ/word and 0.45 pJ/word, three orders of magnitude lower than the cost of off-chip memory accesses. EBPC reduces off-chip access energy by factors of 2.2x and 4x for MobileNetV2 and VGG16, respectively, and can reduce on-chip access energy by up to 45%. We further propose a way to integrate the EBPC hardware blocks, which perform on-the-fly compression and decompression on 8-bit feature map streams, into an embedded ultra-low-power processing system and show how the challenges arising from a variable-length compressed representation can be navigated in this context.
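The energy argument in the abstract can be illustrated with a rough back-of-the-envelope model. The sketch below is not taken from the paper: it assumes that memory traffic shrinks in proportion to the compression ratio and that the codec cost is paid once per uncompressed word. The off-chip energy of 450 pJ/word is a hypothetical value, chosen only to sit about three orders of magnitude above the reported codec cost, and the function names and the compression ratios passed in are likewise illustrative.

```python
def net_access_energy(e_mem_pj, compression_ratio, e_codec_pj):
    """Energy per uncompressed word (pJ) when data is compressed before a memory access.

    Assumed model: memory energy scales with 1/compression_ratio,
    and the codec adds a fixed cost per uncompressed word.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return e_mem_pj / compression_ratio + e_codec_pj


def reduction_factor(e_mem_pj, compression_ratio, e_codec_pj):
    """Ratio of uncompressed access energy to net access energy with compression."""
    return e_mem_pj / net_access_energy(e_mem_pj, compression_ratio, e_codec_pj)


# Hypothetical off-chip access energy, NOT a figure from the paper.
E_OFFCHIP_PJ = 450.0

# Codec energy range reported in the abstract (pJ/word).
E_CODEC_LOW_PJ = 0.27
E_CODEC_HIGH_PJ = 0.45

if __name__ == "__main__":
    # Compression ratios here are illustrative inputs, not measured values.
    for ratio, e_codec in [(2.2, E_CODEC_LOW_PJ), (4.0, E_CODEC_HIGH_PJ)]:
        factor = reduction_factor(E_OFFCHIP_PJ, ratio, e_codec)
        print(f"compression ratio {ratio}: energy reduction factor {factor:.2f}")
```

Under these assumptions the codec overhead is small enough that the energy reduction factor stays close to the compression ratio itself, which is why a codec cost far below the memory access cost matters.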