Packing sparse convolutional neural networks for efficient systolic array implementations: Column combining under joint optimization

HT Kung, B McDanel, SQ Zhang - Proceedings of the Twenty-Fourth …, 2019 - dl.acm.org
This paper describes a novel approach of packing sparse convolutional neural networks into
a denser format for efficient implementations using systolic arrays. By combining multiple …

iPIM: Programmable in-memory image processing accelerator using near-bank architecture

P Gu, X Xie, Y Ding, G Chen, W Zhang… - 2020 ACM/IEEE 47th …, 2020 - ieeexplore.ieee.org
Image processing is becoming an increasingly important domain for many applications on
workstations and the datacenter that require accelerators for high performance and energy …

Euphrates: Algorithm-SoC co-design for low-power mobile continuous vision

Y Zhu, A Samajdar, M Mattina… - arXiv preprint arXiv …, 2018 - arxiv.org
Continuous computer vision (CV) tasks increasingly rely on convolutional neural networks
(CNN). However, CNNs have massive compute demands that far exceed the performance …

DeepStore: In-storage acceleration for intelligent queries

VS Mailthody, Z Qureshi, W Liang, Z Feng… - Proceedings of the …, 2019 - dl.acm.org
Recent advancements in deep learning techniques facilitate intelligent-query support in
diverse applications, such as content-based image retrieval and audio texturing. Unlike …

Buffets: An efficient and composable storage idiom for explicit decoupled data orchestration

M Pellauer, YS Shao, J Clemons, N Crago… - Proceedings of the …, 2019 - dl.acm.org
Accelerators spend significant area and effort on custom on-chip buffering. Unfortunately,
these solutions are strongly tied to particular designs, hampering re-usability across other …

Hardware-software co-design for an analog-digital accelerator for machine learning

J Ambrosi, A Ankit, R Antunes… - 2018 IEEE …, 2018 - ieeexplore.ieee.org
The increasing deployment of machine learning at the core and at the edge for applications
such as video and image recognition has resulted in a number of special purpose …

X-cache: a modular architecture for domain-specific caches

A Sedaghati, M Hakimi, R Hojabr… - Proceedings of the 49th …, 2022 - dl.acm.org
With Dennard scaling ending, architects are turning to domain-specific accelerators (DSAs).
State-of-the-art DSAs work with sparse data [37] and indirectly-indexed data structures [18 …

IDEAL: Image denoising accelerator

M Mahmoud, B Zheng, AD Lascorz, F Heide… - Proceedings of the 50th …, 2017 - dl.acm.org
Computational imaging pipelines (CIPs) convert the raw output of imaging sensors into the
high-quality images that are used for further processing. This work studies how Block …

A fusion method for multi-valued data

M Papčo, I Rodríguez-Martínez, J Fumanal-Idocin… - Information …, 2021 - Elsevier
In this paper we propose an extension of the notion of deviation-based aggregation function
tailored to aggregate multidimensional data. Our objective is both to improve the results …

NeDa: Supporting Direct Inter-Core Neighbor Data Exchange in GPUs

N Nematollahi, M Sadrosadati… - IEEE Computer …, 2018 - ieeexplore.ieee.org
Image processing applications employ various filters for several purposes, such as
enhancing the images and extracting the features. Recent studies show that filters in image …