Packing sparse convolutional neural networks for efficient systolic array implementations: Column combining under joint optimization
This paper describes a novel approach of packing sparse convolutional neural networks into
a denser format for efficient implementations using systolic arrays. By combining multiple …
iPIM: Programmable in-memory image processing accelerator using near-bank architecture
Image processing is becoming an increasingly important domain for many applications on
workstations and the datacenter that require accelerators for high performance and energy …
Euphrates: Algorithm-SoC co-design for low-power mobile continuous vision
Continuous computer vision (CV) tasks increasingly rely on convolutional neural networks
(CNN). However, CNNs have massive compute demands that far exceed the performance …
Deepstore: In-storage acceleration for intelligent queries
Recent advancements in deep learning techniques facilitate intelligent-query support in
diverse applications, such as content-based image retrieval and audio texturing. Unlike …
Buffets: An efficient and composable storage idiom for explicit decoupled data orchestration
Accelerators spend significant area and effort on custom on-chip buffering. Unfortunately,
these solutions are strongly tied to particular designs, hampering re-usability across other …
Hardware-software co-design for an analog-digital accelerator for machine learning
J Ambrosi, A Ankit, R Antunes… - 2018 IEEE …, 2018 - ieeexplore.ieee.org
The increasing deployment of machine learning at the core and at the edge for applications
such as video and image recognition has resulted in a number of special purpose …
X-cache: a modular architecture for domain-specific caches
With Dennard scaling ending, architects are turning to domain-specific accelerators (DSAs).
State-of-the-art DSAs work with sparse data [37] and indirectly-indexed data structures [18 …
IDEAL: Image denoising accelerator
Computational imaging pipelines (CIPs) convert the raw output of imaging sensors into the
high-quality images that are used for further processing. This work studies how Block …
A fusion method for multi-valued data
M Papčo, I Rodríguez-Martínez, J Fumanal-Idocin… - Information …, 2021 - Elsevier
In this paper we propose an extension of the notion of deviation-based aggregation function
tailored to aggregate multidimensional data. Our objective is both to improve the results …
Neda: Supporting Direct Inter-Core Neighbor Data Exchange in GPUs
N Nematollahi, M Sadrosadati… - IEEE Computer …, 2018 - ieeexplore.ieee.org
Image processing applications employ various filters for several purposes, such as
enhancing the images and extracting the features. Recent studies show that filters in image …