An overview of efficient interconnection networks for deep neural network accelerators

SM Nabavinejad, M Baharloo, KC Chen… - IEEE Journal on …, 2020 - ieeexplore.ieee.org
Deep Neural Networks (DNNs) have shown significant advantages in many domains, such
as pattern recognition, prediction, and control optimization. The edge computing demand in …

ELSA: Hardware-software co-design for efficient, lightweight self-attention mechanism in neural networks

TJ Ham, Y Lee, SH Seo, S Kim, H Choi… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
The self-attention mechanism is rapidly emerging as one of the most important key primitives
in neural networks (NNs) for its ability to identify the relations within input entities. The self …

Experimentally validated memristive memory augmented neural network with efficient hashing and similarity search

R Mao, B Wen, A Kazemi, Y Zhao, AF Laguna… - Nature …, 2022 - nature.com
Lifelong on-device learning is a key challenge for machine intelligence, and this requires
learning from few, often single, samples. Memory-augmented neural networks have been …

Robust high-dimensional memory-augmented neural networks

G Karunaratne, M Schmuck, M Le Gallo… - Nature …, 2021 - nature.com
Traditional neural networks require enormous amounts of data to build their complex
mappings during a slow training procedure that hinders their abilities for relearning and …

DSAGEN: Synthesizing programmable spatial accelerators

J Weng, S Liu, V Dadu, Z Wang, P Shah… - 2020 ACM/IEEE 47th …, 2020 - ieeexplore.ieee.org
Domain-specific hardware accelerators can provide orders of magnitude speedup and
energy efficiency over general purpose processors. However, they require extensive manual …

SAPIENS: A 64-kb RRAM-based non-volatile associative memory for one-shot learning and inference at the edge

H Li, WC Chen, A Levy, CH Wang… - … on Electron Devices, 2021 - ieeexplore.ieee.org
Learning from a few examples (one/few-shot learning) on the fly is a key challenge for
on-device machine intelligence. We present the first chip-level demonstration of one-shot …

First-generation inference accelerator deployment at Facebook

M Anderson, B Chen, S Chen, S Deng, J Fix… - arXiv preprint arXiv …, 2021 - arxiv.org
In this paper, we provide a deep dive into the deployment of inference accelerators at
Facebook. Many of our ML workloads have unique characteristics, such as sparse memory …

Accelerating applications using edge tensor processing units

KC Hsu, HW Tseng - Proceedings of the International Conference for …, 2021 - dl.acm.org
Neural network (NN) accelerators have been integrated into a wide spectrum of computer
systems to accommodate the rapidly growing demands for artificial intelligence (AI) and …

Scale-out systolic arrays

AC Yüzügüler, C Sönmez, M Drumond, Y Oh… - ACM Transactions on …, 2023 - dl.acm.org
Multi-pod systolic arrays are emerging as the architecture of choice in DNN inference
accelerators. Despite their potential, designing multi-pod systolic arrays to maximize …

Algorithm-architecture co-design for domain-specific accelerators in communication and artificial intelligence

Y Tao - 2022 - deepblue.lib.umich.edu
The past decade has witnessed an explosive growth of data and the needs for high-speed
data communications and processing. The needs continue to drive the development of new …