An overview of efficient interconnection networks for deep neural network accelerators
Deep Neural Networks (DNNs) have shown significant advantages in many domains, such
as pattern recognition, prediction, and control optimization. The edge computing demand in …
ELSA: Hardware-software co-design for efficient, lightweight self-attention mechanism in neural networks
The self-attention mechanism is rapidly emerging as one of the most important key primitives
in neural networks (NNs) for its ability to identify the relations within input entities. The self …
Experimentally validated memristive memory augmented neural network with efficient hashing and similarity search
Lifelong on-device learning is a key challenge for machine intelligence, and this requires
learning from few, often single, samples. Memory-augmented neural networks have been …
Robust high-dimensional memory-augmented neural networks
Traditional neural networks require enormous amounts of data to build their complex
mappings during a slow training procedure that hinders their abilities for relearning and …
DSAGEN: Synthesizing programmable spatial accelerators
Domain-specific hardware accelerators can provide orders of magnitude speedup and
energy efficiency over general purpose processors. However, they require extensive manual …
SAPIENS: A 64-kb RRAM-based non-volatile associative memory for one-shot learning and inference at the edge
Learning from a few examples (one/few-shot learning) on the fly is a key challenge for on-
device machine intelligence. We present the first chip-level demonstration of one-shot …
First-generation inference accelerator deployment at Facebook
M Anderson, B Chen, S Chen, S Deng, J Fix… - arXiv preprint arXiv …, 2021 - arxiv.org
In this paper, we provide a deep dive into the deployment of inference accelerators at
Facebook. Many of our ML workloads have unique characteristics, such as sparse memory …
Accelerating applications using edge tensor processing units
Neural network (NN) accelerators have been integrated into a wide-spectrum of computer
systems to accommodate the rapidly growing demands for artificial intelligence (AI) and …
Scale-out systolic arrays
Multi-pod systolic arrays are emerging as the architecture of choice in DNN inference
accelerators. Despite their potential, designing multi-pod systolic arrays to maximize …
Algorithm-architecture co-design for domain-specific accelerators in communication and artificial intelligence
Y Tao - 2022 - deepblue.lib.umich.edu
The past decade has witnessed an explosive growth of data and the needs for high-speed
data communications and processing. The needs continue to drive the development of new …