Bit fusion: Bit-level dynamically composable architecture for accelerating deep neural network
Hardware acceleration of Deep Neural Networks (DNNs) aims to tame their enormous
compute intensity. Fully realizing the potential of acceleration in this domain requires …
TETRIS: Scalable and efficient neural network acceleration with 3D memory
The high accuracy of deep neural networks (NNs) has led to the development of NN
accelerators that improve performance by two orders of magnitude. However, scaling these …
Accelergy: An architecture-level energy estimation methodology for accelerator designs
With Moore's law slowing down and Dennard scaling ended, energy-efficient domain-
specific accelerators, such as deep neural network (DNN) processors for machine learning …
Planaria: Dynamic architecture fission for spatial multi-tenant acceleration of deep neural networks
S Ghodrati, BH Ahn, JK Kim, S Kinzer… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org
Deep Neural Networks (DNNs) have reinvigorated real-world applications that rely on
learning patterns of data and are permeating into different industries and markets. Cloud …
SnaPEA: Predictive early activation for reducing computation in deep convolutional neural networks
Deep Convolutional Neural Networks (CNNs) perform billions of operations for classifying a
single input. To reduce these computations, this paper offers a solution that leverages a …
Emerging monolithic 3D integration: Opportunities and challenges from the computer system perspective
In the past decade, monolithic three-dimensional integrated circuits (M3D-ICs) have advanced rapidly
and demonstrated several important breakthroughs in the fabrication process and circuit level …
Efficient invisible speculative execution through selective delay and value prediction
Speculative execution, the base on which modern high-performance general-purpose CPUs
are built, has recently been shown to enable a slew of security attacks. All these attacks …
The McPAT framework for multicore and manycore architectures: Simultaneously modeling power, area, and timing
This article introduces McPAT, an integrated power, area, and timing modeling framework
that supports comprehensive design space exploration for multicore and manycore …
SMASH: Co-designing software compression and hardware-accelerated indexing for efficient sparse matrix operations
Important workloads, such as machine learning and graph analytics applications, heavily
involve sparse linear algebra operations. These operations use sparse matrix compression …
The Mondrian data engine
The increasing demand for extracting value out of ever-growing data poses an ongoing
challenge to system designers, a task only made trickier by the end of Dennard scaling. As …