Bit Fusion: Bit-level dynamically composable architecture for accelerating deep neural network

H Sharma, J Park, N Suda, L Lai… - 2018 ACM/IEEE 45th …, 2018 - ieeexplore.ieee.org
Hardware acceleration of Deep Neural Networks (DNNs) aims to tame their enormous
compute intensity. Fully realizing the potential of acceleration in this domain requires …

TETRIS: Scalable and efficient neural network acceleration with 3D memory

M Gao, J Pu, X Yang, M Horowitz… - Proceedings of the Twenty …, 2017 - dl.acm.org
The high accuracy of deep neural networks (NNs) has led to the development of NN
accelerators that improve performance by two orders of magnitude. However, scaling these …

Accelergy: An architecture-level energy estimation methodology for accelerator designs

YN Wu, JS Emer, V Sze - 2019 IEEE/ACM International …, 2019 - ieeexplore.ieee.org
With Moore's law slowing down and Dennard scaling ended, energy-efficient
domain-specific accelerators, such as deep neural network (DNN) processors for machine learning …

Planaria: Dynamic architecture fission for spatial multi-tenant acceleration of deep neural networks

S Ghodrati, BH Ahn, JK Kim, S Kinzer… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org
Deep Neural Networks (DNNs) have reinvigorated real-world applications that rely on
learning patterns of data and are permeating into different industries and markets. Cloud …

SnaPEA: Predictive early activation for reducing computation in deep convolutional neural networks

V Akhlaghi, A Yazdanbakhsh, K Samadi… - 2018 ACM/IEEE 45th …, 2018 - ieeexplore.ieee.org
Deep Convolutional Neural Networks (CNNs) perform billions of operations for classifying a
single input. To reduce these computations, this paper offers a solution that leverages a …

Emerging monolithic 3D integration: Opportunities and challenges from the computer system perspective

Y Cheng, X Guo, VF Pavlidis - Integration, 2022 - Elsevier
In the past decade, monolithic three dimensional integrated circuits (M3D-ICs) advance fast
and demonstrate several important breakthroughs in the fabrication process and circuit level …

Efficient invisible speculative execution through selective delay and value prediction

C Sakalis, S Kaxiras, A Ros, A Jimborean… - Proceedings of the 46th …, 2019 - dl.acm.org
Speculative execution, the base on which modern high-performance general-purpose CPUs
are built on, has recently been shown to enable a slew of security attacks. All these attacks …

The McPAT framework for multicore and manycore architectures: Simultaneously modeling power, area, and timing

S Li, JH Ahn, RD Strong, JB Brockman… - ACM Transactions on …, 2013 - dl.acm.org
This article introduces McPAT, an integrated power, area, and timing modeling framework
that supports comprehensive design space exploration for multicore and manycore …

SMASH: Co-designing software compression and hardware-accelerated indexing for efficient sparse matrix operations

K Kanellopoulos, N Vijaykumar, C Giannoula… - Proceedings of the …, 2019 - dl.acm.org
Important workloads, such as machine learning and graph analytics applications, heavily
involve sparse linear algebra operations. These operations use sparse matrix compression …

The Mondrian data engine

M Drumond, A Daglis, N Mirzadeh, D Ustiugov… - ACM SIGARCH …, 2017 - dl.acm.org
The increasing demand for extracting value out of ever-growing data poses an ongoing
challenge to system designers, a task only made trickier by the end of Dennard scaling. As …