Why systolic architecture?

Efficient deep learning: A survey on making deep learning models smaller, faster, and better

G Menghani - ACM Computing Surveys, 2023 - dl.acm.org

Deep learning has revolutionized the fields of computer vision, natural language
understanding, speech recognition, information retrieval, and more. However, with the …

Cited by 338 · Related articles · All 6 versions

[PDF] acm.org

Knowledge graphs

A Hogan, E Blomqvist, M Cochez, C d'Amato… - ACM Computing …, 2021 - dl.acm.org

In this article, we provide a comprehensive introduction to knowledge graphs, which have
recently garnered significant attention from both industry and academia in scenarios that …

Cited by 1674 · Related articles · All 51 versions

[PDF] ieee.org

Photonic multiply-accumulate operations for neural networks

MA Nahmias, TF De Lima, AN Tait… - IEEE Journal of …, 2019 - ieeexplore.ieee.org

It has long been known that photonic communication can alleviate the data movement
bottlenecks that plague conventional microelectronic processors. More recently, there has …

Cited by 299 · Related articles · All 10 versions

[PDF] arxiv.org

Benchmarking TPU, GPU, and CPU platforms for deep learning

YE Wang, GY Wei, D Brooks - arXiv preprint arXiv:1907.10701, 2019 - arxiv.org

Training deep learning models is compute-intensive and there is an industry-wide trend
towards hardware specialization to improve performance. To systematically benchmark …

Cited by 350 · Related articles · All 5 versions

[PDF] arxiv.org

Sparch: Efficient architecture for sparse matrix multiplication

Z Zhang, H Wang, S Han… - 2020 IEEE International …, 2020 - ieeexplore.ieee.org

Generalized Sparse Matrix-Matrix Multiplication (SpGEMM) is a ubiquitous task in various
engineering and scientific applications. However, inner product based SpGEMM introduces …

Cited by 254 · Related articles · All 10 versions

[PDF] mlr.press

Deep learning with limited numerical precision

S Gupta, A Agrawal… - International …, 2015 - proceedings.mlr.press

Training of large-scale deep neural networks is often constrained by the available
computational resources. We study the effect of limited precision data representation and …

Cited by 2624 · Related articles · All 14 versions

[PDF] ieee.org

Hardware implementation of deep network accelerators towards healthcare and biomedical applications

MR Azghadi, C Lammie, JK Eshraghian… - … Circuits and Systems, 2020 - ieeexplore.ieee.org

The advent of dedicated Deep Learning (DL) accelerators and neuromorphic processors
has brought on new opportunities for applying both Deep and Spiking Neural Network …

Cited by 181 · Related articles · All 14 versions

[PDF] arxiv.org

GenASM: A high-performance, low-power approximate string matching acceleration framework for genome sequence analysis

DS Cali, GS Kalsi, Z Bingöl, C Firtina… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org

Genome sequence analysis has enabled significant advancements in medical and scientific
areas such as personalized medicine, outbreak tracing, and the understanding of evolution …

Cited by 143 · Related articles · All 19 versions

[PDF] ieee.org

HERMES-Core—A 1.59-TOPS/mm² PCM on 14-nm CMOS In-Memory Compute Core Using 300-ps/LSB Linearized CCO-Based ADCs

R Khaddam-Aljameh, M Stanisavljevic… - IEEE Journal of Solid …, 2022 - ieeexplore.ieee.org

We present a 256 × 256 in-memory compute (IMC) core designed and fabricated in 14-nm
CMOS technology with backend-integrated multi-level phase change memory (PCM). It …

Cited by 86 · Related articles · All 5 versions

[PDF] acm.org

Packing sparse convolutional neural networks for efficient systolic array implementations: Column combining under joint optimization

HT Kung, B McDanel, SQ Zhang - Proceedings of the Twenty-Fourth …, 2019 - dl.acm.org

This paper describes a novel approach of packing sparse convolutional neural networks into
a denser format for efficient implementations using systolic arrays. By combining multiple …

Cited by 192 · Related articles · All 6 versions
