Efficient deep learning: A survey on making deep learning models smaller, faster, and better

G Menghani - ACM Computing Surveys, 2023 - dl.acm.org
Deep learning has revolutionized the fields of computer vision, natural language
understanding, speech recognition, information retrieval, and more. However, with the …

Knowledge graphs

A Hogan, E Blomqvist, M Cochez, C d'Amato… - ACM Computing …, 2021 - dl.acm.org
In this article, we provide a comprehensive introduction to knowledge graphs, which have
recently garnered significant attention from both industry and academia in scenarios that …

Photonic multiply-accumulate operations for neural networks

MA Nahmias, TF De Lima, AN Tait… - IEEE Journal of …, 2019 - ieeexplore.ieee.org
It has long been known that photonic communication can alleviate the data movement
bottlenecks that plague conventional microelectronic processors. More recently, there has …

Benchmarking TPU, GPU, and CPU platforms for deep learning

YE Wang, GY Wei, D Brooks - arXiv preprint arXiv:1907.10701, 2019 - arxiv.org
Training deep learning models is compute-intensive and there is an industry-wide trend
towards hardware specialization to improve performance. To systematically benchmark …

SpArch: Efficient architecture for sparse matrix multiplication

Z Zhang, H Wang, S Han… - 2020 IEEE International …, 2020 - ieeexplore.ieee.org
Generalized Sparse Matrix-Matrix Multiplication (SpGEMM) is a ubiquitous task in various
engineering and scientific applications. However, inner product based SpGEMM introduces …

Deep learning with limited numerical precision

S Gupta, A Agrawal… - International …, 2015 - proceedings.mlr.press
Training of large-scale deep neural networks is often constrained by the available
computational resources. We study the effect of limited precision data representation and …

Hardware implementation of deep network accelerators towards healthcare and biomedical applications

MR Azghadi, C Lammie, JK Eshraghian… - … Circuits and Systems, 2020 - ieeexplore.ieee.org
The advent of dedicated Deep Learning (DL) accelerators and neuromorphic processors
has brought on new opportunities for applying both Deep and Spiking Neural Network …

GenASM: A high-performance, low-power approximate string matching acceleration framework for genome sequence analysis

DS Cali, GS Kalsi, Z Bingöl, C Firtina… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org
Genome sequence analysis has enabled significant advancements in medical and scientific
areas such as personalized medicine, outbreak tracing, and the understanding of evolution …

HERMES-Core—A 1.59-TOPS/mm² PCM on 14-nm CMOS In-Memory Compute Core Using 300-ps/LSB Linearized CCO-Based ADCs

R Khaddam-Aljameh, M Stanisavljevic… - IEEE Journal of Solid …, 2022 - ieeexplore.ieee.org
We present a 256 × 256 in-memory compute (IMC) core designed and fabricated in 14-nm
CMOS technology with backend-integrated multi-level phase change memory (PCM). It …

Packing sparse convolutional neural networks for efficient systolic array implementations: Column combining under joint optimization

HT Kung, B McDanel, SQ Zhang - Proceedings of the Twenty-Fourth …, 2019 - dl.acm.org
This paper describes a novel approach of packing sparse convolutional neural networks into
a denser format for efficient implementations using systolic arrays. By combining multiple …