Efficient acceleration of deep learning inference on resource-constrained edge devices: A review

MMH Shuvo, SK Islam, J Cheng… - Proceedings of the …, 2022 - ieeexplore.ieee.org
Successful integration of deep neural networks (DNNs) or deep learning (DL) has resulted
in breakthroughs in many areas. However, deploying these highly accurate models for data …

Hardware approximate techniques for deep neural network accelerators: A survey

G Armeniakos, G Zervakis, D Soudris… - ACM Computing …, 2022 - dl.acm.org
Deep Neural Networks (DNNs) are very popular because of their high performance in
various cognitive tasks in Machine Learning (ML). Recent advancements in DNNs have …

Training deep neural networks with 8-bit floating point numbers

N Wang, J Choi, D Brand, CY Chen… - Advances in neural …, 2018 - proceedings.neurips.cc
The state-of-the-art hardware platforms for training deep neural networks are moving from
traditional single precision (32-bit) computations towards 16 bits of precision, in large part …
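To ground the low-precision direction this snippet points at, below is a minimal numpy sketch of quantize-dequantize rounding into a reduced float format. The 5-exponent/2-mantissa split, round-to-nearest mode, and omission of subnormals are all assumptions for illustration, not the paper's exact recipe (which also relies on techniques such as chunk-based accumulation and stochastic rounding).

```python
import numpy as np

def fp_quantize(x, exp_bits=5, man_bits=2):
    """Round each value to the nearest number representable with the given
    exponent/mantissa widths (plus a sign bit). Purely illustrative: the
    paper's full method also needs careful accumulation and rounding
    schemes, which this sketch leaves out."""
    x = np.asarray(x, dtype=np.float64)
    bias = 2 ** (exp_bits - 1) - 1
    mag = np.abs(x)
    # Exponent of each value, clamped to the representable range
    # (subnormals are ignored for simplicity).
    e = np.clip(np.floor(np.log2(np.where(mag > 0, mag, 1.0))), 1 - bias, bias)
    step = 2.0 ** (e - man_bits)          # grid spacing at exponent e
    q = np.round(x / step) * step
    max_val = (2 - 2.0 ** -man_bits) * 2.0 ** bias
    return np.clip(q, -max_val, max_val)  # saturate instead of overflowing

print(fp_quantize([0.1, 1.3, -2.7, 40000.0]))
```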

Ultra-low precision 4-bit training of deep neural networks

X Sun, N Wang, CY Chen, J Ni… - Advances in …, 2020 - proceedings.neurips.cc
In this paper, we propose a number of novel techniques and numerical representation
formats that enable, for the very first time, the precision of training systems to be aggressively …
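As a companion sketch for the 4-bit regime, here is generic signed 4-bit integer quantize-dequantize with a single symmetric scale; the paper's actual numerical formats (including its gradient representation) are more involved, so treat every detail below as an assumption.

```python
import numpy as np

def int4_quantize(x, num_bits=4):
    """Symmetric uniform quantize-dequantize to signed `num_bits` integers.
    A generic sketch only; it does not model the paper's custom formats
    or per-layer scaling techniques."""
    x = np.asarray(x, dtype=np.float64)
    qmax = 2 ** (num_bits - 1) - 1        # 7 for 4 bits
    scale = np.max(np.abs(x)) / qmax
    if scale == 0.0:
        scale = 1.0                        # all-zero input: any scale works
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

print(int4_quantize([-1.0, -0.3, 0.05, 0.8]))
```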

In-memory computing: Advances and prospects

N Verma, H Jia, H Valavi, Y Tang… - IEEE Solid-State …, 2019 - ieeexplore.ieee.org
IMC has the potential to address a critical and foundational challenge affecting computing
platforms today, that is, the high energy and delay costs of moving data and accessing data …

In-memory computing with emerging memory devices: Status and outlook

P Mannocci, M Farronato, N Lepri, L Cattaneo… - APL Machine …, 2023 - pubs.aip.org
In-memory computing (IMC) has emerged as a new computing paradigm able to alleviate or
suppress the memory bottleneck, which is the major concern for energy efficiency and …

Approximate Computing

W Liu, F Lombardi - 2022 - Springer
Computing systems at all scales (from mobile handheld devices to supercomputers, servers,
and large cloud-based data centers) have seen significant performance gains, mostly …

GOBO: Quantizing attention-based NLP models for low latency and energy efficient inference

AH Zadeh, I Edo, OM Awad… - 2020 53rd Annual IEEE …, 2020 - ieeexplore.ieee.org
Attention-based models have demonstrated remarkable success in various natural
language understanding tasks. However, efficient execution remains a challenge for these …
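A hedged sketch of the outlier-aware idea behind GOBO follows: most weights are snapped to a small dictionary of shared values, while the few weights far from the mean stay in full precision. The 3-sigma outlier cut and quantile-placed centroids below are illustrative choices, not the paper's own fitting procedure.

```python
import numpy as np

def outlier_aware_quantize(w, bits=3, outlier_sigma=3.0):
    """GOBO-style sketch: weights far from the mean ('outliers') keep full
    precision; the remaining near-Gaussian weights map to 2**bits shared
    centroids. Centroid placement here is an assumption for illustration."""
    w = np.asarray(w, dtype=np.float64)
    mu, sigma = w.mean(), w.std()
    is_outlier = np.abs(w - mu) > outlier_sigma * sigma
    inliers = w[~is_outlier]
    # Evenly spaced quantiles give 2**bits representative values.
    qs = np.linspace(0.0, 1.0, 2 ** bits + 2)[1:-1]
    centroids = np.quantile(inliers, qs)
    # Snap each inlier weight to its nearest centroid.
    nearest = np.abs(inliers[:, None] - centroids[None, :]).argmin(axis=1)
    wq = w.copy()
    wq[~is_outlier] = centroids[nearest]
    return wq, is_outlier   # outlier positions keep their original values

rng = np.random.default_rng(0)
wq, mask = outlier_aware_quantize(rng.normal(0.0, 0.02, 1000))
print(mask.sum(), "outliers kept in full precision")
```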

Accurate and efficient 2-bit quantized neural networks

J Choi, S Venkataramani… - Proceedings of …, 2019 - proceedings.mlsys.org
Deep learning algorithms achieve high classification accuracy at the expense of significant
computation cost. In order to reduce this cost, several quantization schemes have gained …
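To illustrate what a 2-bit quantizer can look like in this setting, here is a PACT-style clip-then-round sketch for activations; in this line of work the clipping bound alpha is a learned, per-layer training parameter (and weights use a separate scheme), so passing alpha in by hand below is an assumption.

```python
import numpy as np

def pact_quantize(x, alpha=1.0, bits=2):
    """PACT-style activation quantization sketch: clip to [0, alpha], then
    quantize uniformly to 2**bits - 1 steps. Alpha is supplied manually
    here purely for illustration; the paper learns it during training."""
    levels = 2 ** bits - 1                # 3 steps for 2 bits
    y = np.clip(np.asarray(x, dtype=np.float64), 0.0, alpha)
    return np.round(y * levels / alpha) * alpha / levels

print(pact_quantize([-0.2, 0.1, 0.4, 0.9, 1.5], alpha=1.0))
```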

A retrospective and prospective view of approximate computing [point of view]

W Liu, F Lombardi, M Schulte - Proceedings of the IEEE, 2020 - ieeexplore.ieee.org
Computing systems are conventionally designed to operate as accurately as possible.
However, this trend faces severe technology challenges, such as power consumption, circuit …