Model compression for deep neural networks: A survey

Z Li, H Li, L Meng - Computers, 2023 - mdpi.com
Currently, with the rapid development of deep learning, deep neural networks (DNNs) have
been widely applied in various computer vision tasks. However, in the pursuit of …

[HTML] Applications and techniques for fast machine learning in science

AMC Deiana, N Tran, J Agar, M Blott… - Frontiers in big …, 2022 - frontiersin.org
In this community review report, we discuss applications and techniques for fast machine
learning (ML) in science—the concept of integrating powerful ML methods into the real-time …

A survey of quantization methods for efficient neural network inference

A Gholami, S Kim, Z Dong, Z Yao… - Low-Power Computer …, 2022 - taylorfrancis.com
This chapter provides approaches to the problem of quantizing the numerical values in deep
neural network computations, covering the advantages/disadvantages of current methods …
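The survey entry above concerns quantizing numerical values in network computations. As a minimal sketch of one common scheme (symmetric uniform quantization with a single per-tensor scale; the function names, the signed 8-bit range, and the example values are illustrative assumptions, not the survey's notation):

```python
import numpy as np

def quantize_uniform(x, num_bits=8):
    """Symmetric uniform quantization: map floats to signed integers
    using one scale factor for the whole tensor."""
    qmax = 2 ** (num_bits - 1) - 1              # e.g. 127 for 8-bit
    scale = np.max(np.abs(x)) / qmax            # per-tensor scale
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original float values."""
    return q.astype(np.float32) * scale

x = np.array([0.5, -1.2, 3.7, 0.01], dtype=np.float32)
q, s = quantize_uniform(x)
x_hat = dequantize(q, s)
```

Under this scheme the round-trip error per element is bounded by half the scale, which is the basic accuracy/footprint trade-off the survey's methods refine.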

HAWQ-V2: Hessian aware trace-weighted quantization of neural networks

Z Dong, Z Yao, D Arfeen, A Gholami… - Advances in neural …, 2020 - proceedings.neurips.cc
Quantization is an effective method for reducing memory footprint and inference time of
neural networks. However, ultra low precision quantization could lead to significant …
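The entry above weights quantization sensitivity by the Hessian trace. A toy heuristic in that spirit (splitting layers at the median trace between two bit-widths; this is an illustrative simplification, not the HAWQ-V2 algorithm, and the trace values are made up):

```python
import statistics

def assign_bitwidths(traces, high=8, low=4):
    """Toy mixed-precision rule: layers whose Hessian trace is at or above
    the median get the higher bit-width; the rest get the lower one."""
    median = statistics.median(traces)
    return [high if t >= median else low for t in traces]

# Hypothetical per-layer Hessian traces: sensitive layers keep more bits.
bits = assign_bitwidths([10.0, 0.1, 5.0, 0.2])
```

The point of trace-weighted schemes is exactly this kind of allocation: spend precision where the loss surface is most curved.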

Loss aware post-training quantization

Y Nahshan, B Chmiel, C Baskin, E Zheltonozhskii… - Machine Learning, 2021 - Springer
Neural network quantization enables the deployment of large models on resource-
constrained devices. Current post-training quantization methods fall short in terms of …

VS-Quant: Per-vector scaled quantization for accurate low-precision neural network inference

S Dai, R Venkatesan, M Ren… - Proceedings of …, 2021 - proceedings.mlsys.org
Quantization enables efficient acceleration of deep neural networks by reducing model
memory footprint and exploiting low-cost integer math hardware units. Quantization maps …
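Per-vector scaling, as in the entry above, gives each short vector of elements its own scale factor so a single outlier cannot crush the resolution of the whole tensor. A minimal sketch, assuming 4-element vectors and 4-bit signed integers (vector length, bit-width, and example data are illustrative choices, not VS-Quant's actual configuration):

```python
import numpy as np

def quantize_per_vector(x, vec_len=4, num_bits=4):
    """Quantize each length-`vec_len` vector with its own scale factor."""
    qmax = 2 ** (num_bits - 1) - 1
    vecs = x.reshape(-1, vec_len)
    scales = np.abs(vecs).max(axis=1, keepdims=True) / qmax
    scales = np.where(scales == 0, 1.0, scales)   # avoid divide-by-zero
    q = np.clip(np.round(vecs / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def dequantize_per_vector(q, scales, shape):
    return (q * scales).reshape(shape).astype(np.float32)

# One vector of small values, one containing a large outlier (5.0).
x = np.array([0.05, -0.02, 0.03, 0.01, 5.0, 0.1, -0.2, 0.05],
             dtype=np.float32)
q, scales = quantize_per_vector(x)
x_hat = dequantize_per_vector(q, scales, x.shape)
```

With a single per-tensor scale, the 5.0 outlier would force the small values in the first vector to round to zero; per-vector scales preserve them.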

F8Net: Fixed-point 8-bit only multiplication for network quantization

Q Jin, J Ren, R Zhuang, S Hanumante, Z Li… - arXiv preprint arXiv …, 2022 - arxiv.org
Neural network quantization is a promising compression technique to reduce memory
footprint and save energy consumption, potentially leading to real-time inference. However …
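The entry above relies on fixed-point multiplication. As a generic sketch of fixed-point arithmetic (not the F8Net algorithm itself): values are stored as integers with an implicit number of fractional bits, and a product is an integer multiply followed by a right shift:

```python
def to_fixed(x, frac_bits):
    """Encode a float as an integer with `frac_bits` fractional bits."""
    return int(round(x * (1 << frac_bits)))

def from_fixed(q, frac_bits):
    """Decode a fixed-point integer back to a float."""
    return q / (1 << frac_bits)

def fixed_point_mul(a_q, b_q, frac_bits):
    """Integer multiply yields 2*frac_bits fractional bits;
    shift right to return to frac_bits."""
    return (a_q * b_q) >> frac_bits

# 1.5 * 2.25 with 5 fractional bits, done entirely in integer arithmetic.
a = to_fixed(1.5, 5)                 # 48
b = to_fixed(2.25, 5)                # 72
result = from_fixed(fixed_point_mul(a, b, 5), 5)
```

Because every operation is an integer multiply and a shift, such arithmetic maps directly onto low-cost integer hardware units, which is the energy argument the snippet alludes to.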

Efficient post-training quantization with FP8 formats

H Shen, N Mellempudi, X He, Q Gao… - Proceedings of …, 2024 - proceedings.mlsys.org
Recent advances in deep learning methods such as LLMs and Diffusion models have
created a need for improved quantization methods that can meet the computational …

[BOOK][B] Low-power computer vision: improve the efficiency of artificial intelligence

GK Thiruvathukal, YH Lu, J Kim, Y Chen, B Chen - 2022 - books.google.com
Energy efficiency is critical for running computer vision on battery-powered systems, such as
mobile phones or UAVs (unmanned aerial vehicles, or drones). This book collects the …

Subgraph stationary hardware-software inference co-design

P Behnam, A Tumanov, T Krishna… - Proceedings of …, 2023 - proceedings.mlsys.org
A growing number of applications depend on Machine Learning (ML) functionality and
benefit from both higher quality ML predictions and better timeliness (latency) at the same …