Model compression for deep neural networks: A survey

Z Li, H Li, L Meng - Computers, 2023 - mdpi.com
Currently, with the rapid development of deep learning, deep neural networks (DNNs) have
been widely applied in various computer vision tasks. However, in the pursuit of …

[HTML] Applications and techniques for fast machine learning in science

AMC Deiana, N Tran, J Agar, M Blott… - Frontiers in big …, 2022 - frontiersin.org
In this community review report, we discuss applications and techniques for fast machine
learning (ML) in science—the concept of integrating powerful ML methods into the real-time …

A survey of quantization methods for efficient neural network inference

A Gholami, S Kim, Z Dong, Z Yao… - Low-Power Computer …, 2022 - taylorfrancis.com
This chapter provides approaches to the problem of quantizing the numerical values in deep
neural network computations, covering the advantages/disadvantages of current methods …
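The survey entry above concerns quantizing numerical values in network computations. As a minimal sketch of one common scheme (symmetric uniform quantization with a single per-tensor scale; the function names, the signed 8-bit range, and the example values are illustrative assumptions, not the survey's notation):

```python
import numpy as np

def quantize_uniform(x, num_bits=8):
    """Symmetric uniform quantization: map floats to signed integers
    using one scale factor for the whole tensor."""
    qmax = 2 ** (num_bits - 1) - 1              # e.g. 127 for 8-bit
    scale = np.max(np.abs(x)) / qmax            # per-tensor scale
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original float values."""
    return q.astype(np.float32) * scale

x = np.array([0.5, -1.2, 3.7, 0.01], dtype=np.float32)
q, s = quantize_uniform(x)
x_hat = dequantize(q, s)
```

Under this scheme the round-trip error per element is bounded by half the scale, which is the basic accuracy/footprint trade-off the survey's methods refine.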

HAWQ-V2: Hessian aware trace-weighted quantization of neural networks

Z Dong, Z Yao, D Arfeen, A Gholami… - Advances in neural …, 2020 - proceedings.neurips.cc
Quantization is an effective method for reducing memory footprint and inference time of
neural networks. However, ultra low precision quantization could lead to significant …
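The entry above weights quantization sensitivity by the Hessian trace. A toy heuristic in that spirit (splitting layers at the median trace between two bit-widths; this is an illustrative simplification, not the HAWQ-V2 algorithm, and the trace values are made up):

```python
import statistics

def assign_bitwidths(traces, high=8, low=4):
    """Toy mixed-precision rule: layers whose Hessian trace is at or above
    the median get the higher bit-width; the rest get the lower one."""
    median = statistics.median(traces)
    return [high if t >= median else low for t in traces]

# Hypothetical per-layer Hessian traces: sensitive layers keep more bits.
bits = assign_bitwidths([10.0, 0.1, 5.0, 0.2])
```

The point of trace-weighted schemes is exactly this kind of allocation: spend precision where the loss surface is most curved.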

Loss aware post-training quantization

Y Nahshan, B Chmiel, C Baskin, E Zheltonozhskii… - Machine Learning, 2021 - Springer
Neural network quantization enables the deployment of large models on resource-
constrained devices. Current post-training quantization methods fall short in terms of …

VS-Quant: Per-vector scaled quantization for accurate low-precision neural network inference

S Dai, R Venkatesan, M Ren… - Proceedings of …, 2021 - proceedings.mlsys.org
Quantization enables efficient acceleration of deep neural networks by reducing model
memory footprint and exploiting low-cost integer math hardware units. Quantization maps …
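Per-vector scaling, as in the entry above, gives each short vector of elements its own scale factor so a single outlier cannot crush the resolution of the whole tensor. A minimal sketch, assuming 4-element vectors and 4-bit signed integers (vector length, bit-width, and example data are illustrative choices, not VS-Quant's actual configuration):

```python
import numpy as np

def quantize_per_vector(x, vec_len=4, num_bits=4):
    """Quantize each length-`vec_len` vector with its own scale factor."""
    qmax = 2 ** (num_bits - 1) - 1
    vecs = x.reshape(-1, vec_len)
    scales = np.abs(vecs).max(axis=1, keepdims=True) / qmax
    scales = np.where(scales == 0, 1.0, scales)   # avoid divide-by-zero
    q = np.clip(np.round(vecs / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def dequantize_per_vector(q, scales, shape):
    return (q * scales).reshape(shape).astype(np.float32)

# One vector of small values, one containing a large outlier (5.0).
x = np.array([0.05, -0.02, 0.03, 0.01, 5.0, 0.1, -0.2, 0.05],
             dtype=np.float32)
q, scales = quantize_per_vector(x)
x_hat = dequantize_per_vector(q, scales, x.shape)
```

With a single per-tensor scale, the 5.0 outlier would force the small values in the first vector to round to zero; per-vector scales preserve them.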

F8Net: Fixed-point 8-bit only multiplication for network quantization

Q Jin, J Ren, R Zhuang, S Hanumante, Z Li… - arXiv preprint arXiv …, 2022 - arxiv.org
Neural network quantization is a promising compression technique to reduce memory
footprint and save energy consumption, potentially leading to real-time inference. However …
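The entry above relies on fixed-point multiplication. As a generic sketch of fixed-point arithmetic (not the F8Net algorithm itself): values are stored as integers with an implicit number of fractional bits, and a product is an integer multiply followed by a right shift:

```python
def to_fixed(x, frac_bits):
    """Encode a float as an integer with `frac_bits` fractional bits."""
    return int(round(x * (1 << frac_bits)))

def from_fixed(q, frac_bits):
    """Decode a fixed-point integer back to a float."""
    return q / (1 << frac_bits)

def fixed_point_mul(a_q, b_q, frac_bits):
    """Integer multiply yields 2*frac_bits fractional bits;
    shift right to return to frac_bits."""
    return (a_q * b_q) >> frac_bits

# 1.5 * 2.25 with 5 fractional bits, done entirely in integer arithmetic.
a = to_fixed(1.5, 5)                 # 48
b = to_fixed(2.25, 5)                # 72
result = from_fixed(fixed_point_mul(a, b, 5), 5)
```

Because every operation is an integer multiply and a shift, such arithmetic maps directly onto low-cost integer hardware units, which is the energy argument the snippet alludes to.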

Efficient post-training quantization with FP8 formats

H Shen, N Mellempudi, X He, Q Gao… - Proceedings of …, 2024 - proceedings.mlsys.org
Recent advances in deep learning methods such as LLMs and Diffusion models have
created a need for improved quantization methods that can meet the computational …

[BOOK][B] Low-power computer vision: improve the efficiency of artificial intelligence

GK Thiruvathukal, YH Lu, J Kim, Y Chen, B Chen - 2022 - books.google.com
Energy efficiency is critical for running computer vision on battery-powered systems, such as
mobile phones or UAVs (unmanned aerial vehicles, or drones). This book collects the …

Subgraph stationary hardware-software inference co-design

P Behnam, A Tumanov, T Krishna… - Proceedings of …, 2023 - proceedings.mlsys.org
A growing number of applications depend on Machine Learning (ML) functionality and
benefit from both higher quality ML predictions and better timeliness (latency) at the same …