Model compression and hardware acceleration for neural networks: A comprehensive survey
Domain-specific hardware is becoming a promising topic against the backdrop of slowing
improvement for general-purpose processors due to the foreseeable end of Moore's Law …
A comprehensive survey on model compression and acceleration
T Choudhary, V Mishra, A Goswami… - Artificial Intelligence …, 2020 - Springer
In recent years, machine learning (ML) and deep learning (DL) have shown remarkable
improvement in computer vision, natural language processing, stock prediction, forecasting …
A survey of quantization methods for efficient neural network inference
This chapter provides approaches to the problem of quantizing the numerical values in deep
neural network computations, covering the advantages and disadvantages of current methods …
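Many of the methods surveyed in this line of work reduce to uniform affine quantization, which maps a floating-point tensor to low-bit integers via a scale and zero-point. A minimal NumPy sketch (the function names and simple min/max calibration are illustrative assumptions, not taken from any one paper):

```python
import numpy as np

def quantize(x, num_bits=8):
    """Uniform affine quantization: map float values to num_bits-wide
    unsigned integers using a scale and zero-point derived from the
    observed min/max range (a simple calibration choice)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover an approximation of the original floats."""
    return scale * (q.astype(np.float32) - zero_point)

x = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
q, s, z = quantize(x)
x_hat = dequantize(q, s, z)
```

The round-trip error is bounded by the step size `s`, which is the basic accuracy/bit-width trade-off these surveys analyze.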
Pruning and quantization for deep neural network acceleration: A survey
T Liang, J Glossner, L Wang, S Shi, X Zhang - Neurocomputing, 2021 - Elsevier
Deep neural networks have been applied in many applications exhibiting extraordinary
abilities in the field of computer vision. However, complex network architectures challenge …
Frugalgpt: How to use large language models while reducing cost and improving performance
There is a rapidly growing number of large language models (LLMs) that users can query for
a fee. We review the costs associated with querying popular LLM APIs, e.g., GPT-4 and ChatGPT …
Binary neural networks: A survey
The binary neural network, which greatly reduces storage and computation, serves as a
promising technique for deploying deep models on resource-limited devices. However, the …
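The storage and compute savings come from replacing floating-point multiply-accumulates with bitwise operations: a dot product of two {-1, +1} vectors packed into machine words can be computed with XNOR and popcount. A small illustrative sketch (the packing helper and names are assumptions for demonstration):

```python
import numpy as np

def pack_signs(v):
    """Pack a {-1,+1} vector into an integer, one bit per element (+1 -> 1)."""
    bits = 0
    for i, x in enumerate(v):
        if x > 0:
            bits |= 1 << i
    return bits

def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed sign vectors via XNOR + popcount:
    matching bits contribute +1, mismatching bits -1, so dot = 2*matches - n."""
    xnor = ~(a_bits ^ b_bits) & ((1 << n) - 1)  # mask to n valid bits
    matches = bin(xnor).count("1")
    return 2 * matches - n

a = np.array([1, -1, 1, 1])
b = np.array([1, 1, -1, 1])
dot = binary_dot(pack_signs(a), pack_signs(b), len(a))  # equals int(a @ b)
```

On real hardware the XNOR and popcount run 32 or 64 lanes at a time per word, which is where the large speedups come from.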
Learned step size quantization
Deep networks run with low-precision operations at inference time offer power and space
advantages over high-precision alternatives, but need to overcome the challenge of …
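The core operation in learned step size quantization is fake quantization with a step size that is trained alongside the weights. A NumPy sketch of the forward pass only (the paper's step-size gradient rescaling and straight-through estimator are omitted; names are illustrative):

```python
import numpy as np

def lsq_fake_quant(x, s, num_bits=4, signed=True):
    """Fake-quantize x with step size s (a learnable scalar in LSQ):
    scale, round, clip to the integer grid, then rescale back to floats."""
    if signed:
        qn, qp = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    else:
        qn, qp = 0, 2 ** num_bits - 1
    q = np.clip(np.round(x / s), qn, qp)
    return q * s

x = np.array([-0.9, -0.1, 0.2, 0.7])
y = lsq_fake_quant(x, s=0.1)
```

During training, `s` would receive gradients through the rounded values (via a straight-through estimator), letting each layer learn how coarse its quantization grid should be.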
Differentiable soft quantization: Bridging full-precision and low-bit neural networks
Hardware-friendly network quantization (e.g., binary/uniform quantization) can efficiently
accelerate inference while reducing the memory consumption of deep neural …
Efficient acceleration of deep learning inference on resource-constrained edge devices: A review
Successful integration of deep neural networks (DNNs) or deep learning (DL) has resulted
in breakthroughs in many areas. However, deploying these highly accurate models for data …
Reactnet: Towards precise binary neural network with generalized activation functions
In this paper, we propose several ideas for enhancing a binary network to close its accuracy
gap with real-valued networks without incurring any additional computational cost. We first …
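One of the generalized activations in this line of work is a sign function with a learnable shift, so the binarization threshold can move away from zero. A minimal sketch of the idea (the per-channel parameterization and training procedure are omitted; this is not the paper's full method):

```python
import numpy as np

def rsign(x, alpha=0.0):
    """RSign-style binarization: shift the input by a learnable alpha
    before taking the sign (alpha=0 recovers the plain sign function)."""
    return np.where(x - alpha >= 0, 1.0, -1.0)

x = np.array([-0.5, 0.1, 0.3])
out = rsign(x, alpha=0.2)  # threshold at 0.2 instead of 0
```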