[HTML] Applications and techniques for fast machine learning in science
In this community review report, we discuss applications and techniques for fast machine
learning (ML) in science—the concept of integrating powerful ML methods into the real-time …
A survey of quantization methods for efficient neural network inference
This chapter surveys approaches to the problem of quantizing the numerical values in deep
neural network computations, covering the advantages and disadvantages of current methods …
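As a minimal illustration of the symmetric uniform quantization scheme such surveys cover (the function names, bit width, and sample weights here are illustrative, not taken from the paper):

```python
# Symmetric uniform quantization: map floats to signed num_bits-bit
# integers using a single per-tensor scale factor.
def quantize_uniform(values, num_bits=8):
    qmax = 2 ** (num_bits - 1) - 1              # 127 for 8 bits
    scale = max(abs(v) for v in values) / qmax  # per-tensor scale
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

weights = [0.5, -1.27, 0.03, 1.0]
q, s = quantize_uniform(weights)
recovered = dequantize(q, s)
# each recovered value lies within scale/2 of the original
```

The per-tensor scale is what more refined methods (Hessian-aware, per-vector, loss-aware) improve on: a single outlier inflates the scale and wastes resolution on all other values.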
HAWQ-V2: Hessian-aware trace-weighted quantization of neural networks
Quantization is an effective method for reducing the memory footprint and inference time of
neural networks. However, ultra-low-precision quantization could lead to significant …
Loss-aware post-training quantization
Neural network quantization enables the deployment of large models on resource-
constrained devices. Current post-training quantization methods fall short in terms of …
VS-Quant: Per-vector scaled quantization for accurate low-precision neural network inference
Quantization enables efficient acceleration of deep neural networks by reducing model
memory footprint and exploiting low-cost integer math hardware units. Quantization maps …
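A toy sketch of the per-vector idea: one scale factor per small vector of weights instead of one per tensor, so an outlier only degrades resolution within its own vector. The function name, vector size, and bit width are illustrative assumptions, not the paper's implementation:

```python
# Per-vector scaled quantization (in the spirit of VS-Quant): each
# contiguous vector of `vec_size` weights gets its own scale factor.
def quantize_per_vector(weights, vec_size=4, num_bits=8):
    qmax = 2 ** (num_bits - 1) - 1
    quantized, scales = [], []
    for i in range(0, len(weights), vec_size):
        vec = weights[i:i + vec_size]
        scale = (max(abs(v) for v in vec) / qmax) or 1.0  # guard all-zero vectors
        scales.append(scale)
        quantized.append([round(v / scale) for v in vec])
    return quantized, scales

# An outlier (-4.0) in the second vector no longer inflates the scale
# used for the first vector's small weights.
qvecs, scales = quantize_per_vector([0.1, -0.2, 0.4, 0.8, 2.0, -4.0, 1.0, 0.5])
```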
F8Net: Fixed-point 8-bit only multiplication for network quantization
Neural network quantization is a promising compression technique to reduce memory
footprint and energy consumption, potentially enabling real-time inference. However …
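F8Net's actual scheme uses fixed-point formats with per-layer fractional lengths; as a generic illustration of why fixed-point multiplication needs only integer hardware (the format choice and names below are assumptions, not the paper's):

```python
FRAC_BITS = 5  # illustrative fixed-point format: 5 fractional bits

def to_fixed(x):
    """Encode a float as an integer in the chosen fixed-point format."""
    return round(x * (1 << FRAC_BITS))

def fixed_mul(a, b):
    """Multiply two fixed-point values using only integer multiply
    and a right shift to return to the same format."""
    return (a * b) >> FRAC_BITS

a, b = to_fixed(1.5), to_fixed(0.75)     # 48 and 24
product = fixed_mul(a, b)                # 48 * 24 = 1152, >> 5 -> 36
assert product == to_fixed(1.125)        # 1.5 * 0.75 = 1.125
```

The shift amount is fixed per format, which is what lets such schemes avoid floating-point scale multiplications at inference time.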
Efficient post-training quantization with FP8 formats
Recent advances in deep learning methods such as LLMs and Diffusion models have
created a need for improved quantization methods that can meet the computational …
[BOOK][B] Low-power computer vision: improve the efficiency of artificial intelligence
Energy efficiency is critical for running computer vision on battery-powered systems, such as
mobile phones or UAVs (unmanned aerial vehicles, or drones). This book collects the …
Subgraph stationary hardware-software inference co-design
A growing number of applications depend on Machine Learning (ML) functionality and
benefit from both higher-quality ML predictions and better timeliness (latency) at the same …