A configurable floating-point multiple-precision processing element for HPC and AI converged computing

W Mao, K Li, Q Cheng, L Dai, B Li, X Xie… - … Transactions on Very …, 2021 - ieeexplore.ieee.org
There is an emerging need to design configurable accelerators for high-performance
computing (HPC) and artificial intelligence (AI) applications that operate at different precisions. Thus, the …
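To make the idea concrete: a multi-precision processing element supports several floating-point formats from one parameterized datapath. The sketch below is not the paper's hardware design; it is a minimal numpy illustration, with a hypothetical quantize_fp helper, of how one routine parameterized by exponent and mantissa widths can emulate FP16-like, BF16-like, and FP8-like precisions.

```python
import numpy as np

def quantize_fp(x, exp_bits, man_bits):
    """Round x to a floating-point format with the given exponent and
    mantissa widths (sign bit implied). Hypothetical helper; subnormals
    are flushed to zero and saturation is handled crudely."""
    x = np.asarray(x, dtype=np.float64)
    bias = 2 ** (exp_bits - 1) - 1
    mant, exp = np.frexp(x)            # x = mant * 2**exp, 0.5 <= |mant| < 1
    scale = 2.0 ** (man_bits + 1)      # +1 accounts for the implicit leading bit
    y = np.ldexp(np.round(mant * scale) / scale, exp)
    max_val = (2 - 2.0 ** -man_bits) * 2.0 ** bias   # largest finite value
    y = np.clip(y, -max_val, max_val)
    return np.where(np.abs(y) < 2.0 ** (1 - bias), 0.0, y)  # flush subnormals

x = np.array([3.14159, -0.001, 65519.0])
print(quantize_fp(x, exp_bits=5, man_bits=10))   # FP16-like
print(quantize_fp(x, exp_bits=8, man_bits=7))    # BF16-like
print(quantize_fp(x, exp_bits=4, man_bits=3))    # FP8-like
```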

Quantized sparse training: A unified trainable framework for joint pruning and quantization in DNNs

JH Park, KM Kim, S Lee - ACM Transactions on Embedded Computing …, 2022 - dl.acm.org
Deep neural networks typically involve large numbers of parameters and computational operations.
Pruning and quantization techniques have been widely used to reduce the complexity of …
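A minimal sketch of what joint pruning and quantization means in practice, not the paper's trainable formulation (which would presumably run inside a training loop with straight-through gradients); the prune_and_quantize helper is hypothetical: magnitude-prune the smallest weights, then uniformly quantize the survivors.

```python
import numpy as np

def prune_and_quantize(w, sparsity=0.5, bits=4):
    """Zero out the smallest-magnitude weights, then symmetrically
    quantize the survivors to `bits` bits. Hypothetical helper."""
    thresh = np.quantile(np.abs(w), sparsity)     # magnitude cut-off
    mask = (np.abs(w) >= thresh).astype(w.dtype)  # 1 = kept, 0 = pruned
    w_pruned = w * mask
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w_pruned).max() / qmax         # symmetric uniform grid
    w_q = np.round(w_pruned / scale).clip(-qmax, qmax) * scale
    return w_q, mask

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
w_q, mask = prune_and_quantize(w)
print(f"kept {mask.mean():.0%} of weights")
print(w_q)
```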

Bucket Getter: A Bucket-based Processing Engine for Low-bit Block Floating Point (BFP) DNNs

YC Lo, RS Liu - Proceedings of the 56th Annual IEEE/ACM …, 2023 - dl.acm.org
Block floating point (BFP), an efficient numerical system for deep neural networks (DNNs),
achieves a good trade-off between dynamic range and hardware costs. Specifically, prior …
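For readers unfamiliar with the format: in BFP, a block of values shares a single exponent while each value keeps a short private mantissa, which is the source of the dynamic-range versus hardware-cost trade-off the snippet mentions. A minimal numpy sketch follows; the to_bfp helper is hypothetical and is not the Bucket Getter engine itself.

```python
import numpy as np

def to_bfp(x, block_size=4, man_bits=8):
    """Quantize a 1-D array to block floating point: each block shares one
    exponent; each value keeps a signed `man_bits`-bit mantissa.
    Hypothetical helper."""
    x = np.asarray(x, dtype=np.float64)
    pad = (-len(x)) % block_size
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)
    max_mag = np.abs(blocks).max(axis=1, keepdims=True)
    exp = np.ceil(np.log2(np.maximum(max_mag, 1e-38)))   # shared exponent
    qscale = 2.0 ** (man_bits - 1)
    mant = np.round(blocks / 2.0 ** exp * qscale).clip(-qscale, qscale - 1)
    return (mant / qscale * 2.0 ** exp).reshape(-1)[: len(x)]

x = np.array([0.01, 0.02, -0.5, 3.2, 100.0, -0.003, 7.5, 0.4])
print(to_bfp(x))   # the block containing 100.0 loses its small values
```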

Number systems for deep neural network architectures: a survey

G Alsuhli, V Sakellariou, H Saleh, M Al-Qutayri… - arXiv preprint arXiv …, 2023 - arxiv.org
Deep neural networks (DNNs) have become an enabling component for a myriad of artificial
intelligence applications. DNNs have sometimes shown superior performance, even …

Harmonica: Hybrid Accelerator to Overcome Imperfections of Mixed-signal DNN Accelerators

P Behnam, U Kamal, A Shafiee… - 2024 IEEE …, 2024 - ieeexplore.ieee.org
In recent years, PIM-based mixed-signal accelerators have been proposed as energy- and
area-efficient solutions with ultra-high throughput to accelerate DNN computations …

Integer Is Enough: When Vertical Federated Learning Meets Rounding

P Qiu, Y Pu, Y Liu, W Liu, Y Yue, X Zhu, L Li… - Proceedings of the …, 2024 - ojs.aaai.org
Vertical Federated Learning (VFL) is a solution increasingly used by companies with the
same user group but differing features, enabling them to collaboratively train a machine …
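The paper's premise is that exchanging rounded integers can suffice in VFL. As one illustrative rounding rule (an assumption on my part, not necessarily the paper's scheme), unbiased stochastic rounding keeps the expectation of the rounded values equal to the originals:

```python
import numpy as np

def stochastic_round(x, rng):
    """Round down or up at random with probability given by the fractional
    part, so E[rounded] = x. One common rule; the paper's exact scheme
    may differ."""
    floor = np.floor(x)
    return floor + (rng.random(x.shape) < x - floor)

rng = np.random.default_rng(0)
emb = rng.normal(size=(2, 4)) * 10   # a party's intermediate embedding
print(stochastic_round(emb, rng).astype(np.int64))  # only integers cross the boundary
```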

An algorithm-hardware co-design framework to overcome imperfections of mixed-signal DNN accelerators

P Behnam, U Kamal, S Mukhopadhyay - arXiv preprint arXiv:2208.13896, 2022 - arxiv.org
In recent years, processing-in-memory (PIM)-based mixed-signal designs have been
proposed as energy- and area-efficient solutions with ultra-high throughput to accelerate …

Chiplet-GAN: Chiplet-Based Accelerator Design for Scalable Generative Adversarial Network Inference [Feature]

Y Chen, A Louri, F Lombardi… - IEEE Circuits and Systems …, 2024 - ieeexplore.ieee.org
Generative adversarial networks (GANs) have emerged as a powerful solution for
generating synthetic data when the availability of large, labeled training datasets is limited or …

Algorithm/architecture solutions to improve beyond uniform quantization in embedded DNN accelerators

A Pedram, AS Ardestani, L Li, H Abdelaziz… - Journal of Systems …, 2022 - Elsevier
The choice of data type has a major impact on speed, accuracy, and power consumption of
deep learning accelerators. Quantizing the weights and activations of neural networks to …
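To illustrate what going "beyond uniform quantization" can mean: non-uniform schemes such as power-of-two (logarithmic) quantization space their levels unevenly, packing them densely near zero. The sketch below contrasts one such scheme with a uniform grid; both helpers are hypothetical illustrations, not the paper's algorithms.

```python
import numpy as np

def quantize_uniform(x, bits):
    """Uniform symmetric quantization: evenly spaced levels."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.round(x / scale).clip(-qmax, qmax) * scale

def quantize_pow2(x, bits):
    """Power-of-two (logarithmic) quantization: one non-uniform scheme,
    with levels packed densely near zero."""
    mag = np.maximum(np.abs(x), 1e-12)
    exp = np.round(np.log2(mag)).clip(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return np.sign(x) * 2.0 ** exp

rng = np.random.default_rng(0)
w = rng.normal(scale=0.05, size=10_000)
for q in (quantize_uniform, quantize_pow2):
    print(q.__name__, "MSE =", np.mean((w - q(w, bits=4)) ** 2))
```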

Conventional Number Systems for DNN Architectures

G Alsuhli, V Sakellariou, H Saleh, M Al-Qutayri… - Number Systems for …, 2023 - Springer
The two conventional number systems, namely floating point (FLP) and fixed point (FXP), are
commonly used in almost all general-purpose DNN engines. While the FLP representation …
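To ground the FLP/FXP contrast: fixed point uses a constant step over a fixed range, so small values underflow and large values saturate, whereas floating point keeps a roughly constant relative error across magnitudes. A minimal sketch with a hypothetical to_fixed_point helper:

```python
import numpy as np

def to_fixed_point(x, int_bits, frac_bits):
    """Signed fixed-point (Q-format) quantization: a constant step of
    2**-frac_bits over a fixed range. Hypothetical helper."""
    step = 2.0 ** -frac_bits
    lo, hi = -(2.0 ** int_bits), 2.0 ** int_bits - step
    return np.clip(np.round(x / step) * step, lo, hi)

x = np.array([0.001, 0.5, 3.75, 200.0])
print(to_fixed_point(x, int_bits=3, frac_bits=4))
# -> [0.  0.5  3.75  7.9375]: the tiny value underflows to zero and the
#    large one saturates at the top of the representable range.
```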