A configurable floating-point multiple-precision processing element for HPC and AI converged computing
There is an emerging need to design configurable accelerators for high-performance
computing (HPC) and artificial intelligence (AI) applications operating at different precisions. Thus, the …
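The paper's contribution is a hardware processing element, but the precision trade-off it targets is easy to see in software. A minimal NumPy sketch, evaluating the same dot product at the three IEEE 754 widths such a configurable datapath might serve (the workload and sizes here are illustrative assumptions, not the paper's benchmarks):

```python
import numpy as np

# Illustrative only: run one dot product at each IEEE 754 width a
# configurable multi-precision processing element could support, and
# observe how the accumulated result drifts at lower precision.
rng = np.random.default_rng(0)
a = rng.standard_normal(1024)
b = rng.standard_normal(1024)

for dtype in (np.float16, np.float32, np.float64):
    acc = np.dot(a.astype(dtype), b.astype(dtype))
    print(dtype.__name__, acc)
```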
Quantized sparse training: A unified trainable framework for joint pruning and quantization in DNNs
Deep neural networks typically involve a large number of parameters and computational operations.
Pruning and quantization techniques have been widely used to reduce the complexity of …
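As a point of reference, the sketch below applies the two techniques the paper unifies, magnitude pruning followed by uniform quantization, to a single weight tensor. It is a generic post-hoc illustration, not the paper's trainable framework, which learns the mask and quantization jointly during training; the function name and thresholds are assumptions.

```python
import numpy as np

def prune_and_quantize(w, sparsity=0.5, bits=8):
    """Toy joint pruning + uniform quantization of a weight tensor.

    Magnitude pruning zeroes the smallest `sparsity` fraction of weights;
    the survivors are then uniformly quantized to `bits`-bit integers.
    """
    threshold = np.quantile(np.abs(w), sparsity)    # magnitude cutoff
    mask = np.abs(w) >= threshold                   # keep the large weights
    scale = np.abs(w[mask]).max() / (2 ** (bits - 1) - 1)
    q = np.round(w * mask / scale).astype(np.int8)  # quantize survivors
    return q, scale, mask

w = np.random.default_rng(1).standard_normal((4, 4)).astype(np.float32)
q, scale, mask = prune_and_quantize(w)
w_hat = q.astype(np.float32) * scale                # dequantized approximation
print("kept fraction:", mask.mean(),
      "max err on kept weights:", np.abs((w - w_hat)[mask]).max())
```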
Bucket Getter: A Bucket-based Processing Engine for Low-bit Block Floating Point (BFP) DNNs
Block floating point (BFP), an efficient numerical system for deep neural networks (DNNs),
achieves a good trade-off between dynamic range and hardware costs. Specifically, prior …
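The format itself is compact enough to sketch: each block of values shares one exponent, and every value keeps only a short mantissa. The NumPy round trip below illustrates plain BFP encoding and decoding; it is not Bucket Getter's processing engine, which builds a bucket-based accumulation scheme on top of the format, and the block size and mantissa width are illustrative choices.

```python
import numpy as np

def to_bfp(x, block=8, mant_bits=4):
    """Toy block floating point (BFP) round trip.

    Each block of `block` values shares one exponent, taken as the power
    of two covering the block's largest magnitude; every value keeps a
    `mant_bits`-bit signed mantissa.
    """
    x = x.reshape(-1, block)
    exp = np.ceil(np.log2(np.abs(x).max(axis=1, keepdims=True) + 1e-30))
    scale = 2.0 ** (exp - (mant_bits - 1))           # LSB weight per block
    mant = np.clip(np.round(x / scale),
                   -(2 ** (mant_bits - 1)), 2 ** (mant_bits - 1) - 1)
    return (mant * scale).ravel()                    # decoded approximation

x = np.random.default_rng(2).standard_normal(32).astype(np.float32)
x_hat = to_bfp(x)
print("max BFP rounding error:", np.abs(x - x_hat).max())
```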
Number systems for deep neural network architectures: a survey
Deep neural networks (DNNs) have become an enabling component for a myriad of artificial
intelligence applications. DNNs have sometimes shown superior performance, even …
Harmonica: Hybrid Accelerator to Overcome Imperfections of Mixed-signal DNN Accelerators
In recent years, PIM-based mixed-signal accelerators have been proposed as energy- and
area-efficient solutions with ultra-high throughput to accelerate DNN computations …
Integer Is Enough: When Vertical Federated Learning Meets Rounding
Vertical Federated Learning (VFL) is a solution increasingly used by companies that share the
same user group but hold differing features, enabling them to collaboratively train a machine …
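To make the title's rounding idea concrete, here is a hedged sketch of one plausible reading: a VFL party scales its intermediate embedding and rounds it to integers before transmission, with stochastic rounding keeping the result unbiased in expectation. The function name, scale factor, and protocol step are assumptions for illustration, not the paper's API or exact scheme.

```python
import numpy as np

def round_for_transmission(h, scale=256):
    """Hypothetical VFL step: scale an intermediate embedding and
    stochastically round it to integers before sending it to the
    label-holding party. Unbiased in expectation: E[h_int] = h * scale.
    """
    z = h * scale
    floor = np.floor(z)
    h_int = (floor + (np.random.random(z.shape) < (z - floor))).astype(np.int32)
    return h_int  # receiver divides by `scale` to recover an approximation

h = np.random.default_rng(3).standard_normal((2, 4)).astype(np.float32)
h_int = round_for_transmission(h)
print(h_int, "\nmax err:", np.abs(h - h_int / 256).max())
```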
An algorithm-hardware co-design framework to overcome imperfections of mixed-signal DNN accelerators
In recent years, processing-in-memory (PIM) based mixed-signal designs have been
proposed as energy- and area-efficient solutions with ultra-high throughput to accelerate …
Chiplet-GAN: Chiplet-Based Accelerator Design for Scalable Generative Adversarial Network Inference
Y Chen, A Louri, F Lombardi… - IEEE Circuits and Systems …, 2024 - ieeexplore.ieee.org
Generative adversarial networks (GANs) have emerged as a powerful solution for
generating synthetic data when the availability of large, labeled training datasets is limited or …
Algorithm/architecture solutions to improve beyond uniform quantization in embedded DNN accelerators
The choice of data type has a major impact on speed, accuracy, and power consumption of
deep learning accelerators. Quantizing the weights and activations of neural networks to …
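The baseline the paper proposes to move beyond is standard symmetric uniform quantization, sketched below: one shared step size for the whole tensor, so resolution is wasted wherever values cluster near zero. This is the generic textbook scheme, not the paper's improved method; the bit width and the symmetric-range choice are assumptions.

```python
import numpy as np

def uniform_quantize(x, bits=8):
    """Symmetric uniform quantization with a single tensor-wide step size."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax                  # one step size for all
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

w = np.random.default_rng(4).standard_normal(1000).astype(np.float32)
q, scale = uniform_quantize(w)
print("int8 range used:", q.min(), q.max(),
      "mean abs err:", np.abs(w - q * scale).mean())
```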
Conventional Number Systems for DNN Architectures
The two conventional number systems, namely floating point (FLP) and fixed point (FXP), are
commonly used in almost all general-purpose DNN engines. While the FLP representation …
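The contrast between the two systems is easy to demonstrate numerically: fixed point gives uniform resolution with a hard range limit, while floating point spends bits on an exponent to buy dynamic range. A minimal sketch, assuming an illustrative Q4.12 fixed-point format (16 bits: 1 sign, 3 integer, 12 fractional) compared against IEEE half precision:

```python
import numpy as np

def to_fixed(x, int_bits=4, frac_bits=12):
    """Toy Q4.12 fixed-point (FXP) round trip: uniform resolution of
    2**-12 everywhere, but values outside roughly [-8, 8) saturate.
    Floating point instead covers a wide, exponent-driven range.
    """
    scale = 2 ** frac_bits
    lo = -(2 ** (int_bits + frac_bits - 1))
    hi = 2 ** (int_bits + frac_bits - 1) - 1
    return np.clip(np.round(x * scale), lo, hi) / scale

for v in (0.001, 3.14159, 100.0):   # 100.0 overflows Q4.12 and saturates
    print(v, "-> FXP:", to_fixed(v), " FP16:", np.float16(v))
```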