Design and analysis of approximate compressors for balanced error accumulation in MAC operator

G Park, J Kung, Y Lee - … Transactions on Circuits and Systems I …, 2021 - ieeexplore.ieee.org
In this paper, we present a novel approximate computing scheme suitable for realizing energy-efficient multiply-accumulate (MAC) processing. In contrast to prior works that …

Approximate LSTM computing for energy-efficient speech recognition

J Jo, J Kung, Y Lee - Electronics, 2020 - mdpi.com
This paper presents an approximate computing method of long short-term memory (LSTM)
operations for energy-efficient end-to-end speech recognition. We newly introduce the …

Memory-reduced network stacking for edge-level CNN architecture with structured weight pruning

S Moon, Y Byun, J Park, S Lee… - IEEE Journal on Emerging …, 2019 - ieeexplore.ieee.org
This paper presents a novel stacking and multi-level indexing scheme for convolutional
neural networks (CNNs) used in energy-limited edge-level systems. Basically, the proposed …

An energy-efficient convolutional neural network processor architecture based on a systolic array

C Zhang, X Wang, S Yong, Y Zhang, Q Li, C Wang - Applied Sciences, 2022 - mdpi.com
Deep convolutional neural networks (CNNs) have demonstrated strong capabilities in artificial intelligence applications. However, due to their extensive computation, traditional …

AoCStream: All-on-Chip CNN Accelerator with Stream-Based Line-Buffer Architecture and Accelerator-Aware Pruning

HJ Kang, BD Yang - Sensors, 2023 - mdpi.com
Convolutional neural networks (CNNs) play a crucial role in many EdgeAI and TinyML
applications, but their implementation usually requires external memory, which degrades the …

Sparsity-Aware Memory Interface Architecture using Stacked XORNet Compression for Accelerating Pruned-DNN Models

Y Byun, S Moon, B Park, SJ Kwon… - Proceedings of …, 2023 - proceedings.mlsys.org
This paper presents a new algorithm-hardware co-optimization approach that maximizes
memory bandwidth utilization even for the pruned deep neural network (DNN) models …

A 0.13 mJ/Prediction CIFAR-100 Raster-Scan-Based Wired-Logic Processor Using Non-Linear Neural Network

D Li, YC Hsu, R Sumikawa, A Kosuge… - … on Circuits and …, 2023 - ieeexplore.ieee.org
A single-chip wired-logic artificial intelligence (AI) processor achieving 0.13 mJ/prediction with 68.6% accuracy is developed in a 16-nm field-programmable gate array (FPGA). Compared with …

AoCStream: All-on-Chip CNN Accelerator With Stream-Based Line-Buffer Architecture

HJ Kang - arXiv preprint arXiv:2212.11438, 2022 - arxiv.org
Convolutional neural network (CNN) accelerators are being widely used for their efficiency,
but they require a large amount of memory, leading to the use of a slow and power …

FPGA-based UAV heterogeneous computing platform architecture

J Guo, B Jiang, H Xu - 2021 6th International Conference on …, 2021 - ieeexplore.ieee.org
The UAV field is developing rapidly, with extensive research devoted to autonomous and controllable operation. At present, the control …

A 0.13 mJ/Prediction CIFAR-100 Fully Synthesizable Raster-Scan-Based Wired-Logic Processor in 16-nm FPGA

D Li, Z Zhan, R Sumikawa, M Hamada… - IEICE Transactions …, 2024 - search.ieice.org
A wired-logic deep neural network (DNN) processor achieving 0.13 mJ/prediction with 68.6% accuracy is developed in a single 16-nm field-programmable gate array (FPGA) chip. Compared with …