Design and analysis of approximate compressors for balanced error accumulation in MAC operator
In this paper, we present a novel approximate computing scheme suitable for realizing energy-efficient multiply-accumulate (MAC) processing. In contrast to the prior works that …
Approximate LSTM computing for energy-efficient speech recognition
This paper presents an approximate computing method for long short-term memory (LSTM) operations for energy-efficient end-to-end speech recognition. We introduce the …
Memory-reduced network stacking for edge-level CNN architecture with structured weight pruning
This paper presents a novel stacking and multi-level indexing scheme for convolutional
neural networks (CNNs) used in energy-limited edge-level systems. Basically, the proposed …
An energy-efficient convolutional neural network processor architecture based on a systolic array
C Zhang, X Wang, S Yong, Y Zhang, Q Li, C Wang - Applied Sciences, 2022 - mdpi.com
Deep convolutional neural networks (CNNs) have shown strong capabilities in artificial intelligence applications. However, due to their extensive computation, traditional …
AoCStream: All-on-Chip CNN Accelerator with Stream-Based Line-Buffer Architecture and Accelerator-Aware Pruning
Convolutional neural networks (CNNs) play a crucial role in many EdgeAI and TinyML
applications, but their implementation usually requires external memory, which degrades the …
Sparsity-Aware Memory Interface Architecture using Stacked XORNet Compression for Accelerating Pruned-DNN Models
This paper presents a new algorithm-hardware co-optimization approach that maximizes
memory bandwidth utilization even for the pruned deep neural network (DNN) models …
A 0.13 mJ/Prediction CIFAR-100 Raster-Scan-Based Wired-Logic Processor Using Non-Linear Neural Network
D Li, YC Hsu, R Sumikawa, A Kosuge… - … on Circuits and …, 2023 - ieeexplore.ieee.org
A 0.13 mJ/prediction with 68.6% accuracy single-chip wired-logic artificial intelligence (AI) processor is developed in a 16-nm field-programmable gate array (FPGA). Compared with …
AoCStream: All-on-Chip CNN Accelerator With Stream-Based Line-Buffer Architecture
HJ Kang - arXiv preprint arXiv:2212.11438, 2022 - arxiv.org
Convolutional neural network (CNN) accelerators are being widely used for their efficiency,
but they require a large amount of memory, leading to the use of a slow and power …
FPGA-based UAV heterogeneous computing platform architecture
J Guo, B Jiang, H Xu - 2021 6th International Conference on …, 2021 - ieeexplore.ieee.org
The UAV field is developing rapidly, and a great deal of research has focused on making UAVs autonomous and controllable. At present, the control …
A 0.13 mJ/Prediction CIFAR-100 Fully Synthesizable Raster-Scan-Based Wired-Logic Processor in 16-nm FPGA
D Li, Z Zhan, R Sumikawa, M Hamada… - IEICE Transactions …, 2024 - search.ieice.org
A 0.13 mJ/prediction with 68.6% accuracy wired-logic deep neural network (DNN) processor
is developed in a single 16-nm field-programmable gate array (FPGA) chip. Compared with …