Testability and dependability of AI hardware: Survey, trends, challenges, and perspectives

F Su, C Liu, HG Stratigopoulos - IEEE Design & Test, 2023 - ieeexplore.ieee.org
Hardware realization of artificial intelligence (AI) requires new design styles and even
underlying technologies than those used in traditional digital processors or logic circuits …

Dependable DNN accelerator for safety-critical systems: A review on the aging perspective

I Moghaddasi, S Gorgin, JA Lee - IEEE Access, 2023 - ieeexplore.ieee.org
In the modern era, artificial intelligence (AI) and deep learning (DL) seamlessly integrate into
various spheres of our daily lives. These cutting-edge disciplines have given rise to …

A low-cost fault corrector for deep neural networks through range restriction

Z Chen, G Li, K Pattabiraman - 2021 51st Annual IEEE/IFIP …, 2021 - ieeexplore.ieee.org
The adoption of deep neural networks (DNNs) in safety-critical domains has engendered
serious reliability concerns. A prominent example is hardware transient faults that are …

ByteTransformer: A high-performance transformer boosted for variable-length inputs

Y Zhai, C Jiang, L Wang, X Jia, S Zhang… - 2023 IEEE …, 2023 - ieeexplore.ieee.org
Transformers have become keystone models in natural language processing over the past
decade. They have achieved great popularity in deep learning applications, but the …

Understanding and mitigating hardware failures in deep learning training systems

Y He, M Hutton, S Chan, R De Gruijl… - Proceedings of the 50th …, 2023 - dl.acm.org
Deep neural network (DNN) training workloads are increasingly susceptible to hardware
failures in datacenters. For example, Google experienced "mysterious, difficult to identify …

Towards energy-efficient and secure edge AI: A cross-layer framework ICCAD special session paper

M Shafique, A Marchisio, RVW Putra… - 2021 IEEE/ACM …, 2021 - ieeexplore.ieee.org
The security and privacy concerns along with the amount of data that is required to be
processed on regular basis has pushed processing to the edge of the computing systems …

DistServe: Disaggregating prefill and decoding for goodput-optimized large language model serving

Y Zhong, S Liu, J Chen, J Hu, Y Zhu, X Liu, X Jin… - arXiv preprint arXiv …, 2024 - arxiv.org
DistServe improves the performance of large language models (LLMs) serving by
disaggregating the prefill and decoding computation. Existing LLM serving systems colocate …

Soft error tolerant convolutional neural networks on FPGAs with ensemble learning

Z Gao, H Zhang, Y Yao, J Xiao, S Zeng… - … Transactions on Very …, 2022 - ieeexplore.ieee.org
Convolutional neural networks (CNNs) are widely used in computer vision and natural
language processing. Field-programmable gate arrays (FPGAs) are popular accelerators for …

Improving fault tolerance for reliable DNN using boundary-aware activation

J Zhan, R Sun, W Jiang, Y Jiang, X Yin… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
In this article, we approach to construct reliable deep neural networks (DNNs) for safety-
critical artificial intelligent applications. We propose to modify rectified linear unit (ReLU), a …

Exploring Winograd convolution for cost-effective neural network fault tolerance

X Xue, C Liu, B Liu, H Huang, Y Wang… - … Transactions on Very …, 2023 - ieeexplore.ieee.org
Winograd is generally utilized to optimize convolution performance and computational
efficiency because of the reduced multiplication operations, but the reliability issues brought …