Enabling resource-efficient AIoT system with cross-level optimization: A survey

S Liu, B Guo, C Fang, Z Wang, S Luo… - … Surveys & Tutorials, 2023 - ieeexplore.ieee.org
The emerging field of artificial intelligence of things (AIoT, AI+IoT) is driven by the
widespread use of intelligent infrastructures and the impressive success of deep learning …

SpAtten: Efficient sparse attention architecture with cascade token and head pruning

H Wang, Z Zhang, S Han - 2021 IEEE International Symposium …, 2021 - ieeexplore.ieee.org
The attention mechanism is becoming increasingly popular in Natural Language Processing
(NLP) applications, showing superior performance to convolutional and recurrent …
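
The cascade token pruning named in the title can be sketched in a few lines: each key token's importance is accumulated from the attention probabilities it receives, and the least-attended tokens are dropped before later layers see them. The PyTorch-style function below is an illustrative sketch of that idea, not the SpAtten accelerator design; prune_tokens, keep_ratio, and cum_scores are hypothetical names.

```python
# Illustrative sketch of cascade token pruning driven by cumulative attention
# scores (hypothetical helper; not the SpAtten hardware implementation).
import torch

def prune_tokens(x, attn_probs, keep_ratio=0.5, cum_scores=None):
    """x: (batch, seq, dim) tokens; attn_probs: (batch, heads, seq, seq)."""
    # Importance of each key token = attention it receives, summed over
    # heads and query positions.
    scores = attn_probs.sum(dim=(1, 2))                  # (batch, seq)
    if cum_scores is not None:
        scores = scores + cum_scores                     # accumulate across layers
    keep = max(1, int(x.size(1) * keep_ratio))
    idx = scores.topk(keep, dim=1).indices.sort(dim=1).values
    x_kept = x.gather(1, idx.unsqueeze(-1).expand(-1, -1, x.size(-1)))
    return x_kept, scores.gather(1, idx)

# Pass the returned scores to the next layer as cum_scores so pruning
# decisions compound across layers ("cascade" pruning).
```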

An overview of energy-efficient hardware accelerators for on-device deep-neural-network training

J Lee, HJ Yoo - IEEE Open Journal of the Solid-State Circuits …, 2021 - ieeexplore.ieee.org
Deep Neural Networks (DNNs) have been widely used in various artificial intelligence (AI)
applications due to their overwhelming performance. Furthermore, recently, several …

EXACT: Scalable graph neural networks training via extreme activation compression

Z Liu, K Zhou, F Yang, L Li, R Chen… - … Conference on Learning …, 2021 - openreview.net
Training Graph Neural Networks (GNNs) on large graphs is a fundamental challenge due to
high memory usage, which is dominated by activations (e.g., node embeddings) …
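
A common way to realize activation compression during training is a custom autograd function that stores a low-precision copy of the activation for the backward pass. The sketch below uses plain per-tensor int8 quantization and an illustrative QuantizedLinear name; EXACT's scheme additionally uses random projection and finer-grained quantization, which are omitted here.

```python
# Minimal sketch: keep an int8 copy of the input activation for backward.
import torch

class QuantizedLinear(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, weight):
        # Per-tensor symmetric int8 quantization of the saved activation.
        scale = x.abs().max().clamp_min(1e-8) / 127.0
        ctx.save_for_backward((x / scale).round().to(torch.int8), weight)
        ctx.scale = scale
        return x @ weight.t()

    @staticmethod
    def backward(ctx, grad_out):
        x_q, weight = ctx.saved_tensors
        x_hat = x_q.float() * ctx.scale          # lossy reconstruction
        grad_x = grad_out @ weight
        grad_w = grad_out.t() @ x_hat            # weight grad uses compressed copy
        return grad_x, grad_w

w = torch.randn(8, 16, requires_grad=True)
x = torch.randn(32, 16, requires_grad=True)
QuantizedLinear.apply(x, w).sum().backward()     # gradients flow as usual
```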

Rep-Net: Efficient on-device learning via feature reprogramming

L Yang, AS Rakin, D Fan - … of the IEEE/CVF Conference on …, 2022 - openaccess.thecvf.com
Transfer learning, where the goal is to transfer well-trained deep learning models from a
primary source task to a new task, is a crucial learning scheme for on-device machine …
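
The on-device recipe the title points to can be sketched as a frozen pretrained backbone plus a small trainable path that rewrites its features before a new head, so gradients and optimizer state touch only a few parameters. The module below is an illustrative stand-in, not the Rep-Net architecture; ReprogrammedModel and its layer sizes are assumptions.

```python
# Hypothetical sketch: freeze the backbone, train only a small
# feature-reprogramming path and a new classification head.
import torch
import torch.nn as nn

class ReprogrammedModel(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False               # no backbone updates
        self.reprogram = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU())
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        with torch.no_grad():                     # no backbone activations kept
            feats = self.backbone(x)
        return self.head(feats + self.reprogram(feats))
```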

Back Razor: Memory-efficient transfer learning by self-sparsified backpropagation

Z Jiang, X Chen, X Huang, X Du… - Advances in neural …, 2022 - proceedings.neurips.cc
Transfer learning from models trained on large datasets to customized downstream tasks
has been widely used, as the pre-trained model can greatly boost the generalizability …
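
Self-sparsified backpropagation can be sketched as follows: the forward pass uses the dense activation, but only a top-k pruned copy is saved and later used to form the weight gradient. The autograd function below is a simplified illustration under that reading, not Back Razor's implementation; SparseSaveLinear and keep_ratio are hypothetical names.

```python
# Simplified sketch: save only the top-k entries of the activation for backward.
import torch

class SparseSaveLinear(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, weight, keep_ratio=0.1):
        k = max(1, int(x.numel() * keep_ratio))
        thresh = x.abs().flatten().topk(k).values.min()
        x_sparse = torch.where(x.abs() >= thresh, x, torch.zeros_like(x))
        ctx.save_for_backward(x_sparse.to_sparse(), weight)  # sparse storage
        return x @ weight.t()

    @staticmethod
    def backward(ctx, grad_out):
        x_sparse, weight = ctx.saved_tensors
        grad_x = grad_out @ weight
        grad_w = grad_out.t() @ x_sparse.to_dense()          # uses pruned copy
        return grad_x, grad_w, None                          # no grad for keep_ratio
```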

AC-GC: Lossy activation compression with guaranteed convergence

RD Evans, T Aamodt - Advances in Neural Information …, 2021 - proceedings.neurips.cc
Parallel hardware devices (e.g., graphics processing units) have limited high-bandwidth
memory capacity. This negatively impacts the training of deep neural networks (DNNs) by …

COMET: A novel memory-efficient deep learning training framework by using error-bounded lossy compression

S Jin, C Zhang, X Jiang, Y Feng, H Guan, G Li… - arXiv preprint arXiv …, 2021 - arxiv.org
Training wide and deep neural networks (DNNs) requires large amounts of storage resources
such as memory because the intermediate activation data must be saved in memory …
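
Unlike fixed-bit-width quantization, an error-bounded lossy compressor lets the user cap the pointwise reconstruction error. A minimal sketch, assuming a simple uniform quantizer rather than the dedicated compressor COMET builds on:

```python
# Sketch of error-bounded compression: |x - decompress(compress(x))| <= error_bound.
import torch

def compress(x: torch.Tensor, error_bound: float):
    step = 2.0 * error_bound
    return torch.round(x / step).to(torch.int32), step   # integer codes + step

def decompress(q: torch.Tensor, step: float):
    return q.float() * step                              # error <= step / 2

x = torch.randn(1024)
q, step = compress(x, error_bound=1e-2)
assert (x - decompress(q, step)).abs().max() <= 1e-2 + 1e-6
```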

Fine-tuning language models over slow networks using activation quantization with guarantees

J Wang, B Yuan, L Rimanic, Y He… - Advances in …, 2022 - proceedings.neurips.cc
Communication compression is a crucial technique for modern distributed learning systems
to alleviate their communication bottlenecks over slower networks. Despite recent intensive …
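
Here the compressed objects are the activations exchanged between pipeline stages rather than gradients. One simplified reading, quantizing the change of an activation relative to a cached reconstruction so the transmitted payload stays small, is sketched below; this is an assumption for illustration, not necessarily the paper's exact guarantee-preserving scheme, and DeltaQuantizer, encode, and decode are hypothetical names.

```python
# Hypothetical sketch: int8-quantize the change of an activation w.r.t. a
# cached reconstruction before sending it over a slow link; sender and
# receiver each keep their own cache so both track the same values.
import torch

class DeltaQuantizer:
    def __init__(self):
        self.cache = None                                 # last reconstruction

    def encode(self, x: torch.Tensor):
        ref = self.cache if self.cache is not None else torch.zeros_like(x)
        delta = x - ref
        scale = delta.abs().max().clamp_min(1e-8) / 127.0
        q = (delta / scale).round().to(torch.int8)        # ~4x smaller payload
        self.cache = ref + q.float() * scale              # mirror the receiver
        return q, scale

    def decode(self, q: torch.Tensor, scale: torch.Tensor):
        ref = self.cache if self.cache is not None else torch.zeros(q.shape)
        out = ref + q.float() * scale
        self.cache = out
        return out

sender, receiver = DeltaQuantizer(), DeltaQuantizer()
act = torch.randn(4, 64)
payload = sender.encode(act)                              # transmit this
recovered = receiver.decode(*payload)                     # close to act
```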

Recent developments in low-power AI accelerators: A survey

C Åleskog, H Grahn, A Borg - Algorithms, 2022 - mdpi.com
As machine learning and AI continue to rapidly develop, and with the ever-closer end of
Moore's law, new avenues and novel ideas in architecture design are being created and …