Enabling resource-efficient AIoT system with cross-level optimization: A survey
The emerging field of artificial intelligence of things (AIoT, AI + IoT) is driven by the
widespread use of intelligent infrastructures and the impressive success of deep learning …
SpAtten: Efficient sparse attention architecture with cascade token and head pruning
The attention mechanism is becoming increasingly popular in Natural Language Processing
(NLP) applications, showing performance superior to convolutional and recurrent …
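The pruning idea in the title can be illustrated in a few lines: each token accumulates the attention it receives across heads and layers, and the lowest-scoring tokens are dropped for all subsequent layers. Below is a minimal PyTorch sketch of that cumulative-score cascade; `prune_tokens`, `cum_scores`, and `keep_ratio` are illustrative names, not SpAtten's interface (the paper itself proposes a hardware accelerator).

```python
import torch

def prune_tokens(x, cum_scores, attn_probs, keep_ratio=0.5):
    """x: (batch, seq, dim); attn_probs: (batch, heads, seq, seq)."""
    # Importance of a token = attention it receives, accumulated over
    # heads, query positions, and (via cum_scores) earlier layers.
    cum_scores = cum_scores + attn_probs.sum(dim=(1, 2))  # (batch, seq)
    k = max(1, int(x.size(1) * keep_ratio))
    keep = cum_scores.topk(k, dim=1).indices.sort(dim=1).values  # keep order
    x_kept = torch.gather(x, 1, keep.unsqueeze(-1).expand(-1, -1, x.size(-1)))
    return x_kept, torch.gather(cum_scores, 1, keep)
```

Because the scores are cumulative, a token pruned at one layer stays pruned, which is what makes the savings "cascade" through the rest of the network.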
An overview of energy-efficient hardware accelerators for on-device deep-neural-network training
J Lee, HJ Yoo - IEEE Open Journal of the Solid-State Circuits …, 2021 - ieeexplore.ieee.org
Deep Neural Networks (DNNs) have been widely used in various artificial intelligence (AI)
applications due to their overwhelming performance. Furthermore, recently, several …
EXACT: Scalable graph neural networks training via extreme activation compression
Training Graph Neural Networks (GNNs) on large graphs is a fundamental challenge due to
the high memory usage, which is mainly occupied by activations (e.g., node embeddings) …
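The general recipe behind this line of work: keep the forward pass exact, but store the activations needed for the backward pass in heavily quantized form. A minimal PyTorch sketch of that recipe follows; `QuantLinear` is a hypothetical name, the 2-bit codes are left unpacked in uint8 for brevity (real systems bit-pack them), and this is not EXACT's GNN-specific implementation.

```python
import torch

class QuantLinear(torch.autograd.Function):
    """Linear op whose saved input is quantized to a few bits (2-D x)."""
    @staticmethod
    def forward(ctx, x, w, bits=2):
        levels = 2 ** bits - 1
        lo, hi = x.min(), x.max()
        scale = ((hi - lo) / levels).clamp_min(1e-8)
        q = torch.round((x - lo) / scale).to(torch.uint8)  # compressed copy
        ctx.save_for_backward(q, w)
        ctx.lo, ctx.scale = lo.item(), scale.item()
        return x @ w.t()               # forward math uses the exact input

    @staticmethod
    def backward(ctx, grad_out):
        q, w = ctx.saved_tensors
        x_hat = q.float() * ctx.scale + ctx.lo  # dequantized (lossy) input
        grad_x = grad_out @ w                   # does not need x at all
        grad_w = grad_out.t() @ x_hat           # uses the lossy input
        return grad_x, grad_w, None
```

Only the weight gradient sees the quantization error, which is why such "extreme" compression of saved activations can leave training largely intact.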
Rep-Net: Efficient on-device learning via feature reprogramming
Transfer learning, where the goal is to transfer well-trained deep learning models from a
primary source task to a new task, is a crucial learning scheme for on-device machine …
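The feature-reprogramming idea can be sketched as a frozen pre-trained backbone plus a small trainable module that re-maps its features to the new task, so on-device training touches only the tiny module. The PyTorch sketch below is illustrative, with hypothetical module names, and is not Rep-Net's actual architecture.

```python
import torch
import torch.nn as nn

class ReprogrammedModel(nn.Module):
    def __init__(self, backbone, feat_dim, num_classes):
        super().__init__()
        self.backbone = backbone.eval()
        for p in self.backbone.parameters():
            p.requires_grad = False          # only the adapter trains
        self.adapter = nn.Sequential(        # tiny trainable reprogrammer
            nn.Linear(feat_dim, feat_dim // 4),
            nn.ReLU(),
            nn.Linear(feat_dim // 4, num_classes),
        )

    def forward(self, x):
        with torch.no_grad():                # no backbone activations stored
            feats = self.backbone(x)
        return self.adapter(feats)
```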
Back Razor: Memory-efficient transfer learning by self-sparsified backpropagation
Transfer learning from models trained on large datasets to customized downstream tasks
has been widely used, as the pre-trained model can greatly boost generalizability …
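Self-sparsified backpropagation means the forward pass stays exact while only the largest-magnitude activation entries are kept for backward, stored as values plus a bitmask. A hedged PyTorch sketch of that asymmetric trick, shown on a linear step with illustrative names (not Back Razor's implementation):

```python
import torch

def sparsify(x, keep_ratio=0.1):
    """Keep only the largest-magnitude entries; return values + bitmask."""
    k = max(1, int(x.numel() * keep_ratio))
    thresh = x.abs().flatten().kthvalue(x.numel() - k + 1).values
    mask = x.abs() >= thresh
    return x[mask], mask

class SparseSaveLinear(torch.autograd.Function):
    """Linear op that self-sparsifies the input stored for backward (2-D x)."""
    @staticmethod
    def forward(ctx, x, w):
        vals, mask = sparsify(x)         # forward math itself stays exact
        ctx.save_for_backward(vals, mask, w)
        return x @ w.t()

    @staticmethod
    def backward(ctx, grad_out):
        vals, mask, w = ctx.saved_tensors
        x_hat = torch.zeros(mask.shape, dtype=vals.dtype,
                            device=vals.device).masked_scatter_(mask, vals)
        return grad_out @ w, grad_out.t() @ x_hat
```

At a 10% keep ratio the dense activation is replaced by roughly a tenth of its values plus a one-bit-per-element mask, which is where the memory saving comes from.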
AC-GC: Lossy activation compression with guaranteed convergence
Parallel hardware devices (e.g., graphics processing units) have limited high-bandwidth
memory capacity. This negatively impacts the training of deep neural networks (DNNs) by …
COMET: a novel memory-efficient deep learning training framework by using error-bounded lossy compression
Training wide and deep neural networks (DNNs) requires large amounts of storage resources
such as memory because the intermediate activation data must be saved in the memory …
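The defining property of error-bounded lossy compression is that every reconstructed value is guaranteed to lie within a user-set bound of the original. A toy uniform quantizer with that guarantee, far simpler than COMET's actual codec (which adds prediction and entropy coding on top):

```python
import torch

def compress(x, error_bound=1e-2):
    step = 2.0 * error_bound                      # quantization bin width
    codes = torch.round(x / step).to(torch.int32) # small ints, entropy-codable
    return codes, step

def decompress(codes, step):
    return codes.float() * step                   # |x - x_hat| <= error_bound

x = torch.randn(4, 8)
codes, step = compress(x)
assert (x - decompress(codes, step)).abs().max() <= 1e-2 + 1e-7
```

Rounding to the nearest multiple of `2 * error_bound` keeps every per-element error at most `error_bound`, which is exactly the knob such frameworks expose for trading memory against accuracy.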
Fine-tuning language models over slow networks using activation quantization with guarantees
Communication compression is a crucial technique for modern distributed learning systems
to alleviate their communication bottlenecks over slower networks. Despite recent intensive …
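The setting here is distributed fine-tuning where activations (and activation gradients) must cross a slow link between machines, so they are quantized before transmission. A hedged int8 sketch of the send/receive path, with illustrative names and none of the paper's convergence machinery:

```python
import torch

def encode(x):
    """Per-tensor symmetric int8 quantization before hitting the network."""
    scale = x.abs().max().clamp_min(1e-8) / 127.0
    return (x / scale).round().to(torch.int8), scale.item()  # ~4x smaller

def decode(q, scale):
    return q.float() * scale                  # dequantize on arrival

acts = torch.randn(16, 1024)                  # activations at a stage boundary
q, s = encode(acts)
recovered = decode(q, s)
print((acts - recovered).abs().max())         # error bounded by scale / 2
```

Sending (int8 codes, one float scale) instead of float32 activations cuts the payload roughly 4x, which is what makes fine-tuning tolerable over slow networks.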
Recent developments in low-power AI accelerators: A survey
As machine learning and AI continue to develop rapidly, and with the end of Moore's law
drawing ever closer, new avenues and novel ideas in architecture design are being created and …