Compute-efficient deep learning: Algorithmic trends and opportunities

BR Bartoldson, B Kailkhura, D Blalock - Journal of Machine Learning …, 2023 - jmlr.org
Although deep learning has made great progress in recent years, the exploding economic
and environmental costs of training neural networks are becoming unsustainable. To …

Beyond efficiency: A systematic survey of resource-efficient large language models

G Bai, Z Chai, C Ling, S Wang, J Lu, N Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
The burgeoning field of Large Language Models (LLMs), exemplified by sophisticated
models like OpenAI's ChatGPT, represents a significant advancement in artificial …

EXACT: Scalable graph neural networks training via extreme activation compression

Z Liu, K Zhou, F Yang, L Li, R Chen… - … Conference on Learning …, 2021 - openreview.net
Training Graph Neural Networks (GNNs) on large graphs is a fundamental challenge due to
the high memory usage, which is mainly occupied by activations (e.g., node embeddings) …

TinyTrain: Deep neural network training at the extreme edge

YD Kwon, R Li, SI Venieris… - arXiv preprint arXiv …, 2023 - theyoungkwon.github.io
On-device training is essential for user personalisation and privacy. With the pervasiveness
of IoT devices and microcontroller units (MCUs), this task becomes more challenging due to …

Back Razor: Memory-efficient transfer learning by self-sparsified backpropagation

Z Jiang, X Chen, X Huang, X Du… - Advances in neural …, 2022 - proceedings.neurips.cc
Transfer learning from the model trained on large datasets to customized downstream tasks
has been widely used as the pre-trained model can greatly boost the generalizability …

TinyKG: Memory-efficient training framework for knowledge graph neural recommender systems

H Chen, X Li, K Zhou, X Hu, CCM Yeh… - Proceedings of the 16th …, 2022 - dl.acm.org
There has been an explosion of interest in designing various Knowledge Graph Neural
Networks (KGNNs), which achieve state-of-the-art performance and provide great …

TANGO: re-thinking quantization for graph neural network training on GPUs

S Chen, D Zheng, C Ding, C Huan, Y Ji… - Proceedings of the …, 2023 - dl.acm.org
Graph learning is becoming increasingly popular due to its superior performance in tackling
many grand challenges. While quantization is widely used to accelerate Graph Neural …

Fine-tuning language models over slow networks using activation quantization with guarantees

J Wang, B Yuan, L Rimanic, Y He… - Advances in …, 2022 - proceedings.neurips.cc
Communication compression is a crucial technique for modern distributed learning systems
to alleviate their communication bottlenecks over slower networks. Despite recent intensive …

ScheMoE: An Extensible Mixture-of-Experts Distributed Training System with Tasks Scheduling

S Shi, X Pan, Q Wang, C Liu, X Ren, Z Hu… - Proceedings of the …, 2024 - dl.acm.org
In recent years, large-scale models can be easily scaled to trillions of parameters with
sparsely activated mixture-of-experts (MoE), which significantly improves the model quality …

DIVISION: memory efficient training via dual activation precision

G Wang, Z Liu, Z Jiang, N Liu… - … on Machine Learning, 2023 - proceedings.mlr.press
Activation compressed training provides a solution towards reducing the memory cost of
training deep neural networks (DNNs). However, state-of-the-art work combines a search of …
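
Note: several of the entries above (EXACT, TinyKG, DIVISION, Back Razor) build on the same basic activation-compressed-training idea: keep the activations saved for backpropagation in low precision between the forward and backward passes, and decompress them only when gradients are computed. The snippet below is a minimal, hedged sketch of that generic pattern in PyTorch (per-tensor 8-bit quantization of a ReLU's saved activation); it is illustrative only and does not reproduce the specific compression schemes of DIVISION, EXACT, or any other paper listed here.

    # Generic activation-compressed-training sketch (illustrative, not any
    # specific paper's method): store the saved activation as uint8 plus a
    # scale, dequantize it on demand in backward to reduce training memory.
    import torch

    class CompressedReLU(torch.autograd.Function):
        """ReLU whose saved activation is kept as 8-bit values plus a scale."""

        @staticmethod
        def forward(ctx, x):
            y = torch.relu(x)
            # Per-tensor linear quantization of the activation kept for backward.
            scale = y.max().clamp(min=1e-8) / 255.0
            y_q = torch.round(y / scale).to(torch.uint8)
            ctx.save_for_backward(y_q)
            ctx.scale = scale
            return y

        @staticmethod
        def backward(ctx, grad_out):
            (y_q,) = ctx.saved_tensors
            y_hat = y_q.to(grad_out.dtype) * ctx.scale  # dequantize on demand
            return grad_out * (y_hat > 0)               # ReLU gradient mask

    if __name__ == "__main__":
        x = torch.randn(4, 16, requires_grad=True)
        CompressedReLU.apply(x).sum().backward()
        print(x.grad.shape)  # torch.Size([4, 16])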