Compute-efficient deep learning: Algorithmic trends and opportunities

BR Bartoldson, B Kailkhura, D Blalock - Journal of Machine Learning …, 2023 - jmlr.org
Although deep learning has made great progress in recent years, the exploding economic
and environmental costs of training neural networks are becoming unsustainable. To …

Beyond efficiency: A systematic survey of resource-efficient large language models

G Bai, Z Chai, C Ling, S Wang, J Lu, N Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
The burgeoning field of Large Language Models (LLMs), exemplified by sophisticated
models like OpenAI's ChatGPT, represents a significant advancement in artificial …

EXACT: Scalable graph neural networks training via extreme activation compression

Z Liu, K Zhou, F Yang, L Li, R Chen… - … Conference on Learning …, 2021 - openreview.net
Training Graph Neural Networks (GNNs) on large graphs is a fundamental challenge due to
the high memory usage, which is mainly occupied by activations (e.g., node embeddings) …

TinyTrain: Deep neural network training at the extreme edge

YD Kwon, R Li, SI Venieris… - arXiv preprint arXiv …, 2023 - theyoungkwon.github.io
On-device training is essential for user personalisation and privacy. With the pervasiveness
of IoT devices and microcontroller units (MCUs), this task becomes more challenging due to …

Back Razor: Memory-efficient transfer learning by self-sparsified backpropagation

Z Jiang, X Chen, X Huang, X Du… - Advances in neural …, 2022 - proceedings.neurips.cc
Transfer learning from the model trained on large datasets to customized downstream tasks
has been widely used as the pre-trained model can greatly boost the generalizability …

TinyKG: Memory-efficient training framework for knowledge graph neural recommender systems

H Chen, X Li, K Zhou, X Hu, CCM Yeh… - Proceedings of the 16th …, 2022 - dl.acm.org
There has been an explosion of interest in designing various Knowledge Graph Neural
Networks (KGNNs), which achieve state-of-the-art performance and provide great …

TANGO: re-thinking quantization for graph neural network training on GPUs

S Chen, D Zheng, C Ding, C Huan, Y Ji… - Proceedings of the …, 2023 - dl.acm.org
Graph learning is becoming increasingly popular due to its superior performance in tackling
many grand challenges. While quantization is widely used to accelerate Graph Neural …

Fine-tuning language models over slow networks using activation quantization with guarantees

J Wang, B Yuan, L Rimanic, Y He… - Advances in …, 2022 - proceedings.neurips.cc
Communication compression is a crucial technique for modern distributed learning systems
to alleviate their communication bottlenecks over slower networks. Despite recent intensive …

ScheMoE: An Extensible Mixture-of-Experts Distributed Training System with Tasks Scheduling

S Shi, X Pan, Q Wang, C Liu, X Ren, Z Hu… - Proceedings of the …, 2024 - dl.acm.org
In recent years, large-scale models can be easily scaled to trillions of parameters with
sparsely activated mixture-of-experts (MoE), which significantly improves the model quality …

DIVISION: memory efficient training via dual activation precision

G Wang, Z Liu, Z Jiang, N Liu… - … on Machine Learning, 2023 - proceedings.mlr.press
Activation compressed training provides a solution towards reducing the memory cost of
training deep neural networks (DNNs). However, state-of-the-art work combines a search of …
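
Note: several of the entries above (EXACT, TinyKG, DIVISION, Back Razor) build on the same basic activation-compressed-training idea: keep the activations saved for backpropagation in low precision between the forward and backward passes, and decompress them only when gradients are computed. The snippet below is a minimal, hedged sketch of that generic pattern in PyTorch (per-tensor 8-bit quantization of a ReLU's saved activation); it is illustrative only and does not reproduce the specific compression schemes of DIVISION, EXACT, or any other paper listed here.

    # Generic activation-compressed-training sketch (illustrative, not any
    # specific paper's method): store the saved activation as uint8 plus a
    # scale, dequantize it on demand in backward to reduce training memory.
    import torch

    class CompressedReLU(torch.autograd.Function):
        """ReLU whose saved activation is kept as 8-bit values plus a scale."""

        @staticmethod
        def forward(ctx, x):
            y = torch.relu(x)
            # Per-tensor linear quantization of the activation kept for backward.
            scale = y.max().clamp(min=1e-8) / 255.0
            y_q = torch.round(y / scale).to(torch.uint8)
            ctx.save_for_backward(y_q)
            ctx.scale = scale
            return y

        @staticmethod
        def backward(ctx, grad_out):
            (y_q,) = ctx.saved_tensors
            y_hat = y_q.to(grad_out.dtype) * ctx.scale  # dequantize on demand
            return grad_out * (y_hat > 0)               # ReLU gradient mask

    if __name__ == "__main__":
        x = torch.randn(4, 16, requires_grad=True)
        CompressedReLU.apply(x).sum().backward()
        print(x.grad.shape)  # torch.Size([4, 16])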