Efficient large language models: A survey

Z Wan, X Wang, C Liu, S Alam, Y Zheng… - arXiv preprint arXiv …, 2023 - researchgate.net
Large Language Models (LLMs) have demonstrated remarkable capabilities in
important tasks such as natural language understanding, language generation, and …

Beyond efficiency: A systematic survey of resource-efficient large language models

G Bai, Z Chai, C Ling, S Wang, J Lu, N Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
The burgeoning field of Large Language Models (LLMs), exemplified by sophisticated
models like OpenAI's ChatGPT, represents a significant advancement in artificial …

A survey on efficient training of transformers

B Zhuang, J Liu, Z Pan, H He, Y Weng… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent advances in Transformers have come with a huge demand for computing
resources, highlighting the importance of developing efficient training techniques to make …

A survey of resource-efficient LLM and multimodal foundation models

M Xu, W Yin, D Cai, R Yi, D Xu, Q Wang, B Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large foundation models, including large language models (LLMs), vision transformers
(ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine …

Memory efficient optimizers with 4-bit states

B Li, J Chen, J Zhu - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Optimizer states are a major source of memory consumption for training neural networks,
limiting the maximum trainable model within a given memory budget. Compressing the …
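
A minimal sketch of the underlying idea, block-wise quantization of an optimizer moment into 4-bit indices plus per-block scales, is given below. The block size, the linear quantization grid, and the function names are assumptions for illustration, not the paper's implementation (which would also pack two 4-bit indices into each byte):

import torch

BLOCK = 128  # per-block scaling granularity (assumed)

def quantize_4bit(state):
    """Compress an optimizer state tensor into 4-bit indices plus per-block scales."""
    flat = state.flatten()
    pad = (-flat.numel()) % BLOCK
    flat = torch.cat([flat, flat.new_zeros(pad)]).view(-1, BLOCK)
    scale = flat.abs().amax(dim=1, keepdim=True).clamp(min=1e-12)
    # Map each value onto one of 16 levels spanning [-1, 1] (linear grid for simplicity).
    idx = torch.round((flat / scale + 1.0) * 7.5).clamp(0, 15).to(torch.uint8)
    return idx, scale, state.shape, pad

def dequantize_4bit(idx, scale, shape, pad):
    """Recover an approximate float tensor from the 4-bit representation."""
    flat = ((idx.float() / 7.5) - 1.0) * scale
    flat = flat.flatten()
    if pad:
        flat = flat[:-pad]
    return flat.view(shape)

m = torch.randn(1000)              # e.g. an Adam first-moment tensor
packed = quantize_4bit(m)
m_hat = dequantize_4bit(*packed)
print((m - m_hat).abs().max())     # error bounded by the per-block quantization step

The optimizer would keep only the packed representation between steps and dequantize each block just before its update, trading a small reconstruction error for roughly 8x less state memory than fp32.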

Winner-take-all column row sampling for memory efficient adaptation of language model

Z Liu, G Wang, SH Zhong, Z Xu, D Zha… - Advances in …, 2024 - proceedings.neurips.cc
As the model size grows rapidly, fine-tuning large pre-trained language models has
become increasingly difficult due to their extensive memory usage. Previous works usually …
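
As a generic illustration of column-row sampling (not the paper's exact algorithm), the weight-gradient product X^T @ dY can be approximated by keeping only the k highest-scoring token positions, so only those rows of X need to be stored for the backward pass. The scoring rule, the rescaling heuristic, and k below are assumptions:

import torch

def crs_weight_grad(X, dY, k):
    """Approximate the weight gradient X.T @ dY from the k 'winning' token positions.

    X:  saved activations, shape (n_tokens, d_in)
    dY: output gradient,   shape (n_tokens, d_out)
    """
    # Score each column-row pair (row of X matched with row of dY) by its norm product.
    scores = X.norm(dim=1) * dY.norm(dim=1)
    top = scores.topk(k).indices                 # winner-take-all: keep the largest pairs
    # Rescale by the fraction of total score kept (a common heuristic for sampled matmuls;
    # deterministic selection is not exactly unbiased).
    scale = scores.sum() / scores[top].sum()
    return scale * (X[top].T @ dY[top])

X = torch.randn(512, 256)
dY = torch.randn(512, 128)
approx = crs_weight_grad(X, dY, k=64)
exact = X.T @ dY
print((approx - exact).norm() / exact.norm())    # relative approximation error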

TinyTrain: Deep neural network training at the extreme edge

YD Kwon, R Li, SI Venieris, J Chauhan… - arXiv preprint arXiv …, 2023 - arxiv.org
On-device training is essential for user personalisation and privacy. With the pervasiveness
of IoT devices and microcontroller units (MCUs), this task becomes more challenging due to …

TinyKG: Memory-efficient training framework for knowledge graph neural recommender systems

H Chen, X Li, K Zhou, X Hu, CCM Yeh… - Proceedings of the 16th …, 2022 - dl.acm.org
There has been an explosion of interest in designing various Knowledge Graph Neural
Networks (KGNNs), which achieve state-of-the-art performance and provide great …

TANGO: re-thinking quantization for graph neural network training on GPUs

S Chen, D Zheng, C Ding, C Huan, Y Ji… - Proceedings of the …, 2023 - dl.acm.org
Graph learning is becoming increasingly popular due to its superior performance in tackling
many grand challenges. While quantization is widely used to accelerate Graph Neural …

DIVISION: memory efficient training via dual activation precision

G Wang, Z Liu, Z Jiang, N Liu… - … on Machine Learning, 2023 - proceedings.mlr.press
Activation compressed training offers a way to reduce the memory cost of
training deep neural networks (DNNs). However, state-of-the-art work combines a search of …
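
A minimal sketch of activation compressed training in general, assuming a custom autograd function that stores an fp16 copy of the activation for the backward pass; this illustrates only the memory-versus-precision trade-off and is not DIVISION's dual-precision scheme:

import torch

class CompressedLinear(torch.autograd.Function):
    """Linear op that saves an fp16 copy of its input instead of the full-precision activation."""

    @staticmethod
    def forward(ctx, x, weight):
        ctx.save_for_backward(x.to(torch.float16), weight)   # compressed activation
        return x @ weight.T

    @staticmethod
    def backward(ctx, grad_out):
        x_lp, weight = ctx.saved_tensors
        x = x_lp.to(grad_out.dtype)       # decompress on demand
        grad_x = grad_out @ weight        # exact: only the weight is needed
        grad_w = grad_out.T @ x           # approximate: uses the compressed activation
        return grad_x, grad_w

x = torch.randn(32, 256, requires_grad=True)
w = torch.randn(128, 256, requires_grad=True)
y = CompressedLinear.apply(x, w)
y.sum().backward()
print(x.grad.shape, w.grad.shape)         # torch.Size([32, 256]) torch.Size([128, 256])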