Sparsity in deep learning: Pruning and growth for efficient inference and training in neural...

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

被引用次数：99 相关文章所有 3 个版本

[PDF] arxiv.org

Structured pruning for deep convolutional neural networks: A survey

Y He, L Xiao - IEEE transactions on pattern analysis and …, 2023 - ieeexplore.ieee.org

The remarkable performance of deep Convolutional neural networks (CNNs) is generally
attributed to their deeper and wider architectures, which can come with significant …

被引用次数：56 相关文章所有 7 个版本

[PDF] mlr.press

Sparsegpt: Massive language models can be accurately pruned in one-shot

E Frantar, D Alistarh - International Conference on Machine …, 2023 - proceedings.mlr.press

We show for the first time that large-scale generative pretrained transformer (GPT) family
models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal …

被引用次数：245 相关文章所有 8 个版本

[PDF] mlr.press

Flexgen: High-throughput generative inference of large language models with a single gpu

Y Sheng, L Zheng, B Yuan, Z Li… - International …, 2023 - proceedings.mlr.press

The high computational and memory requirements of large language model (LLM) inference
make it feasible only with multiple high-end accelerators. Motivated by the emerging …

被引用次数：174 相关文章所有 10 个版本

[PDF] arxiv.org

Gptq: Accurate post-training quantization for generative pre-trained transformers

E Frantar, S Ashkboos, T Hoefler, D Alistarh - arXiv preprint arXiv …, 2022 - arxiv.org

Generative Pre-trained Transformer models, known as GPT or OPT, set themselves apart
through breakthrough performance across complex language modelling tasks, but also by …

被引用次数：365 相关文章所有 19 个版本

[PDF] mlr.press

Deja vu: Contextual sparsity for efficient llms at inference time

Z Liu, J Wang, T Dao, T Zhou, B Yuan… - International …, 2023 - proceedings.mlr.press

Large language models (LLMs) with hundreds of billions of parameters have sparked a new
wave of exciting AI applications. However, they are computationally expensive at inference …

被引用次数：123 相关文章所有 7 个版本

[PDF] arxiv.org

A simple and effective pruning approach for large language models

M Sun, Z Liu, A Bair, JZ Kolter - arXiv preprint arXiv:2306.11695, 2023 - arxiv.org

As their size increases, Large Languages Models (LLMs) are natural candidates for network
pruning methods: approaches that drop a subset of network weights while striving to …

被引用次数：184 相关文章所有 5 个版本

[PDF] neurips.cc

H2o: Heavy-hitter oracle for efficient generative inference of large language models

Z Zhang, Y Sheng, T Zhou, T Chen… - Advances in …, 2024 - proceedings.neurips.cc

Abstract Large Language Models (LLMs), despite their recent impressive accomplishments,
are notably cost-prohibitive to deploy, particularly for applications involving long-content …

被引用次数：92 相关文章所有 7 个版本

[PDF] neurips.cc

Optimal brain compression: A framework for accurate post-training quantization and pruning

E Frantar, D Alistarh - Advances in Neural Information …, 2022 - proceedings.neurips.cc

We consider the problem of model compression for deep neural networks (DNNs) in the
challenging one-shot/post-training setting, in which we are given an accurate trained model …

被引用次数：117 相关文章所有 5 个版本

[PDF] neurips.cc

Ties-merging: Resolving interference when merging models

P Yadav, D Tam, L Choshen… - Advances in Neural …, 2024 - proceedings.neurips.cc

Transfer learning–ie, further fine-tuning a pre-trained model on a downstream task–can
confer significant advantages, including improved downstream performance, faster …

被引用次数：74 相关文章所有 7 个版本