GPT-Zip: Deep compression of finetuned large language models
Workshop on Efficient Systems for Foundation Models @ ICML 2023, 2023 • openreview.net
Storage is increasingly a practical bottleneck to scaling large language model (LLM) systems with personalization, co-location, and other use cases that require storing the pretrained base model plus multiple finetuned models. To this end, we propose GPT-Zip for post-finetuning compression. GPT-Zip uses quantization and sparsification to efficiently compress finetuned models by exploiting their closeness to the pretrained base model. Specifically, we demonstrate that the difference between the finetuned models and the pretrained base model can be efficiently quantized to a low bit-width and pruned to high sparsity together, providing a large overall size reduction. Thus, GPT-Zip avoids the linear growth in memory costs required for naive storage. We show that this compression can be achieved without performance degradation, as measured by evaluations on several tasks from the Natural Instructions dataset. Surprisingly, GPT-Zip sometimes improves accuracy over the uncompressed models. We demonstrate the efficacy of GPT-Zip on four finetuned OPT-1.3B models and show that GPT-Zip reduces storage cost by a substantially larger factor than existing LLM compression techniques while attaining significantly better performance.
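The abstract describes compressing the weight difference (delta) between a finetuned model and its pretrained base via quantization and sparsification. The following is a minimal sketch of that general idea, not GPT-Zip's actual algorithm: the magnitude pruning, uniform symmetric quantizer, and the default sparsity/bit-width values below are illustrative assumptions, and all function names are hypothetical.

```python
import torch

def compress_delta(base_w, finetuned_w, sparsity=0.9, num_bits=2):
    """Illustrative delta compression: sparsify then quantize the finetuned-minus-base
    difference. Stand-in for the (unspecified here) GPT-Zip pruner/quantizer."""
    delta = finetuned_w - base_w

    # Magnitude pruning: keep only the largest-magnitude entries of the delta.
    k = max(1, int(delta.numel() * (1.0 - sparsity)))
    threshold = delta.abs().flatten().kthvalue(delta.numel() - k + 1).values
    mask = delta.abs() >= threshold

    # Uniform symmetric quantization of the surviving delta values.
    levels = 2 ** (num_bits - 1) - 1            # e.g. 1 level each side for 2-bit signed
    scale = delta[mask].abs().max() / max(levels, 1)
    q = torch.clamp(torch.round(delta / scale), -levels, levels)

    # Store only the quantized, masked delta (int8 here for simplicity) plus the scale.
    return (q * mask).to(torch.int8), scale

def decompress(base_w, q, scale):
    # Reconstruct an approximation of the finetuned weights from base + compressed delta.
    return base_w + q.float() * scale

# Toy usage on a single weight matrix: the finetuned weights stay close to the base,
# so the delta is small, sparse-friendly, and cheap to quantize.
base = torch.randn(256, 256)
finetuned = base + 0.01 * torch.randn(256, 256)
q, scale = compress_delta(base, finetuned)
approx = decompress(base, q, scale)
print("mean reconstruction error:", (approx - finetuned).abs().mean().item())
```

Under this scheme, only one full-precision copy of the base model is stored; each finetuned variant adds just a sparse, low-bit delta, which is what avoids the linear growth in storage described above.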