GPT-Zip: Deep compression of finetuned large language models
Workshop on Efficient Systems for Foundation Models @ ICML 2023, 2023 • openreview.net
Storage is increasingly a practical bottleneck to scaling large language model (LLM) systems with personalization, co-location, and other use cases that require storing the pretrained base model plus multiple finetuned models. To this end, we propose GPT-Zip for post-finetuning compression. GPT-Zip uses quantization and sparsification to efficiently compress finetuned models by exploiting their closeness to the pretrained base model. Specifically, we demonstrate that the difference between the finetuned models and the pretrained base model can be efficiently quantized to a low bit-width and pruned to high sparsity together, providing a large overall size reduction. Thus, GPT-Zip avoids the linear growth in memory costs required for naive storage. We show that this compression can be achieved without performance degradation, as measured by evaluations on several tasks from the Natural Instructions dataset. Surprisingly, GPT-Zip sometimes improves accuracy over the uncompressed models. We demonstrate the efficacy of GPT-Zip on four finetuned OPT-1.3B models and show that GPT-Zip reduces storage cost by a substantially larger factor than existing LLM compression techniques while attaining significantly better performance.
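The abstract describes compressing the weight difference (delta) between a finetuned model and its pretrained base via quantization and sparsification. The following is a minimal sketch of that general idea, not GPT-Zip's actual algorithm: the magnitude pruning, uniform symmetric quantizer, and the default sparsity/bit-width values below are illustrative assumptions, and all function names are hypothetical.

```python
import torch

def compress_delta(base_w, finetuned_w, sparsity=0.9, num_bits=2):
    """Illustrative delta compression: sparsify then quantize the finetuned-minus-base
    difference. Stand-in for the (unspecified here) GPT-Zip pruner/quantizer."""
    delta = finetuned_w - base_w

    # Magnitude pruning: keep only the largest-magnitude entries of the delta.
    k = max(1, int(delta.numel() * (1.0 - sparsity)))
    threshold = delta.abs().flatten().kthvalue(delta.numel() - k + 1).values
    mask = delta.abs() >= threshold

    # Uniform symmetric quantization of the surviving delta values.
    levels = 2 ** (num_bits - 1) - 1            # e.g. 1 level each side for 2-bit signed
    scale = delta[mask].abs().max() / max(levels, 1)
    q = torch.clamp(torch.round(delta / scale), -levels, levels)

    # Store only the quantized, masked delta (int8 here for simplicity) plus the scale.
    return (q * mask).to(torch.int8), scale

def decompress(base_w, q, scale):
    # Reconstruct an approximation of the finetuned weights from base + compressed delta.
    return base_w + q.float() * scale

# Toy usage on a single weight matrix: the finetuned weights stay close to the base,
# so the delta is small, sparse-friendly, and cheap to quantize.
base = torch.randn(256, 256)
finetuned = base + 0.01 * torch.randn(256, 256)
q, scale = compress_delta(base, finetuned)
approx = decompress(base, q, scale)
print("mean reconstruction error:", (approx - finetuned).abs().mean().item())
```

Under this scheme, only one full-precision copy of the base model is stored; each finetuned variant adds just a sparse, low-bit delta, which is what avoids the linear growth in storage described above.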