MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning
J Zhang, Y Zhao, D Chen, X Tian, H Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Low-rank adaptation (LoRA) and its mixture-of-experts (MOE) variants are highly effective
parameter-efficient fine-tuning (PEFT) methods. However, they introduce significant latency …