DistiLLM: Towards streamlined distillation for large language models

J Ko, S Kim, T Chen, SY Yun - arXiv preprint arXiv:2402.03898, 2024 - arxiv.org
Knowledge distillation (KD) is widely used for compressing a teacher model to a smaller
student model, reducing its inference cost and memory footprint while preserving model …

BBox-Adapter: Lightweight Adapting for Black-Box Large Language Models

H Sun, Y Zhuang, W Wei, C Zhang, B Dai - arXiv preprint arXiv …, 2024 - arxiv.org
Adapting state-of-the-art Large Language Models (LLMs) like GPT-4 and Gemini for specific
tasks is challenging. Due to the opacity in their parameters, embeddings, and even output …

Efficient fine-tuning large language models for knowledge-aware response planning

M Nguyen, KC Kishan, T Nguyen, A Chadha… - … European Conference on …, 2023 - Springer
Large Language Models (LLMs) have shown impressive emergent language
capabilities, especially in applications with high ambiguity, such as language reasoning and …

CollectiveSFT: Scaling Large Language Models for Chinese Medical Benchmark with Collective Instructions in Healthcare

J Zhu, M Tan, M Yang, R Li… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid progress in Large Language Models (LLMs) has prompted the creation of
numerous benchmarks to evaluate their capabilities. This study focuses on the …