Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities
Model merging is an efficient empowerment technique in the machine learning community
that does not require the collection of raw training data and does not require expensive …
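The snippet describes merging as training-free. A minimal sketch of one common merging scheme (uniform or weighted parameter averaging, as in "model soups"); the survey covers many more methods, and all names below are illustrative rather than from the paper:

```python
import torch

def merge_state_dicts(state_dicts, weights=None):
    """Merge several same-architecture checkpoints by averaging their
    parameters; no training data or gradient computation is needed."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key].float()
                          for w, sd in zip(weights, state_dicts))
    return merged
```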
On the Loss of Context-Awareness in General Instruction Fine-Tuning
Pretrained Large Language Models (LLMs) require post-training methods such as
supervised fine-tuning (SFT) on instruction-response pairs to enable instruction following …
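A hedged sketch of the SFT setup the paper studies: cross entropy is computed on response tokens only, with instruction tokens masked out. This illustrates standard instruction tuning, not the paper's proposed remedy; the function and argument names are assumptions:

```python
import torch
import torch.nn.functional as F

def sft_loss(logits, input_ids, response_mask):
    """logits: (B, T, V); response_mask: (B, T), True on response tokens."""
    shift_logits = logits[:, :-1, :]        # token t predicts token t+1
    shift_labels = input_ids[:, 1:]
    shift_mask = response_mask[:, 1:].float()
    per_token = F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        reduction="none",
    ).view_as(shift_mask)
    # Average loss over response tokens only; instruction tokens are masked.
    return (per_token * shift_mask).sum() / shift_mask.sum()
```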
Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity
Large language models rely on Supervised Fine-Tuning (SFT) to specialize in downstream
tasks. Cross Entropy (CE) loss is the de facto choice in SFT, but it often leads to overfitting …
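A hedged sketch of the general idea the title points at: augmenting the CE objective with an entropy term so the model keeps a higher-entropy output distribution instead of collapsing onto single tokens. This is generic entropy regularization for illustration, not the paper's exact matching objective; `beta` is an assumed hyperparameter:

```python
import torch
import torch.nn.functional as F

def entropy_regularized_ce(logits, labels, beta=0.1):
    """logits: (N, V); labels: (N,). Rewards predictive entropy to
    counteract the overconfidence that plain CE tends to induce."""
    ce = F.cross_entropy(logits, labels)
    log_probs = F.log_softmax(logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()
    return ce - beta * entropy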
FlipGuard: Defending Preference Alignment against Update Regression with Constrained Optimization
Recent breakthroughs in preference alignment have significantly improved Large Language
Models' ability to generate texts that align with human preferences and values. However …
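A hedged sketch of constraining alignment updates against regression: on examples the reference model already handles correctly, penalize any drop in the policy's log-likelihood relative to the reference. The penalty form and all names here are assumptions for illustration; FlipGuard's actual constrained formulation differs:

```python
import torch

def constrained_alignment_loss(align_loss, policy_logp, ref_logp,
                               was_correct, lam=1.0):
    """align_loss: preference-alignment loss (e.g. DPO) for the batch.
    policy_logp / ref_logp: (B,) log-likelihoods of the gold answer.
    was_correct: (B,) bool mask of examples the reference got right."""
    # Only penalize likelihood drops, and only on previously-correct examples.
    regression = torch.clamp(ref_logp - policy_logp, min=0.0)
    penalty = (regression * was_correct.float()).mean()
    return align_loss + lam * penalty
```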
VickreyFeedback: Cost-efficient Data Construction for Reinforcement Learning from Human Feedback
This paper addresses the cost-efficiency aspect of Reinforcement Learning from Human
Feedback (RLHF). RLHF leverages datasets of human preferences over outputs of LLMs to …
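For context, a hedged sketch of how RLHF typically consumes such preference datasets: a reward model is fit with the Bradley-Terry loss on (chosen, rejected) output pairs. This shows the data the paper aims to construct more cheaply, not its auction-based collection scheme:

```python
import torch
import torch.nn.functional as F

def bradley_terry_loss(reward_chosen, reward_rejected):
    """Both args: (B,) scalar rewards for the preferred and
    dispreferred outputs of the same prompt."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()
```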