A survey of large language models
Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …
Some things are more cringe than others: Preference optimization with the pairwise cringe loss
Practitioners commonly align large language models using pairwise preferences, i.e., given
labels of the type "response A is preferred to response B" for a given input. Perhaps less …
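Such pairwise labels are typically converted into a training signal via the Bradley-Terry model, which maps a difference in scalar reward scores to a preference probability. A minimal sketch (function and variable names are illustrative, not from the paper):

```python
import math

def preference_prob(reward_a, reward_b):
    """Bradley-Terry probability that response A is preferred to response B,
    given scalar reward scores for each response (logistic of the margin)."""
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))

# Equal scores give a 50/50 preference; a higher score for A pushes it up.
p = preference_prob(2.0, 0.5)
```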
Gibbs sampling from human feedback: A provable KL-constrained framework for RLHF
This paper studies the theoretical framework of the alignment process of generative models
with Reinforcement Learning from Human Feedback (RLHF). We consider a standard …
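The KL-constrained objective such analyses typically start from can be written as follows (a standard formulation, with β the regularization strength and π_ref the frozen reference policy; the precise objective studied in the paper may differ):

```latex
\max_{\pi}\; \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi(\cdot \mid x)}\!\big[r(x, y)\big]
\;-\; \beta\, \mathbb{D}_{\mathrm{KL}}\!\big[\pi(\cdot \mid x) \,\big\|\, \pi_{\mathrm{ref}}(\cdot \mid x)\big]
```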
Towards analyzing and understanding the limitations of DPO: A theoretical perspective
Direct Preference Optimization (DPO), which derives reward signals directly from pairwise
preference data, has shown its effectiveness in aligning Large Language Models (LLMs) …
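DPO's per-example loss can be sketched directly from this description: it scores the chosen and rejected responses under the policy relative to a frozen reference model, then applies a logistic loss to the scaled margin. A minimal sketch (names and the β value are illustrative):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-example DPO loss from log-likelihoods of the chosen and rejected
    responses under the policy and a frozen reference model."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)), written with log1p/exp for numerical stability
    return math.log1p(math.exp(-margin))

# A policy that prefers the chosen response more strongly than the
# reference does gets a positive margin and hence a smaller loss.
loss = dpo_loss(-2.0, -5.0, -3.0, -4.0)
```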
Heterogeneous Contrastive Learning for Foundation Models and Beyond
In the era of big data and Artificial Intelligence, an emerging paradigm is to utilize contrastive
self-supervised learning to model large-scale heterogeneous data. Many existing foundation …
Are U a Joke Master? Pun Generation via Multi-Stage Curriculum Learning towards a Humor LLM
Y Chen, C Yang, T Hu, X Chen, M Lan… - Findings of the …, 2024 - aclanthology.org
Although large language models (LLMs) acquire extensive world knowledge and some
reasoning abilities, their proficiency in generating humorous sentences remains a …
REAL: Response Embedding-based Alignment for LLMs
Aligning large language models (LLMs) to human preferences is a crucial step in building
helpful and safe AI tools, and usually involves training on supervised datasets. Popular …
Aligning Large Language Models with Counterfactual DPO
B Butcher - arXiv preprint arXiv:2401.09566, 2024 - arxiv.org
Advancements in large language models (LLMs) have demonstrated remarkable
capabilities across a diverse range of applications. These models excel in generating text …