MTEB: Massive text embedding benchmark
Text embeddings are commonly evaluated on a small set of datasets from a single task not
covering their possible applications to other tasks. It is unclear whether state-of-the-art …
AGIEval: A human-centric benchmark for evaluating foundation models
Evaluating the general abilities of foundation models to tackle human-level tasks is a vital
aspect of their development and application in the pursuit of Artificial General Intelligence …
FLAVA: A foundational language and vision alignment model
State-of-the-art vision and vision-and-language models rely on large-scale visio-linguistic
pretraining for obtaining good performance on a variety of downstream tasks. Generally …
Vision-language pre-training: Basics, recent advances, and future trends
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …
DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing
This paper presents a new pre-trained language model, DeBERTaV3, which improves the
original DeBERTa model by replacing masked language modeling (MLM) with replaced token …
Do the rewards justify the means? Measuring trade-offs between rewards and ethical behavior in the MACHIAVELLI benchmark
Artificial agents have traditionally been trained to maximize reward, which may incentivize
power-seeking and deception, analogous to how next-token prediction in language models …
DecodingTrust: A comprehensive assessment of trustworthiness in GPT models
Generative Pre-trained Transformer (GPT) models have exhibited exciting progress
in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the …
Fine-tuning language models with just forward passes
Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but
as LMs grow in size, backpropagation requires a prohibitively large amount of memory …
P-tuning v2: Prompt tuning can be comparable to fine-tuning universally across scales and tasks
Prompt tuning, which only tunes continuous prompts with a frozen language model,
substantially reduces per-task storage and memory usage at training. However, in the …
A comprehensive overview of large language models
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …