MTEB: Massive text embedding benchmark

N Muennighoff, N Tazi, L Magne, N Reimers - arXiv preprint arXiv …, 2022 - arxiv.org
Text embeddings are commonly evaluated on a small set of datasets from a single task not
covering their possible applications to other tasks. It is unclear whether state-of-the-art …
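The snippet above motivates benchmarking embeddings across many tasks rather than a single one. Below is a minimal sketch of how such an evaluation is commonly driven with the accompanying open-source mteb Python package, assuming its MTEB(tasks=...) interface and a SentenceTransformer-style encoder; the model name and task subset are illustrative, not a prescribed setup.

# Minimal sketch: run an embedding model on a small subset of MTEB tasks.
# Assumes the `mteb` package's MTEB(tasks=...) interface and any model that
# exposes an .encode() method (e.g. a SentenceTransformer).
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

# Evaluate on two illustrative tasks rather than the full benchmark.
evaluation = MTEB(tasks=["Banking77Classification", "STSBenchmark"])
results = evaluation.run(model, output_folder="results/all-MiniLM-L6-v2")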

AGIEval: A human-centric benchmark for evaluating foundation models

W Zhong, R Cui, Y Guo, Y Liang, S Lu, Y Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Evaluating the general abilities of foundation models to tackle human-level tasks is a vital
aspect of their development and application in the pursuit of Artificial General Intelligence …

FLAVA: A foundational language and vision alignment model

A Singh, R Hu, V Goswami… - Proceedings of the …, 2022 - openaccess.thecvf.com
State-of-the-art vision and vision-and-language models rely on large-scale visio-linguistic
pretraining for obtaining good performance on a variety of downstream tasks. Generally …

Vision-language pre-training: Basics, recent advances, and future trends

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022 - nowpublishers.com
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …

DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing

P He, J Gao, W Chen - arXiv preprint arXiv:2111.09543, 2021 - arxiv.org
This paper presents a new pre-trained language model, DeBERTaV3, which improves the
original DeBERTa model by replacing masked language modeling (MLM) with replaced token …
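The snippet names the core change: masked language modeling is replaced by ELECTRA-style replaced token detection (RTD). Below is a minimal sketch of the RTD discriminator loss only, with assumed tensor names and shapes; it omits the generator and the gradient-disentangled embedding sharing that the paper actually contributes.

# Minimal sketch of an ELECTRA-style replaced-token-detection (RTD) loss.
# Illustrative only: the generator that produces corrupted_ids and the
# gradient-disentangled embedding sharing from the paper are not shown.
import torch
import torch.nn.functional as F

def rtd_loss(disc_logits, original_ids, corrupted_ids, attention_mask):
    # disc_logits: (batch, seq_len) per-token "was this token replaced?" scores
    # original_ids / corrupted_ids: (batch, seq_len) ids before/after corruption
    labels = (corrupted_ids != original_ids).float()       # 1 = replaced token
    per_token = F.binary_cross_entropy_with_logits(
        disc_logits, labels, reduction="none")
    mask = attention_mask.float()
    return (per_token * mask).sum() / mask.sum()            # mean over real tokens

# Shape check with random tensors.
B, L = 2, 8
orig = torch.randint(0, 1000, (B, L))
corr = orig.clone()
corr[:, 3] = 999                                            # pretend position 3 was replaced
loss = rtd_loss(torch.randn(B, L), orig, corr, torch.ones(B, L))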

Do the rewards justify the means? Measuring trade-offs between rewards and ethical behavior in the MACHIAVELLI benchmark

A Pan, JS Chan, A Zou, N Li, S Basart… - International …, 2023 - proceedings.mlr.press
Artificial agents have traditionally been trained to maximize reward, which may incentivize
power-seeking and deception, analogous to how next-token prediction in language models …

DecodingTrust: A comprehensive assessment of trustworthiness in GPT models

B Wang, W Chen, H Pei, C Xie, M Kang, C Zhang, C Xu… - NeurIPS, 2023 - blogs.qub.ac.uk
Generative Pre-trained Transformer (GPT) models have exhibited exciting progress
in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the …

Fine-tuning language models with just forward passes

S Malladi, T Gao, E Nichani… - Advances in …, 2023 - proceedings.neurips.cc
Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but
as LMs grow in size, backpropagation requires a prohibitively large amount of memory …
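The snippet motivates avoiding backpropagation entirely; the paper's MeZO method instead estimates a gradient from two forward passes. Below is a minimal sketch of that zeroth-order (SPSA-style) update, assuming a generic PyTorch model and a loss_fn(model, batch) callable; hyperparameters and names are illustrative, not the paper's implementation.

# Minimal sketch of a zeroth-order (SPSA-style) step using only forward passes,
# in the spirit of MeZO. The random direction z is regenerated from a seed
# rather than stored, which keeps memory close to inference level.
import torch

@torch.no_grad()
def zo_step(model, loss_fn, batch, eps=1e-3, lr=1e-6, seed=0):
    params = [p for p in model.parameters() if p.requires_grad]

    def perturb(scale):
        # Same seed => same z on every call, so perturbations can be undone.
        gen = torch.Generator(device=params[0].device).manual_seed(seed)
        for p in params:
            z = torch.randn(p.shape, generator=gen, device=p.device, dtype=p.dtype)
            p.add_(scale * eps * z)

    perturb(+1.0); loss_plus = loss_fn(model, batch)    # loss at theta + eps*z
    perturb(-2.0); loss_minus = loss_fn(model, batch)   # loss at theta - eps*z
    perturb(+1.0)                                       # restore theta

    grad_scale = (loss_plus - loss_minus) / (2.0 * eps) # projected gradient estimate
    gen = torch.Generator(device=params[0].device).manual_seed(seed)
    for p in params:
        z = torch.randn(p.shape, generator=gen, device=p.device, dtype=p.dtype)
        p.add_(-lr * grad_scale * z)                    # SGD step along z
    return loss_plus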

P-tuning v2: Prompt tuning can be comparable to fine-tuning universally across scales and tasks

X Liu, K Ji, Y Fu, WL Tam, Z Du, Z Yang… - arXiv preprint arXiv …, 2021 - arxiv.org
Prompt tuning, which only tunes continuous prompts with a frozen language model,
substantially reduces per-task storage and memory usage at training. However, in the …
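The snippet describes tuning only continuous prompts while the language model stays frozen. Below is a minimal sketch of that idea with a single learned prompt matrix prepended to the token embeddings; note that P-tuning v2 itself injects prompts at every layer, and the module names, sizes, and backbone here are simplifications.

# Minimal sketch of continuous ("soft") prompt tuning with a frozen backbone:
# only a small prompt matrix is trained. P-tuning v2 additionally inserts
# prompts at every layer; this single-layer version is a simplification.
import torch
import torch.nn as nn

class SoftPromptModel(nn.Module):
    def __init__(self, backbone: nn.Module, embed: nn.Embedding, prompt_len: int = 20):
        super().__init__()
        self.backbone, self.embed = backbone, embed
        self.prompt = nn.Parameter(torch.randn(prompt_len, embed.embedding_dim) * 0.02)
        for p in self.backbone.parameters():   # freeze everything but the prompt
            p.requires_grad_(False)
        for p in self.embed.parameters():
            p.requires_grad_(False)

    def forward(self, input_ids):              # input_ids: (batch, seq_len)
        tok = self.embed(input_ids)            # (batch, seq_len, d_model)
        prompt = self.prompt.unsqueeze(0).expand(tok.size(0), -1, -1)
        return self.backbone(torch.cat([prompt, tok], dim=1))

# Illustrative usage with a generic transformer encoder as the frozen backbone.
embed = nn.Embedding(30000, 256)
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True), num_layers=2)
model = SoftPromptModel(backbone, embed, prompt_len=20)
out = model(torch.randint(0, 30000, (2, 16)))            # (2, 20 + 16, 256)
optimizer = torch.optim.AdamW([model.prompt], lr=1e-3)   # only the prompt is updated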

A comprehensive overview of large language models

H Naveed, AU Khan, S Qiu, M Saqib, S Anwar… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …