MTEB: Massive text embedding benchmark

N Muennighoff, N Tazi, L Magne, N Reimers - arXiv preprint arXiv …, 2022 - arxiv.org
Text embeddings are commonly evaluated on a small set of datasets from a single task not
covering their possible applications to other tasks. It is unclear whether state-of-the-art …
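The snippet above motivates benchmarking embeddings across many tasks rather than a single one. Below is a minimal sketch of how such an evaluation is commonly driven with the accompanying open-source mteb Python package, assuming its MTEB(tasks=...) interface and a SentenceTransformer-style encoder; the model name and task subset are illustrative, not a prescribed setup.

# Minimal sketch: run an embedding model on a small subset of MTEB tasks.
# Assumes the `mteb` package's MTEB(tasks=...) interface and any model that
# exposes an .encode() method (e.g. a SentenceTransformer).
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

# Evaluate on two illustrative tasks rather than the full benchmark.
evaluation = MTEB(tasks=["Banking77Classification", "STSBenchmark"])
results = evaluation.run(model, output_folder="results/all-MiniLM-L6-v2")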

AGIEval: A human-centric benchmark for evaluating foundation models

W Zhong, R Cui, Y Guo, Y Liang, S Lu, Y Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Evaluating the general abilities of foundation models to tackle human-level tasks is a vital
aspect of their development and application in the pursuit of Artificial General Intelligence …

FLAVA: A foundational language and vision alignment model

A Singh, R Hu, V Goswami… - Proceedings of the …, 2022 - openaccess.thecvf.com
State-of-the-art vision and vision-and-language models rely on large-scale visio-linguistic
pretraining for obtaining good performance on a variety of downstream tasks. Generally …

Vision-language pre-training: Basics, recent advances, and future trends

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022 - nowpublishers.com
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …

DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing

P He, J Gao, W Chen - arXiv preprint arXiv:2111.09543, 2021 - arxiv.org
This paper presents a new pre-trained language model, DeBERTaV3, which improves the
original DeBERTa model by replacing masked language modeling (MLM) with replaced token …
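The snippet names the core change: masked language modeling is replaced by ELECTRA-style replaced token detection (RTD). Below is a minimal sketch of the RTD discriminator loss only, with assumed tensor names and shapes; it omits the generator and the gradient-disentangled embedding sharing that the paper actually contributes.

# Minimal sketch of an ELECTRA-style replaced-token-detection (RTD) loss.
# Illustrative only: the generator that produces corrupted_ids and the
# gradient-disentangled embedding sharing from the paper are not shown.
import torch
import torch.nn.functional as F

def rtd_loss(disc_logits, original_ids, corrupted_ids, attention_mask):
    # disc_logits: (batch, seq_len) per-token "was this token replaced?" scores
    # original_ids / corrupted_ids: (batch, seq_len) ids before/after corruption
    labels = (corrupted_ids != original_ids).float()       # 1 = replaced token
    per_token = F.binary_cross_entropy_with_logits(
        disc_logits, labels, reduction="none")
    mask = attention_mask.float()
    return (per_token * mask).sum() / mask.sum()            # mean over real tokens

# Shape check with random tensors.
B, L = 2, 8
orig = torch.randint(0, 1000, (B, L))
corr = orig.clone()
corr[:, 3] = 999                                            # pretend position 3 was replaced
loss = rtd_loss(torch.randn(B, L), orig, corr, torch.ones(B, L))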

Do the rewards justify the means? Measuring trade-offs between rewards and ethical behavior in the MACHIAVELLI benchmark

A Pan, JS Chan, A Zou, N Li, S Basart… - International …, 2023 - proceedings.mlr.press
Artificial agents have traditionally been trained to maximize reward, which may incentivize
power-seeking and deception, analogous to how next-token prediction in language models …

DecodingTrust: A comprehensive assessment of trustworthiness in GPT models

B Wang, W Chen, H Pei, C Xie, M Kang, C Zhang, C Xu… - NeurIPS, 2023 - blogs.qub.ac.uk
Generative Pre-trained Transformer (GPT) models have exhibited exciting progress
in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the …

Fine-tuning language models with just forward passes

S Malladi, T Gao, E Nichani… - Advances in …, 2023 - proceedings.neurips.cc
Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but
as LMs grow in size, backpropagation requires a prohibitively large amount of memory …
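The snippet motivates avoiding backpropagation entirely; the paper's MeZO method instead estimates a gradient from two forward passes. Below is a minimal sketch of that zeroth-order (SPSA-style) update, assuming a generic PyTorch model and a loss_fn(model, batch) callable; hyperparameters and names are illustrative, not the paper's implementation.

# Minimal sketch of a zeroth-order (SPSA-style) step using only forward passes,
# in the spirit of MeZO. The random direction z is regenerated from a seed
# rather than stored, which keeps memory close to inference level.
import torch

@torch.no_grad()
def zo_step(model, loss_fn, batch, eps=1e-3, lr=1e-6, seed=0):
    params = [p for p in model.parameters() if p.requires_grad]

    def perturb(scale):
        # Same seed => same z on every call, so perturbations can be undone.
        gen = torch.Generator(device=params[0].device).manual_seed(seed)
        for p in params:
            z = torch.randn(p.shape, generator=gen, device=p.device, dtype=p.dtype)
            p.add_(scale * eps * z)

    perturb(+1.0); loss_plus = loss_fn(model, batch)    # loss at theta + eps*z
    perturb(-2.0); loss_minus = loss_fn(model, batch)   # loss at theta - eps*z
    perturb(+1.0)                                       # restore theta

    grad_scale = (loss_plus - loss_minus) / (2.0 * eps) # projected gradient estimate
    gen = torch.Generator(device=params[0].device).manual_seed(seed)
    for p in params:
        z = torch.randn(p.shape, generator=gen, device=p.device, dtype=p.dtype)
        p.add_(-lr * grad_scale * z)                    # SGD step along z
    return loss_plus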

P-tuning v2: Prompt tuning can be comparable to fine-tuning universally across scales and tasks

X Liu, K Ji, Y Fu, WL Tam, Z Du, Z Yang… - arXiv preprint arXiv …, 2021 - arxiv.org
Prompt tuning, which only tunes continuous prompts with a frozen language model,
substantially reduces per-task storage and memory usage at training. However, in the …
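The snippet describes tuning only continuous prompts while the language model stays frozen. Below is a minimal sketch of that idea with a single learned prompt matrix prepended to the token embeddings; note that P-tuning v2 itself injects prompts at every layer, and the module names, sizes, and backbone here are simplifications.

# Minimal sketch of continuous ("soft") prompt tuning with a frozen backbone:
# only a small prompt matrix is trained. P-tuning v2 additionally inserts
# prompts at every layer; this single-layer version is a simplification.
import torch
import torch.nn as nn

class SoftPromptModel(nn.Module):
    def __init__(self, backbone: nn.Module, embed: nn.Embedding, prompt_len: int = 20):
        super().__init__()
        self.backbone, self.embed = backbone, embed
        self.prompt = nn.Parameter(torch.randn(prompt_len, embed.embedding_dim) * 0.02)
        for p in self.backbone.parameters():   # freeze everything but the prompt
            p.requires_grad_(False)
        for p in self.embed.parameters():
            p.requires_grad_(False)

    def forward(self, input_ids):              # input_ids: (batch, seq_len)
        tok = self.embed(input_ids)            # (batch, seq_len, d_model)
        prompt = self.prompt.unsqueeze(0).expand(tok.size(0), -1, -1)
        return self.backbone(torch.cat([prompt, tok], dim=1))

# Illustrative usage with a generic transformer encoder as the frozen backbone.
embed = nn.Embedding(30000, 256)
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True), num_layers=2)
model = SoftPromptModel(backbone, embed, prompt_len=20)
out = model(torch.randint(0, 30000, (2, 16)))            # (2, 20 + 16, 256)
optimizer = torch.optim.AdamW([model.prompt], lr=1e-3)   # only the prompt is updated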

A comprehensive overview of large language models

H Naveed, AU Khan, S Qiu, M Saqib, S Anwar… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …