RWKV: Reinventing RNNs for the Transformer era

B Peng, E Alcaide, Q Anthony, A Albalak… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers have revolutionized almost all natural language processing (NLP) tasks but
suffer from memory and computational complexity that scales quadratically with sequence …

GLM-130B: An open bilingual pre-trained model

A Zeng, X Liu, Z Du, Z Wang, H Lai, M Ding… - arXiv preprint arXiv …, 2022 - arxiv.org
We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model
with 130 billion parameters. It is an attempt to open-source a 100B-scale model at least as …

Unified-IO: A unified model for vision, language, and multi-modal tasks

J Lu, C Clark, R Zellers, R Mottaghi… - The Eleventh …, 2022 - openreview.net
We propose Unified-IO, a model that performs a large variety of AI tasks spanning classical
computer vision tasks, including pose estimation, object detection, depth estimation and …

PaLM: Scaling language modeling with Pathways

A Chowdhery, S Narang, J Devlin, M Bosma… - Journal of Machine …, 2023 - jmlr.org
Large language models have been shown to achieve remarkable performance across a
variety of natural language tasks using few-shot learning, which drastically reduces the …

Large language models can self-improve

J Huang, SS Gu, L Hou, Y Wu, X Wang, H Yu… - arXiv preprint arXiv …, 2022 - arxiv.org
Large Language Models (LLMs) have achieved excellent performance on various tasks.
However, fine-tuning an LLM requires extensive supervision. Humans, on the other hand …

Visual prompt tuning

M Jia, L Tang, BC Chen, C Cardie, S Belongie… - … on Computer Vision, 2022 - Springer
The current modus operandi in adapting pre-trained models involves updating all the
backbone parameters, i.e., full fine-tuning. This paper introduces Visual Prompt Tuning (VPT) …

Scaling data-constrained language models

N Muennighoff, A Rush, B Barak… - Advances in …, 2024 - proceedings.neurips.cc
The current trend of scaling language models involves increasing both parameter count and
training dataset size. Extrapolating this trend suggests that training dataset size may soon be …

The debate over understanding in AI's large language models

M Mitchell, DC Krakauer - Proceedings of the National …, 2023 - National Acad Sciences
We survey a current, heated debate in the artificial intelligence (AI) research community on
whether large pretrained language models can be said to understand language—and the …

Rethinking the role of demonstrations: What makes in-context learning work?

S Min, X Lyu, A Holtzman, M Artetxe, M Lewis… - arXiv preprint arXiv …, 2022 - arxiv.org
Large language models (LMs) are able to in-context learn: they perform a new task via inference
alone by conditioning on a few input-label pairs (demonstrations) and making predictions for …

Larger language models do in-context learning differently

J Wei, J Wei, Y Tay, D Tran, A Webson, Y Lu… - arXiv preprint arXiv …, 2023 - arxiv.org
We study how in-context learning (ICL) in language models is affected by semantic priors
versus input-label mappings. We investigate two setups: ICL with flipped labels and ICL with …