Generating training data with language models: Towards zero-shot language understanding

Y Meng, J Huang, Y Zhang… - Advances in Neural …, 2022 - proceedings.neurips.cc
Pretrained language models (PLMs) have demonstrated remarkable performance in various
natural language processing tasks: Unidirectional PLMs (e.g., GPT) are well known for their …

Knowledgeable prompt-tuning: Incorporating knowledge into prompt verbalizer for text classification

S Hu, N Ding, H Wang, Z Liu, J Wang, J Li… - arXiv preprint arXiv …, 2021 - arxiv.org
Tuning pre-trained language models (PLMs) with task-specific prompts has been a
promising approach for text classification. Particularly, previous studies suggest that prompt …

Measuring coding challenge competence with APPS

D Hendrycks, S Basart, S Kadavath, M Mazeika… - arXiv preprint arXiv …, 2021 - arxiv.org
While programming is one of the most broadly applicable skills in modern society, modern
machine learning models still cannot code solutions to basic problems. Despite its …

TimeLMs: Diachronic language models from Twitter

D Loureiro, F Barbieri, L Neves, LE Anke… - arXiv preprint arXiv …, 2022 - arxiv.org
Despite its importance, the time variable has been largely neglected in the NLP and
language model literature. In this paper, we present TimeLMs, a set of language models …

CodeGen2: Lessons for training LLMs on programming and natural languages

E Nijkamp, H Hayashi, C Xiong, S Savarese… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have demonstrated remarkable abilities in representation
learning for program synthesis and understanding tasks. The quality of the learned …

D4: Improving LLM pretraining via document de-duplication and diversification

K Tirumala, D Simig, A Aghajanyan… - Advances in Neural …, 2024 - proceedings.neurips.cc
Over recent years, an increasing amount of compute and data has been poured into training
large language models (LLMs), usually by doing one-pass learning on as many tokens as …

TweetEval: Unified benchmark and comparative evaluation for tweet classification

F Barbieri, J Camacho-Collados, L Neves… - arXiv preprint arXiv …, 2020 - arxiv.org
The experimental landscape in natural language processing for social media is too
fragmented. Each year, new shared tasks and datasets are proposed, ranging from classics …

Documenting large webtext corpora: A case study on the Colossal Clean Crawled Corpus

J Dodge, M Sap, A Marasović, W Agnew… - arXiv preprint arXiv …, 2021 - arxiv.org
Large language models have led to remarkable progress on many NLP tasks, and
researchers are turning to ever-larger text corpora to train them. Some of the largest corpora …

AMMUS: A survey of transformer-based pretrained models in natural language processing

KS Kalyan, A Rajasekharan, S Sangeetha - arXiv preprint arXiv …, 2021 - arxiv.org
Transformer-based pretrained language models (T-PTLMs) have achieved great success in
almost every NLP task. The evolution of these models started with GPT and BERT. These …

OctoPack: Instruction tuning code large language models

N Muennighoff, Q Liu, A Zebaze, Q Zheng… - arXiv preprint arXiv …, 2023 - arxiv.org
Finetuning large language models (LLMs) on instructions leads to vast performance
improvements on natural language tasks. We apply instruction tuning using code …