The effect of different writing tasks on linguistic style: A case study of the ROC story cloze task

P Wang, L Li, L Chen, Z Cai, D Zhu, B Lin… - arXiv preprint arXiv …, 2023 - arxiv.org

In this paper, we uncover a systematic bias in the evaluation paradigm of adopting large
language models (LLMs), e.g., GPT-4, as a referee to score and compare the quality of …

Cited by 249 · Related articles · All 2 versions

[PDF] arxiv.org

Recent advances in natural language inference: A survey of benchmarks, resources, and approaches

S Storks, Q Gao, JY Chai - arXiv preprint arXiv:1904.01172, 2019 - arxiv.org

In the NLP community, recent years have seen a surge of research activities that address
machines' ability to perform deep language understanding which goes beyond what is …

Cited by 104 · Related articles · All 4 versions

[PDF] mlr.press

Calibrate before use: Improving few-shot performance of language models

Z Zhao, E Wallace, S Feng, D Klein… - … on machine learning, 2021 - proceedings.mlr.press

GPT-3 can perform numerous tasks when provided a natural language prompt that contains
a few training examples. We show that this type of few-shot learning can be unstable: the …

Cited by 1057 · Related articles · All 5 versions

[PDF] arxiv.org

Symbolic knowledge distillation: from general language models to commonsense models

P West, C Bhagavatula, J Hessel, JD Hwang… - arXiv preprint arXiv …, 2021 - arxiv.org

The common practice for training commonsense models has gone from-human-to-corpus-to-machine:
humans author commonsense knowledge graphs in order to train commonsense …

Cited by 261 · Related articles · All 4 versions

[PDF] arxiv.org

Automatic story generation: Challenges and attempts

A Alabdulkarim, S Li, X Peng - arXiv preprint arXiv:2102.12634, 2021 - arxiv.org

The scope of this survey paper is to explore the challenges in automatic story generation.
We hope to contribute in the following ways: 1. Explore how previous research in story …

Cited by 51 · Related articles · All 4 versions

[PDF] arxiv.org

CommonsenseQA: A question answering challenge targeting commonsense knowledge

A Talmor, J Herzig, N Lourie, J Berant - arXiv preprint arXiv:1811.00937, 2018 - arxiv.org

When answering a question, people often draw upon their rich world knowledge in addition
to the particular context. Recent work has focused primarily on answering questions given …

Cited by 1255 · Related articles · All 6 versions

[PDF] thecvf.com

From recognition to cognition: Visual commonsense reasoning

R Zellers, Y Bisk, A Farhadi… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com

Visual understanding goes well beyond object recognition. With one glance at an image, we
can effortlessly imagine the world beyond the pixels: for instance, we can infer people's …

Cited by 909 · Related articles · All 7 versions

[PDF] arxiv.org

Don't take the easy way out: Ensemble based methods for avoiding known dataset biases

C Clark, M Yatskar, L Zettlemoyer - arXiv preprint arXiv:1909.03683, 2019 - arxiv.org

State-of-the-art models often make use of superficial patterns in the data that do not
generalize well to out-of-domain or adversarial settings. For example, textual entailment …

Cited by 476 · Related articles · All 3 versions

[PDF] arxiv.org

SWAG: A large-scale adversarial dataset for grounded commonsense inference

R Zellers, Y Bisk, R Schwartz, Y Choi - arXiv preprint arXiv:1808.05326, 2018 - arxiv.org

Given a partial description like "she opened the hood of the car," humans can reason about
the situation and anticipate what might come next ("then, she examined the engine"). In this …

Cited by 800 · Related articles · All 4 versions

[PDF] arxiv.org

Annotation artifacts in natural language inference data

S Gururangan, S Swayamdipta, O Levy… - arXiv preprint arXiv …, 2018 - arxiv.org

Large-scale datasets for natural language inference are created by presenting crowd
workers with a sentence (premise), and asking them to generate three new sentences …

Cited by 1212 · Related articles · All 8 versions