Large language models are not fair evaluators
In this paper, we uncover a systematic bias in the evaluation paradigm of adopting large
language models (LLMs), e.g., GPT-4, as a referee to score and compare the quality of …
Recent advances in natural language inference: A survey of benchmarks, resources, and approaches
In the NLP community, recent years have seen a surge of research activities that address
machines' ability to perform deep language understanding which goes beyond what is …
Calibrate before use: Improving few-shot performance of language models
GPT-3 can perform numerous tasks when provided a natural language prompt that contains
a few training examples. We show that this type of few-shot learning can be unstable: the …
Symbolic knowledge distillation: from general language models to commonsense models
The common practice for training commonsense models has gone from-human-to-corpus-to-machine:
humans author commonsense knowledge graphs in order to train commonsense …
Automatic story generation: Challenges and attempts
The scope of this survey paper is to explore the challenges in automatic story generation.
We hope to contribute in the following ways: 1. Explore how previous research in story …
CommonsenseQA: A question answering challenge targeting commonsense knowledge
When answering a question, people often draw upon their rich world knowledge in addition
to the particular context. Recent work has focused primarily on answering questions given …
From recognition to cognition: Visual commonsense reasoning
Visual understanding goes well beyond object recognition. With one glance at an image, we
can effortlessly imagine the world beyond the pixels: for instance, we can infer people's …
Don't take the easy way out: Ensemble based methods for avoiding known dataset biases
State-of-the-art models often make use of superficial patterns in the data that do not
generalize well to out-of-domain or adversarial settings. For example, textual entailment …
SWAG: A large-scale adversarial dataset for grounded commonsense inference
Given a partial description like "she opened the hood of the car," humans can reason about
the situation and anticipate what might come next ("then, she examined the engine"). In this …
Annotation artifacts in natural language inference data
Large-scale datasets for natural language inference are created by presenting crowd
workers with a sentence (premise), and asking them to generate three new sentences …