QA dataset explosion: A taxonomy of NLP resources for question answering and reading comprehension
Alongside huge volumes of research on deep learning models in NLP in recent years,
there has been much work on benchmark datasets needed to track modeling progress …
The debate over understanding in AI's large language models
M Mitchell, DC Krakauer - Proceedings of the National …, 2023 - National Acad Sciences
We survey a current, heated debate in the artificial intelligence (AI) research community on
whether large pretrained language models can be said to understand language—and the …
Impact of pretraining term frequencies on few-shot reasoning
Pretrained Language Models (LMs) have demonstrated the ability to perform numerical
reasoning by extrapolating from a few examples in few-shot settings. However, the extent to …
Revisiting out-of-distribution robustness in NLP: Benchmarks, analysis, and LLMs evaluations
This paper reexamines the research on out-of-distribution (OOD) robustness in the field of
NLP. We find that the distribution shift settings in previous studies commonly lack adequate …
WANLI: Worker and AI collaboration for natural language inference dataset creation
A recurring challenge of crowdsourcing NLP datasets at scale is that human writers often
rely on repetitive patterns when crafting examples, leading to a lack of linguistic diversity. We …
Generating data to mitigate spurious correlations in natural language inference datasets
Natural language processing models often exploit spurious correlations between task-
independent features and labels in datasets to perform well only within the distributions they …
Towards trustworthy LLMs: a review on debiasing and dehallucinating in large language models
Z Lin, S Guan, W Zhang, H Zhang, Y Li… - Artificial Intelligence …, 2024 - Springer
Recently, large language models (LLMs) have attracted considerable attention due to their
remarkable capabilities. However, LLMs' generation of biased or hallucinatory content …
Tailor: Generating and perturbing text with semantic controls
Controlled text perturbation is useful for evaluating and improving model generalizability.
However, current techniques rely on training a model for every target perturbation, which is …
Changing the world by changing the data
A Rogers - arXiv preprint arXiv:2105.13947, 2021 - arxiv.org
The NLP community is currently investing far more research and resources into the development of
deep learning models than training data. While we have made a lot of progress, it is now …
The worst of both worlds: A comparative analysis of errors in learning from data in psychology and machine learning
Arguments that machine learning (ML) is facing a reproducibility and replication crisis
suggest that some published claims in research cannot be taken at face value. Concerns …