QA dataset explosion: A taxonomy of NLP resources for question answering and reading comprehension

A Rogers, M Gardner, I Augenstein - ACM Computing Surveys, 2023 - dl.acm.org
Alongside huge volumes of research on deep learning models in NLP in recent years,
there has been much work on benchmark datasets needed to track modeling progress …

The debate over understanding in AI's large language models

M Mitchell, DC Krakauer - Proceedings of the National …, 2023 - National Acad Sciences
We survey a current, heated debate in the artificial intelligence (AI) research community on
whether large pretrained language models can be said to understand language—and the …

Impact of pretraining term frequencies on few-shot reasoning

Y Razeghi, RL Logan IV, M Gardner… - arXiv preprint arXiv …, 2022 - arxiv.org
Pretrained Language Models (LMs) have demonstrated the ability to perform numerical
reasoning by extrapolating from a few examples in few-shot settings. However, the extent to …

Revisiting out-of-distribution robustness in NLP: Benchmarks, analysis, and LLMs evaluations

L Yuan, Y Chen, G Cui, H Gao, F Zou… - Advances in …, 2023 - proceedings.neurips.cc
This paper reexamines the research on out-of-distribution (OOD) robustness in the field of
NLP. We find that the distribution shift settings in previous studies commonly lack adequate …

WANLI: Worker and AI collaboration for natural language inference dataset creation

A Liu, S Swayamdipta, NA Smith, Y Choi - arXiv preprint arXiv:2201.05955, 2022 - arxiv.org
A recurring challenge of crowdsourcing NLP datasets at scale is that human writers often
rely on repetitive patterns when crafting examples, leading to a lack of linguistic diversity. We …

Generating data to mitigate spurious correlations in natural language inference datasets

Y Wu, M Gardner, P Stenetorp, P Dasigi - arXiv preprint arXiv:2203.12942, 2022 - arxiv.org
Natural language processing models often exploit spurious correlations between task-
independent features and labels in datasets to perform well only within the distributions they …

Towards trustworthy LLMs: a review on debiasing and dehallucinating in large language models

Z Lin, S Guan, W Zhang, H Zhang, Y Li… - Artificial Intelligence …, 2024 - Springer
Recently, large language models (LLMs) have attracted considerable attention due to their
remarkable capabilities. However, LLMs' generation of biased or hallucinatory content …

Tailor: Generating and perturbing text with semantic controls

A Ross, T Wu, H Peng, ME Peters… - arXiv preprint arXiv …, 2021 - arxiv.org
Controlled text perturbation is useful for evaluating and improving model generalizability.
However, current techniques rely on training a model for every target perturbation, which is …

Changing the world by changing the data

A Rogers - arXiv preprint arXiv:2105.13947, 2021 - arxiv.org
The NLP community is currently investing far more research and resources into the development of
deep learning models than into training data. While we have made a lot of progress, it is now …

The worst of both worlds: A comparative analysis of errors in learning from data in psychology and machine learning

J Hullman, S Kapoor, P Nanayakkara… - Proceedings of the …, 2022 - dl.acm.org
Arguments that machine learning (ML) is facing a reproducibility and replication crisis
suggest that some published claims in research cannot be taken at face value. Concerns …