ConjNLI: Natural language inference over conjunctive sentences

Natural language reasoning, a survey

F Yu, H Zhang, P Tiwari, B Wang - ACM Computing Surveys, 2023 - dl.acm.org

This survey paper proposes a clearer view of natural language reasoning in the field of
Natural Language Processing (NLP), both conceptually and practically. Conceptually, we …

Cited by 37 · Related articles · All 3 versions

Evaluating large language models: A comprehensive survey

Z Guo, R Jin, C Liu, Y Huang, D Shi, L Yu, Y Liu… - arXiv preprint arXiv …, 2023 - arxiv.org

Large language models (LLMs) have demonstrated remarkable capabilities across a broad
spectrum of tasks. They have attracted significant attention and been deployed in numerous …

Cited by 59 · Related articles · All 2 versions

Evaluating the logical reasoning ability of ChatGPT and GPT-4

H Liu, R Ning, Z Teng, J Liu, Q Zhou… - arXiv preprint arXiv …, 2023 - arxiv.org

Harnessing logical reasoning ability is a comprehensive natural language understanding
endeavor. With the release of Generative Pretrained Transformer 4 (GPT-4), highlighted as …

Cited by 168 · Related articles · All 2 versions

Dynabench: Rethinking benchmarking in NLP

D Kiela, M Bartolo, Y Nie, D Kaushik, A Geiger… - arXiv preprint arXiv …, 2021 - arxiv.org

We introduce Dynabench, an open-source platform for dynamic dataset creation and model
benchmarking. Dynabench runs in a web browser and supports human-and-model-in-the …

Cited by 329 · Related articles · All 9 versions

Robustness Gym: Unifying the NLP evaluation landscape

K Goel, N Rajani, J Vig, S Tan, J Wu, S Zheng… - arXiv preprint arXiv …, 2021 - arxiv.org

Despite impressive performance on standard benchmarks, deep neural networks are often
brittle when deployed in real-world systems. Consequently, recent research has focused on …

Cited by 131 · Related articles · All 4 versions

LogiQA 2.0: An improved dataset for logical reasoning in natural language understanding

H Liu, J Liu, L Cui, Z Teng, N Duan… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org

NLP research on logical reasoning regains momentum with the recent releases of a handful
of datasets, notably LogiQA and ReClor. Logical reasoning is exploited in many probing …

Cited by 27 · Related articles · All 3 versions

Towards faithful model explanation in NLP: A survey

Q Lyu, M Apidianaki, C Callison-Burch - Computational Linguistics, 2024 - direct.mit.edu

End-to-end neural Natural Language Processing (NLP) models are notoriously difficult to
understand. This has given rise to numerous efforts towards model explainability in recent …

Cited by 50 · Related articles · All 4 versions

Recursion in recursion: Two-level nested recursion for length generalization with scalability

J Ray Chowdhury, C Caragea - Advances in Neural …, 2024 - proceedings.neurips.cc

Binary Balanced Tree Recursive Neural Networks (BBT-RvNNs) enforce sequence
composition according to a preset balanced binary tree structure. Thus, their non-linear …

Cited by 3 · Related articles · All 6 versions

ANLIzing the adversarial natural language inference dataset

A Williams, T Thrush, D Kiela - arXiv preprint arXiv:2010.12729, 2020 - arxiv.org

We perform an in-depth error analysis of Adversarial NLI (ANLI), a recently introduced
large-scale human-and-model-in-the-loop natural language inference dataset collected over …

Cited by 39 · Related articles · All 8 versions

When LLMs meet cunning questions: A fallacy understanding benchmark for large language models

Y Li, Q Zhou, Y Luo, S Ma, Y Li, HT Zheng, X Hu… - arXiv preprint arXiv …, 2024 - arxiv.org

Recently, Large Language Models (LLMs) have made remarkable evolutions in language
understanding and generation. Following this, various benchmarks for measuring all kinds …

Cited by 7 · Related articles · All 2 versions