"John is 50 years old, can his son be 65?" Evaluating NLP Models' Understanding of Feasibility

E Davis - ACM Computing Surveys, 2023 - dl.acm.org

More than one hundred benchmarks have been developed to test the commonsense
knowledge and commonsense reasoning abilities of artificial intelligence (AI) systems …

Cited by 53 · Related articles · All 4 versions

Can neural networks do arithmetic? A survey on the elementary numerical skills of state-of-the-art deep learning models

A Testolin - Applied Sciences, 2024 - mdpi.com

Creating learning models that can exhibit sophisticated reasoning abilities is one of the
greatest challenges in deep learning research, and mathematics is rapidly becoming one of …

Cited by 19 · Related articles · All 5 versions

InstructABSA: Instruction learning for aspect based sentiment analysis

K Scaria, H Gupta, S Goyal, SA Sawant… - arXiv preprint arXiv …, 2023 - arxiv.org

We introduce InstructABSA, an instruction learning paradigm for Aspect-Based Sentiment
Analysis (ABSA) subtasks. Our method introduces positive, negative, and neutral examples …

Cited by 44 · Related articles · All 3 versions

LogicBench: Towards systematic evaluation of logical reasoning ability of large language models

M Parmar, N Patel, N Varshney… - Proceedings of the …, 2024 - aclanthology.org

Recently developed large language models (LLMs) have been shown to perform
remarkably well on a wide range of language understanding tasks. But, can they really …

Cited by 15 · Related articles

Stable LM 2 1.6B technical report

M Bellagente, J Tow, D Mahan, D Phung… - arXiv preprint arXiv …, 2024 - arxiv.org

We introduce StableLM 2 1.6B, the first in a new generation of our language model series.
In this technical report, we present in detail the data and training procedure leading to the …

Cited by 43 · Related articles · All 2 versions

TarGEN: Targeted data generation with large language models

H Gupta, K Scaria, U Anantheswaran, S Verma… - arXiv preprint arXiv …, 2023 - arxiv.org

The rapid advancement of large language models (LLMs) has sparked interest in data
synthesis techniques, aiming to generate diverse and high-quality synthetic datasets …

Cited by 17 · Related articles · All 2 versions

From GPT-3.5 to GPT-4.o: A Leap in AI's Medical Exam Performance

M Kipp - Information, 2024 - mdpi.com

ChatGPT is a large language model trained on increasingly large datasets to perform
diverse language-based tasks. It is capable of answering multiple-choice questions, such as …

Cited by 1 · Related articles · All 3 versions

Can NLP Models 'Identify', 'Distinguish', and 'Justify' Questions that Don't have a Definitive Answer?

A Agarwal, N Patel, N Varshney, M Parmar… - arXiv preprint arXiv …, 2023 - arxiv.org

Though state-of-the-art (SOTA) NLP systems have achieved remarkable performance on a
variety of language understanding tasks, they primarily focus on questions that have a …

Cited by 3 · Related articles · All 4 versions

Context-NER: Contextual phrase generation at scale

H Gupta, S Verma, S Mashetty, S Mishra - arXiv preprint arXiv:2109.08079, 2021 - arxiv.org

Named Entity Recognition (NER) has seen significant progress in recent years, with
numerous state-of-the-art (SOTA) models achieving high performance. However, very few …

Cited by 10 · Related articles · All 5 versions

Can NLP Models Correctly Reason Over Contexts that Break the Common Assumptions?

N Varshney, M Parmar, N Patel, D Handa… - arXiv preprint arXiv …, 2023 - arxiv.org

Pre-training on large corpora of text enables the language models to acquire a vast amount
of factual and commonsense knowledge which allows them to achieve remarkable …

Cited by 4 · Related articles · All 2 versions