Benchmarks for automated commonsense reasoning: A survey

E Davis - ACM Computing Surveys, 2023 - dl.acm.org
More than one hundred benchmarks have been developed to test the commonsense
knowledge and commonsense reasoning abilities of artificial intelligence (AI) systems …

Can neural networks do arithmetic? A survey on the elementary numerical skills of state-of-the-art deep learning models

A Testolin - Applied Sciences, 2024 - mdpi.com
Creating learning models that can exhibit sophisticated reasoning abilities is one of the
greatest challenges in deep learning research, and mathematics is rapidly becoming one of …

InstructABSA: Instruction learning for aspect based sentiment analysis

K Scaria, H Gupta, S Goyal, SA Sawant… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce InstructABSA, an instruction learning paradigm for Aspect-Based Sentiment
Analysis (ABSA) subtasks. Our method introduces positive, negative, and neutral examples …

LogicBench: Towards systematic evaluation of logical reasoning ability of large language models

M Parmar, N Patel, N Varshney… - Proceedings of the …, 2024 - aclanthology.org
Recently developed large language models (LLMs) have been shown to perform
remarkably well on a wide range of language understanding tasks. But, can they really …

Stable LM 2 1.6B technical report

M Bellagente, J Tow, D Mahan, D Phung… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce StableLM 2 1.6B, the first in a new generation of our language model series.
In this technical report, we present in detail the data and training procedure leading to the …

TarGEN: Targeted data generation with large language models

H Gupta, K Scaria, U Anantheswaran, S Verma… - arXiv preprint arXiv …, 2023 - arxiv.org
The rapid advancement of large language models (LLMs) has sparked interest in data
synthesis techniques, aiming to generate diverse and high-quality synthetic datasets …

From GPT-3.5 to GPT-4o: A Leap in AI's Medical Exam Performance

M Kipp - Information, 2024 - mdpi.com
ChatGPT is a large language model trained on increasingly large datasets to perform
diverse language-based tasks. It is capable of answering multiple-choice questions, such as …

Can NLP Models 'Identify', 'Distinguish', and 'Justify' Questions that Don't have a Definitive Answer?

A Agarwal, N Patel, N Varshney, M Parmar… - arXiv preprint arXiv …, 2023 - arxiv.org
Though state-of-the-art (SOTA) NLP systems have achieved remarkable performance on a
variety of language understanding tasks, they primarily focus on questions that have a …

Context-NER: Contextual phrase generation at scale

H Gupta, S Verma, S Mashetty, S Mishra - arXiv preprint arXiv:2109.08079, 2021 - arxiv.org
Named Entity Recognition (NER) has seen significant progress in recent years, with
numerous state-of-the-art (SOTA) models achieving high performance. However, very few …

Can NLP Models Correctly Reason Over Contexts that Break the Common Assumptions?

N Varshney, M Parmar, N Patel, D Handa… - arXiv preprint arXiv …, 2023 - arxiv.org
Pre-training on large corpora of text enables the language models to acquire a vast amount
of factual and commonsense knowledge which allows them to achieve remarkable …