Benchmarks for automated commonsense reasoning: A survey
E Davis - ACM Computing Surveys, 2023 - dl.acm.org
More than one hundred benchmarks have been developed to test the commonsense
knowledge and commonsense reasoning abilities of artificial intelligence (AI) systems …
Can neural networks do arithmetic? A survey on the elementary numerical skills of state-of-the-art deep learning models
A Testolin - Applied Sciences, 2024 - mdpi.com
Creating learning models that can exhibit sophisticated reasoning abilities is one of the
greatest challenges in deep learning research, and mathematics is rapidly becoming one of …
InstructABSA: Instruction learning for aspect based sentiment analysis
We introduce InstructABSA, an instruction learning paradigm for Aspect-Based Sentiment
Analysis (ABSA) subtasks. Our method introduces positive, negative, and neutral examples …
LogicBench: Towards systematic evaluation of logical reasoning ability of large language models
Recently developed large language models (LLMs) have been shown to perform
remarkably well on a wide range of language understanding tasks. But, can they really …
Stable LM 2 1.6B technical report
M Bellagente, J Tow, D Mahan, D Phung… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce StableLM 2 1.6B, the first in a new generation of our language model series.
In this technical report, we present in detail the data and training procedure leading to the …
TarGEN: Targeted data generation with large language models
The rapid advancement of large language models (LLMs) has sparked interest in data
synthesis techniques, aiming to generate diverse and high-quality synthetic datasets …
From GPT-3.5 to GPT-4.o: A Leap in AI's Medical Exam Performance
M Kipp - Information, 2024 - mdpi.com
ChatGPT is a large language model trained on increasingly large datasets to perform
diverse language-based tasks. It is capable of answering multiple-choice questions, such as …
Can NLP Models 'Identify', 'Distinguish', and 'Justify' Questions that Don't have a Definitive Answer?
Though state-of-the-art (SOTA) NLP systems have achieved remarkable performance on a
variety of language understanding tasks, they primarily focus on questions that have a …
Context-NER: Contextual phrase generation at scale
Named Entity Recognition (NER) has seen significant progress in recent years, with
numerous state-of-the-art (SOTA) models achieving high performance. However, very few …
Can NLP Models Correctly Reason Over Contexts that Break the Common Assumptions?
Pre-training on large corpora of text enables the language models to acquire a vast amount
of factual and commonsense knowledge which allows them to achieve remarkable …