KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models
The evolution of large language models (LLMs) has culminated in a multitask model
paradigm where prompts drive the generation of user-specific outputs. However, this …
EmoBench: Evaluating the Emotional Intelligence of Large Language Models
Recent advances in Large Language Models (LLMs) have highlighted the need for robust,
comprehensive, and challenging benchmarks. Yet, research on evaluating their Emotional …
Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses
Large language models (LLMs) equipped with chain-of-thoughts (CoT) prompting have
shown significant multi-step reasoning capabilities in factual content like mathematics …
CogLM: Tracking Cognitive Development of Large Language Models
Piaget's Theory of Cognitive Development (PTC) posits that the development of cognitive
levels forms the foundation for human learning across various abilities. As Large Language …
Testing Memory Capabilities in Large Language Models with the Sequence Order Recall Task
Many benchmarks focus on evaluating Large Language Models (LLMs) on facts and
semantic relations, primarily assessing their semantic memory. However, some memories in …