Integration of cognitive tasks into artificial general intelligence test for large models
During the evolution of large models, performance evaluation is necessary for assessing
their capabilities. However, current model evaluations mainly rely on specific tasks and …
Measuring Social Norms of Large Language Models
We present a new challenge to examine whether large language models understand social
norms. In contrast to existing datasets, our dataset requires a fundamental understanding of …
Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study
Pre-training image representations from the raw text about images enables zero-shot vision
transfer to downstream tasks. Through pre-training on millions of samples collected from the …
Re-Tuning: Overcoming the Compositionality Limits of Large Language Models with Recursive Tuning
We present a new method for large language models to solve compositional tasks. Although
they have shown strong performance on traditional language understanding tasks, large …
A Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning
Retrieval-augmented generation (RAG) is a framework enabling large language models
(LLMs) to enhance their accuracy and reduce hallucinations by integrating external …
Decomposing Complex Visual Comprehension into Atomic Visual Skills for Vision Language Models
H Chae, S Yoon, CY Chun, G Go, Y Cho, G Lee… - The 4th Workshop on … - openreview.net
Recent Vision Language Models (VLMs) have demonstrated impressive multimodal
comprehension and reasoning capabilities, but they often struggle with trivially simple visual …