Integration of cognitive tasks into artificial general intelligence test for large models

Y Qu, C Wei, P Du, W Che, C Zhang, W Ouyang, Y Bian… - iScience, 2024 - cell.com
During the evolution of large models, performance evaluation is necessary for assessing
their capabilities. However, current model evaluations mainly rely on specific tasks and …

Measuring Social Norms of Large Language Models

Y Yuan, K Tang, J Shen, M Zhang, C Wang - arXiv preprint arXiv …, 2024 - arxiv.org
We present a new challenge to examine whether large language models understand social
norms. In contrast to existing datasets, our dataset requires a fundamental understanding of …

Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study

C Wang, R Jia, X Liu, D Song - arXiv preprint arXiv:2403.10499, 2024 - arxiv.org
Pre-training image representations from the raw text about images enables zero-shot vision
transfer to downstream tasks. Through pre-training on millions of samples collected from the …

Re-Tuning: Overcoming the Compositionality Limits of Large Language Models with Recursive Tuning

E Pasewark, K Montgomery, K Duan, D Song… - arXiv preprint arXiv …, 2024 - arxiv.org
We present a new method for large language models to solve compositional tasks. Although
they have shown strong performance on traditional language understanding tasks, large …

A Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning

Y Yuan, C Liu, J Yuan, G Sun, S Li, M Zhang - arXiv preprint arXiv …, 2024 - arxiv.org
Retrieval-augmented generation (RAG) is a framework enabling large language models
(LLMs) to enhance their accuracy and reduce hallucinations by integrating external …

Decomposing Complex Visual Comprehension into Atomic Visual Skills for Vision Language Models

H Chae, S Yoon, CY Chun, G Go, Y Cho, G Lee… - The 4th Workshop on … - openreview.net
Recent Vision Language Models (VLMs) have demonstrated impressive multimodal
comprehension and reasoning capabilities, but they often struggle with trivially simple visual …