Measuring vision-language STEM skills of neural models

Y Qu, C Wei, P Du, W Che, C Zhang, W Ouyang, Y Bian… - iScience, 2024 - cell.com

During the evolution of large models, performance evaluation is necessary for assessing
their capabilities. However, current model evaluations mainly rely on specific tasks and …

Cited by 3 · Related articles · All 9 versions

Measuring Social Norms of Large Language Models

Y Yuan, K Tang, J Shen, M Zhang, C Wang - arXiv preprint arXiv …, 2024 - arxiv.org

We present a new challenge to examine whether large language models understand social
norms. In contrast to existing datasets, our dataset requires a fundamental understanding of …

Cited by 6 · Related articles · All 3 versions

Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study

C Wang, R Jia, X Liu, D Song - arXiv preprint arXiv:2403.10499, 2024 - arxiv.org

Pre-training image representations from the raw text about images enables zero-shot vision
transfer to downstream tasks. Through pre-training on millions of samples collected from the …

Re-Tuning: Overcoming the Compositionality Limits of Large Language Models with Recursive Tuning

E Pasewark, K Montgomery, K Duan, D Song… - arXiv preprint arXiv …, 2024 - arxiv.org

We present a new method for large language models to solve compositional tasks. Although
they have shown strong performance on traditional language understanding tasks, large …

A Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning

Y Yuan, C Liu, J Yuan, G Sun, S Li, M Zhang - arXiv preprint arXiv …, 2024 - arxiv.org

Retrieval-augmented generation (RAG) is a framework enabling large language models
(LLMs) to enhance their accuracy and reduce hallucinations by integrating external …

Decomposing Complex Visual Comprehension into Atomic Visual Skills for Vision Language Models

H Chae, S Yoon, CY Chun, G Go, Y Cho, G Lee… - The 4th Workshop on … - openreview.net

Recent Vision Language Models (VLMs) have demonstrated impressive multimodal
comprehension and reasoning capabilities, but they often struggle with trivially simple visual …