Visual tuning

BXB Yu, J Chang, H Wang, L Liu, S Wang… - ACM Computing …, 2024 - dl.acm.org
Fine-tuning visual models has been widely shown to yield promising performance on many
downstream visual tasks. With the surprising development of pre-trained visual foundation …

Large language models: A survey

S Minaee, T Mikolov, N Nikzad, M Chenaghlu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have drawn a lot of attention due to their strong
performance on a wide range of natural language tasks, since the release of ChatGPT in …

Unified hallucination detection for multimodal large language models

X Chen, C Wang, Y Xue, N Zhang, X Yang, Q Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite significant strides in multimodal tasks, Multimodal Large Language Models (MLLMs)
are plagued by the critical issue of hallucination. The reliable detection of such …

UFO: A UI-focused agent for Windows OS interaction

C Zhang, L Li, S He, X Zhang, B Qiao, S Qin… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce UFO, an innovative UI-Focused agent to fulfill user requests tailored to
applications on Windows OS, harnessing the capabilities of GPT-Vision. UFO employs a …

Open problems in technical AI governance

A Reuel, B Bucknall, S Casper, T Fist, L Soder… - arXiv preprint arXiv …, 2024 - arxiv.org
AI progress is creating a growing range of risks and opportunities, but it is often unclear how
they should be navigated. In many cases, the barriers and uncertainties faced are at least …

Foundation models for recommender systems: A survey and new perspectives

C Huang, T Yu, K Xie, S Zhang, L Yao… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, Foundation Models (FMs), with their extensive knowledge bases and complex
architectures, have offered unique opportunities within the realm of recommender systems …

An interactive agent foundation model

Z Durante, B Sarkar, R Gong, R Taori, Y Noda… - arXiv preprint arXiv …, 2024 - arxiv.org
The development of artificial intelligence systems is transitioning from creating static, task-
specific models to dynamic, agent-based systems capable of performing well in a wide …

VistaRAG: Toward Safe and Trustworthy Autonomous Driving Through Retrieval-Augmented Generation

X Dai, C Guo, Y Tang, H Li, Y Wang… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
Autonomous driving based on foundation models has recently garnered widespread
attention. However, the risk of hallucinations inherent in foundation models could …

Literature Review of AI Hallucination Research Since the Advent of ChatGPT: Focusing on Papers from arXiv

DM Park, HJ Lee - Informatization Policy, 2024 - koreascience.kr
Hallucination is a significant barrier to the utilization of large-scale language models or
multimodal models. In this study, we collected 654 computer science papers with "…

KnowAgent: Knowledge-augmented planning for LLM-based agents

Y Zhu, S Qiao, Y Ou, S Deng, N Zhang, S Lyu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have demonstrated great potential in complex reasoning
tasks, yet they fall short when tackling more sophisticated challenges, especially when …