Multi-Agent VQA: Exploring Multi-Agent Foundation Models in Zero-Shot Visual Question Answering
arXiv preprint arXiv:2403.14783, 2024
This work explores the zero-shot capabilities of foundation models in Visual Question Answering (VQA) tasks. We propose an adaptive multi-agent system, named Multi-Agent VQA, to overcome the limitations of foundation models in object detection and counting by using specialized agents as tools. Unlike existing approaches, our study focuses on the system's performance without fine-tuning it on specific VQA datasets, making it more practical and robust in the open world. We present preliminary experimental results under zero-shot scenarios and highlight some failure cases, offering new directions for future research.