DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM

Y Wu, Y Wang, S Tang, W Wu, T He, W Ouyang… - … on Computer Vision, 2025 - Springer
We present DetToolChain, a novel prompting paradigm, to unleash the zero-shot object
detection ability of multimodal large language models (MLLMs), such as GPT-4V and …

FreeVA: Offline MLLM as Training-Free Video Assistant

W Wu - arXiv preprint arXiv:2405.07798, 2024 - arxiv.org
This paper undertakes an empirical study to revisit the latest advancements in Multimodal
Large Language Models (MLLMs): Video Assistant. This study, namely FreeVA, aims to …

UnitedVLN: Generalizable Gaussian Splatting for Continuous Vision-Language Navigation

G Dai, J Zhao, Y Chen, Y Qin, H Zhao, G Xie… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-and-Language Navigation (VLN), where an agent follows instructions to reach a
target destination, has recently seen significant advancements. In contrast to navigation in …

HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision

S Bansal, M Wray, D Damen - arXiv preprint arXiv:2404.09933, 2024 - arxiv.org
Large Vision Language Models (VLMs) are now the de facto state-of-the-art for a number of
tasks including visual question answering, recognising objects, and spatial referral. In this …