REVERIE: Remote embodied visual referring expression in real indoor environments

J Fan, P Zheng, S Li - Robotics and Computer-Integrated Manufacturing, 2022 - Elsevier

Recently human–robot collaboration (HRC) has emerged as a promising paradigm for mass
personalization in manufacturing owing to the potential to fully exploit the strength of human …

Cited by 132 · Related articles · All 3 versions

Multimodal research in vision and language: A review of current and emerging trends

S Uppal, S Bhagat, D Hazarika, N Majumder, S Poria… - Information …, 2022 - Elsevier

Deep Learning and its applications have cascaded impactful research and development
with a diverse range of modalities present in the real-world data. More recently, this has …

Cited by 100 · Related articles · All 5 versions

How much can CLIP benefit vision-and-language tasks?

S Shen, LH Li, H Tan, M Bansal, A Rohrbach… - arXiv preprint arXiv …, 2021 - arxiv.org

Most existing Vision-and-Language (V&L) models rely on pre-trained visual encoders, using
a relatively small set of manually-annotated data (as compared to web-crawled data), to …

Cited by 405 · Related articles · All 3 versions

History aware multimodal transformer for vision-and-language navigation

S Chen, PL Guhur, C Schmid… - Advances in neural …, 2021 - proceedings.neurips.cc

Vision-and-language navigation (VLN) aims to build autonomous visual agents that follow
instructions and navigate in real scenes. To remember previously visited locations and …

Cited by 201 · Related articles · All 8 versions

Think global, act local: Dual-scale graph transformer for vision-and-language navigation

S Chen, PL Guhur, M Tapaswi… - Proceedings of the …, 2022 - openaccess.thecvf.com

Following language instructions to navigate in unseen environments is a challenging
problem for autonomous embodied agents. The agent not only needs to ground languages …

Cited by 130 · Related articles · All 9 versions

NavGPT: Explicit reasoning in vision-and-language navigation with large language models

G Zhou, Y Hong, Q Wu - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org

Trained with an unprecedented scale of data, large language models (LLMs) like ChatGPT
and GPT-4 exhibit the emergence of significant reasoning abilities from model scaling. Such …

Cited by 78 · Related articles · All 4 versions

VLN BERT: A recurrent vision-and-language BERT for navigation

Y Hong, Q Wu, Y Qi… - Proceedings of the …, 2021 - openaccess.thecvf.com

Accuracy of many visiolinguistic tasks has benefited significantly from the application of
vision-and-language (V&L) BERT. However, its application for the task of vision-and …

Cited by 258 · Related articles · All 5 versions

Room-across-room: Multilingual vision-and-language navigation with dense spatiotemporal grounding

A Ku, P Anderson, R Patel, E Ie, J Baldridge - arXiv preprint arXiv …, 2020 - arxiv.org

We introduce Room-Across-Room (RxR), a new Vision-and-Language Navigation (VLN)
dataset. RxR is multilingual (English, Hindi, and Telugu) and larger (more paths and …

Cited by 265 · Related articles · All 6 versions

Airbert: In-domain pretraining for vision-and-language navigation

PL Guhur, M Tapaswi, S Chen… - Proceedings of the …, 2021 - openaccess.thecvf.com

Vision-and-language navigation (VLN) aims to enable embodied agents to navigate in
realistic environments using natural language instructions. Given the scarcity of domain …

Cited by 135 · Related articles · All 8 versions

Vision-and-language navigation: A survey of tasks, methods, and future directions

J Gu, E Stefani, Q Wu, J Thomason… - arXiv preprint arXiv …, 2022 - arxiv.org

A long-term goal of AI research is to build intelligent agents that can communicate with
humans in natural language, perceive the environment, and perform real-world tasks. Vision …

Cited by 110 · Related articles · All 6 versions