Human-guided reinforcement learning with sim-to-real transfer for autonomous navigation

J Wu, Y Zhou, H Yang, Z Huang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Reinforcement learning (RL) is a promising approach in unmanned ground vehicle (UGV)
applications, but limited computing resources make it challenging to deploy a well-behaved …

Frequency-enhanced data augmentation for vision-and-language navigation

K He, C Si, Z Lu, Y Huang, L Wang… - Advances in Neural …, 2024 - proceedings.neurips.cc
Vision-and-Language Navigation (VLN) is a challenging task that requires an agent
to navigate through complex environments based on natural language instructions. In …

Multimodal transformer with variable-length memory for vision-and-language navigation

C Lin, Y Jiang, J Cai, L Qu, G Haffari, Z Yuan - European Conference on …, 2022 - Springer
Vision-and-Language Navigation (VLN) is a task in which an agent is required to follow
a language instruction to navigate to the goal position, which relies on the ongoing …

Memory-adaptive vision-and-language navigation

K He, Y Jing, Y Huang, Z Lu, D An, L Wang - Pattern Recognition, 2024 - Elsevier
Vision-and-Language Navigation (VLN) requires an agent to navigate in 3D
environments following given instructions, where history is critical for decision-making in …

3D question answering with scene graph reasoning

Z Wu, H Li, G Chen, Z Yu, X Gu, Y Wang - Proceedings of the 32nd ACM …, 2024 - dl.acm.org
3DQA has gained considerable attention due to its enhanced spatial understanding
capabilities compared to image-based VQA. However, existing 3DQA methods have …

PASTS: Progress-aware spatio-temporal transformer speaker for vision-and-language navigation

L Wang, C Liu, Z He, S Li, Q Yan, H Chen… - … Applications of Artificial …, 2024 - Elsevier
Vision-and-language navigation (VLN) is a crucial but challenging cross-modal navigation
task. One powerful technique to enhance the generalization performance in VLN is the use …

Incorporating external knowledge reasoning for vision-and-language navigation with assistant's help

X Li, Y Zhang, W Yuan, J Luo - Applied Sciences, 2022 - mdpi.com
Vision-and-Language Navigation (VLN) is a task designed to enable embodied agents to carry
out natural language instructions in realistic environments. Most VLN tasks, however, are …

Data Optimization in Deep Learning: A Survey

O Wu, R Yao - arXiv preprint arXiv:2310.16499, 2023 - arxiv.org
Large-scale, high-quality data are considered an essential factor for the successful
application of many deep learning techniques. Meanwhile, numerous real-world deep …

DOROTHIE: Spoken dialogue for handling unexpected situations in interactive autonomous driving agents

Z Ma, B VanDerPloeg, CP Bara, H Yidong… - arXiv preprint arXiv …, 2022 - arxiv.org
In the real world, autonomous driving agents navigate in highly dynamic environments full of
unexpected situations where pre-trained models are unreliable. In these situations, what is …

Heterogeneous Embodied Multi-Agent Collaboration

X Liu, D Guo, X Zhang, H Liu - IEEE Robotics and Automation …, 2024 - ieeexplore.ieee.org
Multi-agent embodied tasks have been studied in indoor visual environments, but most of
the existing research focuses on homogeneous multi-agent tasks. Heterogeneous multi …