Human-guided reinforcement learning with sim-to-real transfer for autonomous navigation

J Wu, Y Zhou, H Yang, Z Huang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Reinforcement learning (RL) is a promising approach in unmanned ground vehicle (UGV)
applications, but limited computing resources make it challenging to deploy a well-behaved …

Frequency-enhanced data augmentation for vision-and-language navigation

K He, C Si, Z Lu, Y Huang, L Wang… - Advances in Neural …, 2024 - proceedings.neurips.cc
Vision-and-Language Navigation (VLN) is a challenging task that requires an agent
to navigate through complex environments based on natural language instructions. In …

Multimodal transformer with variable-length memory for vision-and-language navigation

C Lin, Y Jiang, J Cai, L Qu, G Haffari, Z Yuan - European Conference on …, 2022 - Springer
Vision-and-Language Navigation (VLN) is a task in which an agent is required to follow
a language instruction to navigate to the goal position, which relies on the ongoing …

Memory-adaptive vision-and-language navigation

K He, Y Jing, Y Huang, Z Lu, D An, L Wang - Pattern Recognition, 2024 - Elsevier
Vision-and-Language Navigation (VLN) requires an agent to navigate in 3D
environments following given instructions, where history is critical for decision-making in …

3D question answering with scene graph reasoning

Z Wu, H Li, G Chen, Z Yu, X Gu, Y Wang - Proceedings of the 32nd ACM …, 2024 - dl.acm.org
3DQA has gained considerable attention due to its enhanced spatial understanding
capabilities compared to image-based VQA. However, existing 3DQA methods have …

PASTS: Progress-aware spatio-temporal transformer speaker for vision-and-language navigation

L Wang, C Liu, Z He, S Li, Q Yan, H Chen… - … Applications of Artificial …, 2024 - Elsevier
Vision-and-language navigation (VLN) is a crucial but challenging cross-modal navigation
task. One powerful technique to enhance the generalization performance in VLN is the use …

Incorporating external knowledge reasoning for vision-and-language navigation with assistant's help

X Li, Y Zhang, W Yuan, J Luo - Applied Sciences, 2022 - mdpi.com
Vision-and-Language Navigation (VLN) is a task designed to enable embodied agents to carry
out natural language instructions in realistic environments. Most VLN tasks, however, are …

Data Optimization in Deep Learning: A Survey

O Wu, R Yao - arXiv preprint arXiv:2310.16499, 2023 - arxiv.org
Large-scale, high-quality data are considered an essential factor for the successful
application of many deep learning techniques. Meanwhile, numerous real-world deep …

DOROTHIE: Spoken dialogue for handling unexpected situations in interactive autonomous driving agents

Z Ma, B VanDerPloeg, CP Bara, H Yidong… - arXiv preprint arXiv …, 2022 - arxiv.org
In the real world, autonomous driving agents navigate in highly dynamic environments full of
unexpected situations where pre-trained models are unreliable. In these situations, what is …

Heterogeneous Embodied Multi-Agent Collaboration

X Liu, D Guo, X Zhang, H Liu - IEEE Robotics and Automation …, 2024 - ieeexplore.ieee.org
Multi-agent embodied tasks have been studied in indoor visual environments, but most of
the existing research focuses on homogeneous multi-agent tasks. Heterogeneous multi …