A survey of embodied AI: From simulators to research tasks

J Duan, S Yu, HL Tan, H Zhu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
There has been an emerging paradigm shift from the era of “internet AI” to “embodied AI,”
where AI algorithms and agents no longer learn from datasets of images, videos or text …

Multimodal research in vision and language: A review of current and emerging trends

S Uppal, S Bhagat, D Hazarika, N Majumder, S Poria… - Information …, 2022 - Elsevier
Deep Learning and its applications have driven a cascade of impactful research and development
across the diverse range of modalities present in real-world data. More recently, this has …

Vision-and-language navigation: A survey of tasks, methods, and future directions

J Gu, E Stefani, Q Wu, J Thomason… - arXiv preprint arXiv …, 2022 - arxiv.org
A long-term goal of AI research is to build intelligent agents that can communicate with
humans in natural language, perceive the environment, and perform real-world tasks. Vision …

Vision-language navigation with self-supervised auxiliary reasoning tasks

F Zhu, Y Zhu, X Chang, X Liang - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
Vision-Language Navigation (VLN) is a task where an agent learns to navigate
following a natural language instruction. The key to this task is to perceive both the visual …
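
As a minimal illustration of the decision step in this task (a toy sketch under assumed feature shapes, not the paper's auxiliary-reasoning architecture), an agent can score candidate navigable views against a pooled instruction encoding and move toward the best match:

# Toy VLN decision step (illustrative only; all module and feature
# names here are assumptions, not taken from the paper).
import torch
import torch.nn as nn

class SimpleVLNPolicy(nn.Module):
    def __init__(self, text_dim=512, view_dim=2048, hidden=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden)   # instruction -> shared space
        self.view_proj = nn.Linear(view_dim, hidden)   # visual views -> shared space

    def forward(self, instr_feat, view_feats):
        # instr_feat: (batch, text_dim) pooled instruction encoding
        # view_feats: (batch, n_views, view_dim) candidate navigable views
        q = self.text_proj(instr_feat).unsqueeze(1)    # (batch, 1, hidden)
        k = self.view_proj(view_feats)                 # (batch, n_views, hidden)
        return (q * k).sum(-1)                         # dot-product match scores

policy = SimpleVLNPolicy()
instr = torch.randn(1, 512)                   # e.g., a pooled LSTM/BERT instruction feature
views = torch.randn(1, 8, 2048)               # e.g., CNN features of 8 navigable viewpoints
next_view = policy(instr, views).argmax(-1)   # index of the view to move toward

Real VLN agents add recurrence over past steps and auxiliary losses; the dot-product matcher above is only the smallest unit that makes the two input streams concrete.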

HOP: History-and-order aware pre-training for vision-and-language navigation

Y Qiao, Y Qi, Y Hong, Z Yu… - Proceedings of the …, 2022 - openaccess.thecvf.com
Pre-training has been adopted in a few recent works on Vision-and-Language Navigation
(VLN). However, previous pre-training methods for VLN either lack the ability to predict …
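
The snippet does not spell out HOP's proxy objectives, but the title points to order-aware pre-training; a rough, hypothetical version of such an objective (a sketch only, not HOP's actual loss) classifies whether a trajectory's step features appear in their original temporal order:

# Hypothetical order-prediction pretext task (assumed design; HOP's
# real proxy tasks differ in detail).
import torch
import torch.nn as nn

class OrderClassifier(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, 2)          # 2 classes: in-order vs. shuffled

    def forward(self, traj):                   # traj: (batch, steps, dim)
        h = self.encoder(traj).mean(1)         # pool over trajectory steps
        return self.head(h)

model = OrderClassifier()
traj = torch.randn(4, 6, 256)                  # 4 trajectories, 6 steps each
shuffled = traj[:, torch.randperm(6)]          # destroy the temporal order
batch = torch.cat([traj, shuffled], dim=0)
labels = torch.tensor([0] * 4 + [1] * 4)       # 0 = in order, 1 = shuffled
loss = nn.functional.cross_entropy(model(batch), labels)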

SOON: Scenario oriented object navigation with graph-based exploration

F Zhu, X Liang, Y Zhu, Q Yu… - Proceedings of the …, 2021 - openaccess.thecvf.com
The ability to navigate like a human towards a language-guided target from anywhere in a
3D embodied environment is one of the 'holy grail' goals of intelligent robots. Most visual …

HOP+: History-enhanced and order-aware pre-training for vision-and-language navigation

Y Qiao, Y Qi, Y Hong, Z Yu, P Wang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Recent works attempt to employ pre-training in Vision-and-Language Navigation (VLN).
However, these methods neglect the importance of historical contexts or ignore predicting …

The road to know-where: An object-and-room informed sequential BERT for indoor vision-language navigation

Y Qi, Z Pan, Y Hong, MH Yang… - Proceedings of the …, 2021 - openaccess.thecvf.com
Vision-and-Language Navigation (VLN) requires an agent to find a path to a remote
location on the basis of natural-language instructions and a set of photo-realistic …

Real-time detection of full-scale forest fire smoke based on deep convolution neural network

X Zheng, F Chen, L Lou, P Cheng, Y Huang - Remote Sensing, 2022 - mdpi.com
To reduce the losses caused by forest fires, it is vital to detect forest fire smoke in
real time so that early and timely warnings can be issued. Machine vision and image …

Grounded entity-landmark adaptive pre-training for vision-and-language navigation

Y Cui, L Xie, Y Zhang, M Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Cross-modal alignment is one key challenge for Vision-and-Language Navigation (VLN).
Most existing studies concentrate on mapping the global instruction or single sub-instruction …
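
As a toy sketch of the cross-modal alignment this entry names (illustrative only; the paper's adaptive pre-training grounds entity phrases to landmark regions, which is finer-grained), sub-instruction embeddings can be softly aligned to visual region features with scaled dot-product attention:

# Toy cross-modal alignment (assumed shapes; not the paper's method).
import torch
import torch.nn.functional as F

def align(text_feats, vis_feats):
    # text_feats: (n_subinstr, dim), vis_feats: (n_regions, dim)
    scores = text_feats @ vis_feats.T / text_feats.size(-1) ** 0.5
    attn = F.softmax(scores, dim=-1)   # each sub-instruction attends over regions
    return attn @ vis_feats            # vision-grounded text representations

sub_instr = torch.randn(3, 128)        # e.g., "walk past the sofa", "turn left", ...
regions = torch.randn(10, 128)         # e.g., detected object/landmark features
grounded = align(sub_instr, regions)   # (3, 128) grounded sub-instruction features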