A survey of embodied ai: From simulators to research tasks
There has been an emerging paradigm shift from the era of “internet AI” to “embodied AI,”
where AI algorithms and agents no longer learn from datasets of images, videos or text …
Multimodal research in vision and language: A review of current and emerging trends
Deep Learning and its applications have driven impactful research and development
across a diverse range of modalities present in real-world data. More recently, this has …
Vision-and-language navigation: A survey of tasks, methods, and future directions
A long-term goal of AI research is to build intelligent agents that can communicate with
humans in natural language, perceive the environment, and perform real-world tasks. Vision …
Vision-language navigation with self-supervised auxiliary reasoning tasks
Vision-Language Navigation (VLN) is a task where an agent learns to navigate
following a natural language instruction. The key to this task is to perceive both the visual …
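The two VLN entries above describe the same episodic setting: an agent receives a natural-language instruction, observes its surroundings, and moves between viewpoints until it decides to stop. A minimal sketch of that loop follows; every name in it (Observation, Agent, run_episode, env) is a hypothetical placeholder for illustration, not an interface from any cited paper or simulator.

```python
# Minimal sketch of a VLN episode loop. All names here are hypothetical
# placeholders, not an API from any of the cited papers or simulators.

from dataclasses import dataclass
from typing import List

@dataclass
class Observation:
    panorama: list          # visual features of the current viewpoint (placeholder)
    instruction: str        # natural-language instruction, fixed for the episode
    navigable: List[int]    # indices of viewpoints reachable from here

class Agent:
    def act(self, obs: Observation, history: List[Observation]) -> int:
        """Return the index of the next viewpoint, or -1 to STOP."""
        raise NotImplementedError

def run_episode(env, agent: Agent, max_steps: int = 30) -> bool:
    """Roll out one instruction-following episode; True if the agent stops at the goal."""
    obs = env.reset()                 # sample an (instruction, start viewpoint) pair
    history: List[Observation] = []
    for _ in range(max_steps):
        action = agent.act(obs, history)
        history.append(obs)
        if action == -1:              # STOP action ends the episode
            break
        obs = env.step(action)        # move to the chosen neighboring viewpoint
    return env.at_goal()              # success = stopping near the goal viewpoint
```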
Hop: History-and-order aware pre-training for vision-and-language navigation
Pre-training has been adopted in a few recent works for Vision-and-Language Navigation
(VLN). However, previous pre-training methods for VLN either lack the ability to predict …
Soon: Scenario oriented object navigation with graph-based exploration
The ability to navigate like a human towards a language-guided target from anywhere in a
3D embodied environment is one of the 'holy grail' goals of intelligent robots. Most visual …
Hop+: History-enhanced and order-aware pre-training for vision-and-language navigation
Recent works attempt to employ pre-training in Vision-and-Language Navigation (VLN).
However, these methods neglect the importance of historical contexts or ignore predicting …
The road to know-where: An object-and-room informed sequential bert for indoor vision-language navigation
Vision-and-Language Navigation (VLN) requires an agent to find a path to a remote
location on the basis of natural-language instructions and a set of photo-realistic …
Real-time detection of full-scale forest fire smoke based on deep convolution neural network
To reduce the losses caused by forest fires, it is very important to detect forest fire smoke in
real time so that early and timely warnings can be issued. Machine vision and image …
Grounded entity-landmark adaptive pre-training for vision-and-language navigation
Y Cui, L Xie, Y Zhang, M Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Cross-modal alignment is one key challenge for Vision-and-Language Navigation (VLN).
Most existing studies concentrate on mapping the global instruction or single sub-instruction …