A survey of embodied ai: From simulators to research tasks
There has been an emerging paradigm shift from the era of “internet AI” to “embodied AI,”
where AI algorithms and agents no longer learn from datasets of images, videos or text …
Multimodal research in vision and language: A review of current and emerging trends
Deep Learning and its applications have driven impactful research and development
across a diverse range of modalities present in real-world data. More recently, this has …
Vision-and-language navigation: A survey of tasks, methods, and future directions
A long-term goal of AI research is to build intelligent agents that can communicate with
humans in natural language, perceive the environment, and perform real-world tasks. Vision …
Vision-language navigation with self-supervised auxiliary reasoning tasks
Vision-Language Navigation (VLN) is a task where an agent learns to navigate
following a natural language instruction. The key to this task is to perceive both the visual …
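The two VLN entries above describe the same episodic setting: an agent receives a natural-language instruction, observes its surroundings, and moves between viewpoints until it decides to stop. A minimal sketch of that loop follows; every name in it (Observation, Agent, run_episode, env) is a hypothetical placeholder for illustration, not an interface from any cited paper or simulator.

```python
# Minimal sketch of a VLN episode loop. All names here are hypothetical
# placeholders, not an API from any of the cited papers or simulators.

from dataclasses import dataclass
from typing import List

@dataclass
class Observation:
    panorama: list          # visual features of the current viewpoint (placeholder)
    instruction: str        # natural-language instruction, fixed for the episode
    navigable: List[int]    # indices of viewpoints reachable from here

class Agent:
    def act(self, obs: Observation, history: List[Observation]) -> int:
        """Return the index of the next viewpoint, or -1 to STOP."""
        raise NotImplementedError

def run_episode(env, agent: Agent, max_steps: int = 30) -> bool:
    """Roll out one instruction-following episode; True if the agent stops at the goal."""
    obs = env.reset()                 # sample an (instruction, start viewpoint) pair
    history: List[Observation] = []
    for _ in range(max_steps):
        action = agent.act(obs, history)
        history.append(obs)
        if action == -1:              # STOP action ends the episode
            break
        obs = env.step(action)        # move to the chosen neighboring viewpoint
    return env.at_goal()              # success = stopping near the goal viewpoint
```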
Hop: History-and-order aware pre-training for vision-and-language navigation
Pre-training has been adopted in a few recent works for Vision-and-Language Navigation
(VLN). However, previous pre-training methods for VLN either lack the ability to predict …
Soon: Scenario oriented object navigation with graph-based exploration
The ability to navigate like a human towards a language-guided target from anywhere in a
3D embodied environment is one of the 'holy grail' goals of intelligent robots. Most visual …
Hop+: History-enhanced and order-aware pre-training for vision-and-language navigation
Recent works attempt to employ pre-training in Vision-and-Language Navigation (VLN).
However, these methods neglect the importance of historical contexts or ignore predicting …
The road to know-where: An object-and-room informed sequential bert for indoor vision-language navigation
Vision-and-Language Navigation (VLN) requires an agent to find a path to a remote
location on the basis of natural-language instructions and a set of photo-realistic …
Real-time detection of full-scale forest fire smoke based on deep convolution neural network
To reduce the losses caused by forest fires, it is very important to detect forest fire smoke in
real time so that early and timely warnings can be issued. Machine vision and image …
Grounded entity-landmark adaptive pre-training for vision-and-language navigation
Y Cui, L Xie, Y Zhang, M Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Cross-modal alignment is one key challenge for Vision-and-Language Navigation (VLN).
Most existing studies concentrate on mapping the global instruction or single sub-instruction …