Understanding dark scenes by contrasting multi-modal observations
Understanding dark scenes based on multi-modal image data is challenging, as both the
visible and auxiliary modalities provide limited semantic information for the task. Previous …
NavFormer: A Transformer Architecture for Robot Target-Driven Navigation in Unknown and Dynamic Environments
In unknown cluttered and dynamic environments such as disaster scenes, mobile robots
need to perform target-driven navigation in order to find people or objects of interest, where …
Multi-modal anchor adaptation learning for multi-modal summarization
Z Chen, Z Lu, H Rong, C Zhao, F Xu - Neurocomputing, 2024 - Elsevier
In this paper, we focus on analyzing the relationship between the input of source text and
source image, and then through the integration and generalization of the multi-modal …
4CNet: A Confidence-Aware, Contrastive, Conditional, Consistency Model for Robot Map Prediction in Multi-Robot Environments
Mobile robots in unknown cluttered environments with irregularly shaped obstacles often
face sensing, energy, and communication challenges which directly affect their ability to …
TSCL: Timestamp Supervised Contrastive Learning for Action Segmentation
Temporal action segmentation is an essential task for understanding complex human activity
sequences and identifying long-term dependencies between human actions. This is …
A Survey of Multimodal Perception Methods for Human-Robot Interaction in Social Environments
JA Duncan, F Alambeigi, MW Pryor - ACM Transactions on Human …, 2024 - dl.acm.org
Human-robot interaction (HRI) in human social environments (HSEs) poses unique
challenges for robot perception systems, which must combine asynchronous …
Towards real-time embodied AI agent: a bionic visual encoding framework for mobile robotics
Embodied artificial intelligence (AI) agents, which navigate and interact with their
environment using sensors and actuators, are being applied for mobile robotic platforms …
Find Everything: A General Vision Language Model Approach to Multi-Object Search
The Multi-Object Search (MOS) problem involves navigating to a sequence of locations to
maximize the likelihood of finding target objects while minimizing travel costs. In this paper …
The Un-Kidnappable Robot: Acoustic Localization of Sneaking People
How easy is it to sneak up on a robot? We examine whether we can detect people using
only the incidental sounds they produce as they move, even when they try to be quiet. To do …
LDTrack: Dynamic People Tracking by Service Robots using Diffusion Models
Tracking of dynamic people in cluttered and crowded human-centered environments is a
challenging robotics problem due to the presence of intraclass variations including …