Understanding dark scenes by contrasting multi-modal observations

X Dong, N Yokoya - Proceedings of the IEEE/CVF Winter …, 2024 - openaccess.thecvf.com
Understanding dark scenes based on multi-modal image data is challenging, as both the
visible and auxiliary modalities provide limited semantic information for the task. Previous …

NavFormer: A Transformer Architecture for Robot Target-Driven Navigation in Unknown and Dynamic Environments

H Wang, AH Tan, G Nejat - IEEE Robotics and Automation …, 2024 - ieeexplore.ieee.org
In unknown cluttered and dynamic environments such as disaster scenes, mobile robots
need to perform target-driven navigation in order to find people or objects of interest, where …

Multi-modal anchor adaptation learning for multi-modal summarization

Z Chen, Z Lu, H Rong, C Zhao, F Xu - Neurocomputing, 2024 - Elsevier
In this paper, we focus on analyzing the relationship between the input of source text and
source image, and then through the integration and generalization of the multi-modal …

4CNet: A Confidence-Aware, Contrastive, Conditional, Consistency Model for Robot Map Prediction in Multi-Robot Environments

AH Tan, S Narasimhan, G Nejat - arXiv preprint arXiv:2402.17904, 2024 - arxiv.org
Mobile robots in unknown cluttered environments with irregularly shaped obstacles often
face sensing, energy, and communication challenges which directly affect their ability to …

TSCL: Timestamp Supervised Contrastive Learning for Action Segmentation

C Patsch, Y Wu, D Salihu, M Zakour… - IEEE Robotics and …, 2024 - ieeexplore.ieee.org
Temporal action segmentation is an essential task for understanding complex human activity
sequences and identifying long-term dependencies between human actions. This is …

A Survey of Multimodal Perception Methods for Human-Robot Interaction in Social Environments

JA Duncan, F Alambeigi, MW Pryor - ACM Transactions on Human …, 2024 - dl.acm.org
Human-robot interaction (HRI) in human social environments (HSEs) poses unique
challenges for robot perception systems, which must combine asynchronous …

Towards real-time embodied AI agent: a bionic visual encoding framework for mobile robotics

X Hou, Y Guan, T Han, C Wang - International Journal of Intelligent …, 2024 - Springer
Embodied artificial intelligence (AI) agents, which navigate and interact with their
environment using sensors and actuators, are being applied for mobile robotic platforms …

Find Everything: A General Vision Language Model Approach to Multi-Object Search

D Choi, A Fung, H Wang, AH Tan - arXiv preprint arXiv:2410.00388, 2024 - arxiv.org
The Multi-Object Search (MOS) problem involves navigating to a sequence of locations to
maximize the likelihood of finding target objects while minimizing travel costs. In this paper …

The Un-Kidnappable Robot: Acoustic Localization of Sneaking People

M Yang, P Grady, S Brahmbhatt… - … on Robotics and …, 2024 - ieeexplore.ieee.org
How easy is it to sneak up on a robot? We examine whether we can detect people using
only the incidental sounds they produce as they move, even when they try to be quiet. To do …

LDTrack: Dynamic People Tracking by Service Robots using Diffusion Models

A Fung, B Benhabib, G Nejat - arXiv preprint arXiv:2402.08774, 2024 - arxiv.org
Tracking of dynamic people in cluttered and crowded human-centered environments is a
challenging robotics problem due to the presence of intraclass variations including …