Aligning cyber space with physical world: A comprehensive survey on embodied AI
Embodied Artificial Intelligence (Embodied AI) is crucial for achieving Artificial General
Intelligence (AGI) and serves as a foundation for various applications that bridge cyberspace …
Foundation models in robotics: Applications, challenges, and the future
We survey applications of pretrained foundation models in robotics. Traditional deep
learning models in robotics are trained on small datasets tailored for specific tasks, which …
Tactile-augmented radiance fields
Y Dou, F Yang, Y Liu, A Loquercio… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present a scene representation that brings vision and touch into a shared 3D space
which we call a tactile-augmented radiance field. This representation capitalizes on two key …
Iterated learning improves compositionality in large vision-language models
A fundamental characteristic common to both human vision and natural language is their
compositional nature. Yet despite the performance gains contributed by large vision and …
AugUndo: Scaling up augmentations for monocular depth completion and estimation
Unsupervised depth completion and estimation methods are trained by minimizing
reconstruction error. Block artifacts from resampling, intensity saturation, and occlusions are …
WorDepth: Variational language prior for monocular depth estimation
Three-dimensional (3D) reconstruction from a single image is an ill-posed problem
with inherent ambiguities, i.e., scale. Predicting a 3D scene from text description(s) is similarly …
Test-Time Adaptation for Depth Completion
It is common to observe performance degradation when transferring models trained on
some (source) datasets to target testing data due to a domain gap between them. Existing …
NeuroBind: Towards unified multimodal representations for neural signals
Understanding neural activity and information representation is crucial for advancing
knowledge of brain function and cognition. Neural activity, measured through techniques …
A touch, vision, and language dataset for multimodal alignment
Touch is an important sensing modality for humans, but it has not yet been incorporated into
a multimodal generative language model. This is partially due to the difficulty of obtaining …
Gradient-less federated gradient boosting tree with learnable learning rates
The privacy-sensitive nature of decentralized datasets and the robustness of eXtreme
Gradient Boosting (XGBoost) on tabular data raise the need to train XGBoost in the context …