Aligning cyber space with physical world: A comprehensive survey on embodied AI

Y Liu, W Chen, Y Bai, X Liang, G Li, W Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
Embodied Artificial Intelligence (Embodied AI) is crucial for achieving Artificial General
Intelligence (AGI) and serves as a foundation for various applications that bridge cyberspace …

Foundation models in robotics: Applications, challenges, and the future

R Firoozi, J Tucker, S Tian… - … Journal of Robotics …, 2023 - journals.sagepub.com
We survey applications of pretrained foundation models in robotics. Traditional deep
learning models in robotics are trained on small datasets tailored for specific tasks, which …

Tactile-augmented radiance fields

Y Dou, F Yang, Y Liu, A Loquercio… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present a scene representation that brings vision and touch into a shared 3D space
which we call a tactile-augmented radiance field. This representation capitalizes on two key …

Iterated learning improves compositionality in large vision-language models

C Zheng, J Zhang, A Kembhavi… - Proceedings of the …, 2024 - openaccess.thecvf.com
A fundamental characteristic common to both human vision and natural language is their
compositional nature. Yet despite the performance gains contributed by large vision and …

AugUndo: Scaling up augmentations for monocular depth completion and estimation

Y Wu, TY Liu, H Park, S Soatto, D Lao… - European Conference on …, 2025 - Springer
Unsupervised depth completion and estimation methods are trained by minimizing
reconstruction error. Block artifacts from resampling, intensity saturation, and occlusions are …

WorDepth: Variational language prior for monocular depth estimation

Z Zeng, D Wang, F Yang, H Park… - Proceedings of the …, 2024 - openaccess.thecvf.com
Three-dimensional (3D) reconstruction from a single image is an ill-posed problem
with inherent ambiguities, i.e., scale. Predicting a 3D scene from text description(s) is similarly …

Test-Time Adaptation for Depth Completion

H Park, A Gupta, A Wong - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
It is common to observe performance degradation when transferring models trained on
some (source) datasets to target testing data due to a domain gap between them. Existing …

NeuroBind: Towards unified multimodal representations for neural signals

F Yang, C Feng, D Wang, T Wang, Z Zeng, Z Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Understanding neural activity and information representation is crucial for advancing
knowledge of brain function and cognition. Neural activity, measured through techniques …

A touch, vision, and language dataset for multimodal alignment

L Fu, G Datta, H Huang, WCH Panitch, J Drake… - arXiv preprint arXiv …, 2024 - arxiv.org
Touch is an important sensing modality for humans, but it has not yet been incorporated into
a multimodal generative language model. This is partially due to the difficulty of obtaining …

Gradient-less federated gradient boosting tree with learnable learning rates

C Ma, X Qiu, D Beutel, N Lane - Proceedings of the 3rd Workshop on …, 2023 - dl.acm.org
The privacy-sensitive nature of decentralized datasets and the robustness of eXtreme
Gradient Boosting (XGBoost) on tabular data raise the need to train XGBoost in the context …