MD-Splatting: Learning metric deformation from 4D Gaussians in highly deformable scenes

BP Duisterhof, Z Mandi, Y Yao, JW Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Accurate 3D tracking in highly deformable scenes with occlusions and shadows can
facilitate new applications in robotics, augmented reality, and generative AI. However …

RAM: Retrieval-based affordance transfer for generalizable zero-shot robotic manipulation

Y Kuang, J Ye, H Geng, J Mao, C Deng… - arXiv preprint arXiv …, 2024 - arxiv.org
This work proposes a retrieve-and-transfer framework for zero-shot robotic manipulation,
dubbed RAM, featuring generalizability across various objects, environments, and …

DriveTrack: A Benchmark for Long-Range Point Tracking in Real-World Videos

A Balasingam, J Chandler, C Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
This paper presents DriveTrack, a new benchmark and data generation framework for long-
range keypoint tracking in real-world videos. DriveTrack is motivated by the observation that …

DINOBot: Robot manipulation via retrieval and alignment with vision foundation models

N Di Palo, E Johns - arXiv preprint arXiv:2402.13181, 2024 - arxiv.org
We propose DINOBot, a novel imitation learning framework for robot manipulation, which
leverages the image-level and pixel-level capabilities of features extracted from Vision …

Flow as the cross-domain manipulation interface

M Xu, Z Xu, Y Xu, C Chi, G Wetzstein, M Veloso… - arXiv preprint arXiv …, 2024 - arxiv.org
We present Im2Flow2Act, a scalable learning framework that enables robots to acquire
manipulation skills from diverse data sources. The key idea behind Im2Flow2Act is to use …

BootsTAP: Bootstrapped Training for Tracking-Any-Point

C Doersch, Y Yang, D Gokay, P Luc, S Koppula… - arXiv preprint arXiv …, 2024 - arxiv.org
To endow models with greater understanding of physics and motion, it is useful to enable
them to perceive how solid surfaces move and deform in real scenes. This can be formalized …

General flow as foundation affordance for scalable robot learning

C Yuan, C Wen, T Zhang, Y Gao - arXiv preprint arXiv:2401.11439, 2024 - arxiv.org
We address the challenge of acquiring real-world manipulation skills with a scalable
framework. Inspired by the success of large-scale auto-regressive prediction in Large …

Zero-Shot Open-Vocabulary Tracking with Large Pre-Trained Models

WH Chu, AW Harley, P Tokmakov… - … on Robotics and …, 2024 - ieeexplore.ieee.org
Object tracking is central to robot perception and scene understanding, allowing robots to
parse a video stream in terms of moving objects with names. Tracking-by-detection has long …

Generative Image as Action Models

M Shridhar, YL Lo, S James - arXiv preprint arXiv:2407.07875, 2024 - arxiv.org
Image-generation diffusion models have been fine-tuned to unlock new capabilities such as
image-editing and novel view synthesis. Can we similarly unlock image-generation models …

Can Visual Foundation Models Achieve Long-term Point Tracking?

G Aydemir, W Xie, F Güney - arXiv preprint arXiv:2408.13575, 2024 - arxiv.org
Large-scale vision foundation models have demonstrated remarkable success across
various tasks, underscoring their robust generalization capabilities. While their proficiency in …