MD-Splatting: Learning metric deformation from 4D Gaussians in highly deformable scenes

BP Duisterhof, Z Mandi, Y Yao, JW Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Accurate 3D tracking in highly deformable scenes with occlusions and shadows can
facilitate new applications in robotics, augmented reality, and generative AI. However …

RAM: Retrieval-based affordance transfer for generalizable zero-shot robotic manipulation

Y Kuang, J Ye, H Geng, J Mao, C Deng… - arXiv preprint arXiv …, 2024 - arxiv.org
This work proposes a retrieve-and-transfer framework for zero-shot robotic manipulation,
dubbed RAM, featuring generalizability across various objects, environments, and …

DriveTrack: A Benchmark for Long-Range Point Tracking in Real-World Videos

A Balasingam, J Chandler, C Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
This paper presents DriveTrack, a new benchmark and data generation framework for long-
range keypoint tracking in real-world videos. DriveTrack is motivated by the observation that …

DINOBot: Robot manipulation via retrieval and alignment with vision foundation models

N Di Palo, E Johns - arXiv preprint arXiv:2402.13181, 2024 - arxiv.org
We propose DINOBot, a novel imitation learning framework for robot manipulation, which
leverages the image-level and pixel-level capabilities of features extracted from Vision …

Flow as the cross-domain manipulation interface

M Xu, Z Xu, Y Xu, C Chi, G Wetzstein, M Veloso… - arXiv preprint arXiv …, 2024 - arxiv.org
We present Im2Flow2Act, a scalable learning framework that enables robots to acquire
manipulation skills from diverse data sources. The key idea behind Im2Flow2Act is to use …

BootsTAP: Bootstrapped Training for Tracking-Any-Point

C Doersch, Y Yang, D Gokay, P Luc, S Koppula… - arXiv preprint arXiv …, 2024 - arxiv.org
To endow models with greater understanding of physics and motion, it is useful to enable
them to perceive how solid surfaces move and deform in real scenes. This can be formalized …

General flow as foundation affordance for scalable robot learning

C Yuan, C Wen, T Zhang, Y Gao - arXiv preprint arXiv:2401.11439, 2024 - arxiv.org
We address the challenge of acquiring real-world manipulation skills with a scalable
framework. Inspired by the success of large-scale auto-regressive prediction in Large …

Zero-Shot Open-Vocabulary Tracking with Large Pre-Trained Models

WH Chu, AW Harley, P Tokmakov… - … on Robotics and …, 2024 - ieeexplore.ieee.org
Object tracking is central to robot perception and scene understanding, allowing robots to
parse a video stream in terms of moving objects with names. Tracking-by-detection has long …

Generative Image as Action Models

M Shridhar, YL Lo, S James - arXiv preprint arXiv:2407.07875, 2024 - arxiv.org
Image-generation diffusion models have been fine-tuned to unlock new capabilities such as
image-editing and novel view synthesis. Can we similarly unlock image-generation models …

Can Visual Foundation Models Achieve Long-term Point Tracking?

G Aydemir, W Xie, F Güney - arXiv preprint arXiv:2408.13575, 2024 - arxiv.org
Large-scale vision foundation models have demonstrated remarkable success across
various tasks, underscoring their robust generalization capabilities. While their proficiency in …