Crossway diffusion: Improving diffusion-based visuomotor policy via self-supervised learning

X Li, C Mata, J Park, K Kahatapitiya, YS Jang… - arXiv preprint arXiv …, 2024 - arxiv.org

LLMs with visual inputs, i.e., Vision Language Models (VLMs), have the capacity to process
state information as visual-textual prompts and respond with policy decisions in text. We …

Cited by 12 · Related articles · All 3 versions

Limited data, unlimited potential: A study on vits augmented by masked autoencoders

S Das, T Jain, D Reilly, P Balaji… - Proceedings of the …, 2024 - openaccess.thecvf.com

Vision Transformers (ViTs) have become ubiquitous in computer vision. Despite
their success, ViTs lack inductive biases, which can make it difficult to train them with limited …

Cited by 9 · Related articles · All 6 versions

Diffusion illusions: Hiding images in plain sight

R Burgert, X Li, A Leite, K Ranasinghe… - ACM SIGGRAPH 2024 …, 2024 - dl.acm.org

We explore the problem of computationally generating special images that produce
multi-arrangement optical illusions when physically arranged and viewed in a certain way, which …

Cited by 9 · Related articles · All 4 versions

Generative image as action models

M Shridhar, YL Lo, S James - arXiv preprint arXiv:2407.07875, 2024 - arxiv.org

Image-generation diffusion models have been fine-tuned to unlock new capabilities such as
image-editing and novel view synthesis. Can we similarly unlock image-generation models …

Cited by 2 · Related articles · All 5 versions

Task-agnostic Pre-training and Task-guided Fine-tuning for Versatile Diffusion Planner

C Fan, C Bai, Z Shan, H He, Y Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

Diffusion models have demonstrated their capabilities in modeling trajectories of multi-tasks.
However, existing multi-task planners or policies typically rely on task-specific …

Diffusion Policy Attacker: Crafting Adversarial Attacks for Diffusion-based Policies

Y Chen, H Xue, Y Chen - arXiv preprint arXiv:2405.19424, 2024 - arxiv.org

Diffusion models (DMs) have emerged as a promising approach for behavior cloning (BC).
Diffusion policies (DP) based on DMs have elevated BC performance to new heights …

Cited by 2 · Related articles · All 2 versions

Multimodal Diffusion Transformer: Learning Versatile Behavior from Multimodal Goals

M Reuss, ÖE Yağmurlu, F Wenzel… - First Workshop on Vision …, 2024 - openreview.net

This work introduces the Multimodal Diffusion Transformer (MDT), a novel diffusion policy
framework, that excels at learning versatile behavior from multimodal goal specifications …

Cited by 15 · Related articles

Language-Guided Manipulation with Diffusion Policies and Constrained Inpainting

C Hao, K Lin, S Luo, H Soh - arXiv preprint arXiv:2406.09767, 2024 - arxiv.org

Diffusion policies have demonstrated robust performance in generative modeling, prompting
their application in robotic manipulation controlled via language descriptions. In this paper …

Diffusion Meets DAgger: Supercharging Eye-in-hand Imitation Learning

X Zhang, M Chang, P Kumar, S Gupta - arXiv preprint arXiv:2402.17768, 2024 - arxiv.org

A common failure mode for policies trained with imitation is compounding execution errors at
test time. When the learned policy encounters states that were not present in the expert …

Cited by 6 · Related articles · All 2 versions

Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads

AK Rahimian, MK Govind, S Maity, D Reilly… - arXiv preprint arXiv …, 2024 - arxiv.org

Visual perception tasks are predominantly solved by Vision Transformer (ViT) architectures,
which, despite their effectiveness, encounter a computational bottleneck due to the quadratic …