Bigger, Better, Faster: Human-level Atari with human-level efficiency

M Schwarzer, JSO Ceron, A Courville… - International …, 2023 - proceedings.mlr.press
We introduce a value-based RL agent, which we call BBF, that achieves super-human
performance in the Atari 100K benchmark. BBF relies on scaling the neural networks used …

A comprehensive survey of data augmentation in visual reinforcement learning

G Ma, Z Wang, Z Yuan, X Wang, B Yuan… - arXiv preprint arXiv …, 2022 - arxiv.org
Visual reinforcement learning (RL), which makes decisions directly from high-dimensional
visual inputs, has demonstrated significant potential in various domains. However …

Perceptual grouping in contrastive vision-language models

K Ranasinghe, B McKinzie, S Ravi… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent advances in zero-shot image recognition suggest that vision-language models learn
generic visual representations with a high degree of semantic information that may be …

RL-ViGen: A reinforcement learning benchmark for visual generalization

Z Yuan, S Yang, P Hua, C Chang… - Advances in Neural …, 2024 - proceedings.neurips.cc
Visual Reinforcement Learning (Visual RL), coupled with high-dimensional
observations, has consistently confronted the long-standing challenge of out-of-distribution …

VRL3: A data-driven framework for visual deep reinforcement learning

C Wang, X Luo, K Ross, D Li - Advances in Neural …, 2022 - proceedings.neurips.cc
We propose VRL3, a powerful data-driven framework with a simple design for solving
challenging visual deep reinforcement learning (DRL) tasks. We analyze a number of major …

LLaRA: Supercharging robot learning data for vision-language policy

X Li, C Mata, J Park, K Kahatapitiya, YS Jang… - arXiv preprint arXiv …, 2024 - arxiv.org
LLMs with visual inputs, i.e., Vision Language Models (VLMs), have the capacity to process
state information as visual-textual prompts and respond with policy decisions in text. We …

Crossway diffusion: Improving diffusion-based visuomotor policy via self-supervised learning

X Li, V Belagali, J Shang… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
Diffusion models have been adopted for behavioral cloning in a sequence modeling
fashion, benefiting from their exceptional capabilities in modeling complex data distributions …

Theia: Distilling diverse vision foundation models for robot learning

J Shang, K Schmeckpeper, BB May, MV Minniti… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-based robot policy learning, which maps visual inputs to actions, necessitates a
holistic understanding of diverse visual tasks beyond single-task needs like classification or …

GAIT: Generating aesthetic indoor tours with deep reinforcement learning

D Xie, P Hu, X Sun, S Pirk, J Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Placing and orienting a camera to compose aesthetically meaningful shots of a scene is not
only a key objective in real-world photography and cinematography but also for virtual …

StARformer: Transformer with state-action-reward representations for visual reinforcement learning

J Shang, K Kahatapitiya, X Li, MS Ryoo - European conference on …, 2022 - Springer
Reinforcement Learning (RL) can be considered as a sequence modeling task: given a
sequence of past state-action-reward experiences, an agent predicts a sequence of next …