Bigger, better, faster: Human-level Atari with human-level efficiency
We introduce a value-based RL agent, which we call BBF, that achieves super-human
performance in the Atari 100K benchmark. BBF relies on scaling the neural networks used …
A comprehensive survey of data augmentation in visual reinforcement learning
Visual reinforcement learning (RL), which makes decisions directly from high-dimensional
visual inputs, has demonstrated significant potential in various domains. However …
Perceptual grouping in contrastive vision-language models
Recent advances in zero-shot image recognition suggest that vision-language models learn
generic visual representations with a high degree of semantic information that may be …
RL-ViGen: A reinforcement learning benchmark for visual generalization
Visual Reinforcement Learning (Visual RL), coupled with high-dimensional
observations, has consistently confronted the long-standing challenge of out-of-distribution …
VRL3: A data-driven framework for visual deep reinforcement learning
We propose VRL3, a powerful data-driven framework with a simple design for solving
challenging visual deep reinforcement learning (DRL) tasks. We analyze a number of major …
LLaRA: Supercharging robot learning data for vision-language policy
LLMs with visual inputs, i.e., Vision Language Models (VLMs), have the capacity to process
state information as visual-textual prompts and respond with policy decisions in text. We …
Crossway diffusion: Improving diffusion-based visuomotor policy via self-supervised learning
Diffusion models have been adopted for behavioral cloning in a sequence modeling
fashion, benefiting from their exceptional capabilities in modeling complex data distributions …
Theia: Distilling diverse vision foundation models for robot learning
Vision-based robot policy learning, which maps visual inputs to actions, necessitates a
holistic understanding of diverse visual tasks beyond single-task needs like classification or …
GAIT: Generating aesthetic indoor tours with deep reinforcement learning
Placing and orienting a camera to compose aesthetically meaningful shots of a scene is not
only a key objective in real-world photography and cinematography but also for virtual …
StARformer: Transformer with state-action-reward representations for visual reinforcement learning
Reinforcement Learning (RL) can be considered as a sequence modeling task: given a
sequence of past state-action-reward experiences, an agent predicts a sequence of next …