Learning visual representations via language-guided sampling
M El Banani, K Desai… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Although an object may appear in numerous contexts, we often describe it in a limited
number of ways. Language allows us to abstract away visual variation to represent and …
Perceptual grouping in contrastive vision-language models
Recent advances in zero-shot image recognition suggest that vision-language models learn
generic visual representations with a high degree of semantic information that may be …
Theia: Distilling diverse vision foundation models for robot learning
Vision-based robot policy learning, which maps visual inputs to actions, necessitates a
holistic understanding of diverse visual tasks beyond single-task needs like classification or …
Starformer: Transformer with state-action-reward representations for visual reinforcement learning
Reinforcement Learning (RL) can be considered as a sequence modeling task: given a
sequence of past state-action-reward experiences, an agent predicts a sequence of next …
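The sketch below illustrates this sequence-modeling view of RL in a minimal form; it is not the paper's StARformer architecture. Past (state, action, reward) triples are embedded into one token stream and a causal Transformer predicts the next action. The class name SequenceRLPolicy and all dimensions (state_dim, act_dim, d_model) are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

class SequenceRLPolicy(nn.Module):
    """Minimal RL-as-sequence-modeling sketch (Decision-Transformer-style)."""

    def __init__(self, state_dim=16, act_dim=4, d_model=64, n_layers=2, max_len=32):
        super().__init__()
        self.state_emb = nn.Linear(state_dim, d_model)
        self.act_emb = nn.Embedding(act_dim, d_model)
        self.rew_emb = nn.Linear(1, d_model)
        self.pos_emb = nn.Embedding(3 * max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, act_dim)  # logits for the next action

    def forward(self, states, actions, rewards):
        # states: (B, T, state_dim), actions: (B, T) int64, rewards: (B, T)
        B, T, _ = states.shape
        s = self.state_emb(states)
        a = self.act_emb(actions)
        r = self.rew_emb(rewards.unsqueeze(-1))
        # Interleave tokens as (s_1, a_1, r_1, s_2, a_2, r_2, ...).
        tokens = torch.stack([s, a, r], dim=2).reshape(B, 3 * T, -1)
        tokens = tokens + self.pos_emb(torch.arange(3 * T, device=tokens.device))
        # Causal mask so each position only attends to past experience.
        mask = nn.Transformer.generate_square_subsequent_mask(3 * T).to(tokens.device)
        h = self.encoder(tokens, mask=mask)
        # Predict a_t from the representation at each state token.
        return self.head(h[:, 0::3, :])  # (B, T, act_dim)

# Toy usage on a random batch of length-8 trajectories.
policy = SequenceRLPolicy()
logits = policy(torch.randn(2, 8, 16), torch.randint(0, 4, (2, 8)), torch.randn(2, 8))
print(logits.shape)  # torch.Size([2, 8, 4])
```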
Limited data, unlimited potential: A study on vits augmented by masked autoencoders
Vision Transformers (ViTs) have become ubiquitous in computer vision. Despite
their success, ViTs lack inductive biases, which can make it difficult to train them with limited …
Cross-view action recognition understanding from exocentric to egocentric perspective
Understanding action recognition in egocentric videos has emerged as a vital research topic
with numerous practical applications. With the limitation in the scale of egocentric data …
POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-View World
We humans are good at translating third-person observations of hand-object interactions
(HOI) into an egocentric view. However, current methods struggle to replicate this ability of …
Neural neural textures make sim2real consistent
Unpaired image translation algorithms can be used for sim2real tasks, but many fail to
generate temporally consistent results. We present a new approach that combines …
Seeing the pose in the pixels: learning pose-aware representations in vision transformers
Human perception of surroundings is often guided by the various poses present within the
environment. Many computer vision tasks, such as human action recognition and robot …
Active vision reinforcement learning under limited visual observability
In this work, we investigate Active Vision Reinforcement Learning (ActiveVision-RL), where
an embodied agent simultaneously learns an action policy for the task while also controlling its …