Making sense of vision and touch: Self-supervised learning of multimodal representations for contact-rich tasks

MA Lee, Y Zhu, K Srinivasan, P Shah… - … on robotics and …, 2019 - ieeexplore.ieee.org
2019 International Conference on Robotics and Automation (ICRA), 2019 - ieeexplore.ieee.org
Contact-rich manipulation tasks in unstructured environments often require both haptic and visual feedback. However, it is non-trivial to manually design a robot controller that combines modalities with very different characteristics. While deep reinforcement learning has shown success in learning control policies for high-dimensional inputs, these algorithms are generally intractable to deploy on real robots due to sample complexity. We use self-supervision to learn a compact and multimodal representation of our sensory inputs, which can then be used to improve the sample efficiency of our policy learning. We evaluate our method on a peg insertion task, generalizing over different geometry, configurations, and clearances, while being robust to external perturbations. We present results in simulation and on a real robot.
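The abstract does not say which self-supervised objectives are used, so the following is only an illustrative sketch of one way that labels can be produced from the sensor streams themselves, without manual annotation: pairing each visual frame with either its own force reading or one from a different time step, and recording whether the pair is time-aligned. The function name make_alignment_pairs, the feature layout, and the 50/50 sampling ratio are assumptions made for this example, not details taken from the paper.

```python
import random


def make_alignment_pairs(vision, force, rng):
    """Build self-labelled examples from one recorded trajectory.

    vision: list of T per-step visual feature vectors (hypothetical layout).
    force:  list of T per-step force/torque readings (hypothetical layout).
    Returns a list of (vision_t, force_s, is_aligned, s) tuples, where
    is_aligned is 1 when s == t and 0 otherwise.
    """
    T = len(vision)
    if len(force) != T or T < 2:
        raise ValueError("need two streams of equal length T >= 2")

    examples = []
    for t in range(T):
        aligned = rng.random() < 0.5  # assumed 50/50 mix of positives/negatives
        if aligned:
            s = t
        else:
            # A nonzero offset in [1, T-1] guarantees s != t after wrap-around.
            s = (t + rng.randrange(1, T)) % T
        examples.append((vision[t], force[s], int(aligned), s))
    return examples


if __name__ == "__main__":
    rng = random.Random(0)
    vision = [[float(t)] * 4 for t in range(8)]  # stand-in visual features
    force = [[float(t)] * 6 for t in range(8)]   # stand-in force/torque data
    for v, f, label, s in make_alignment_pairs(vision, force, rng):
        print(v[0], f[0], label)
```

A binary classifier trained on such pairs has to encode both modalities jointly to decide whether they belong together, which is one route to a compact multimodal representation that a downstream policy could reuse.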