Chain-of-thought predictive control
We study generalizable policy learning from demonstrations for complex low-level control
tasks (e.g., contact-rich object manipulations). We propose an imitation learning method that
incorporates the idea of temporal abstraction and the planning capabilities from Hierarchical
RL (HRL) in a novel and effective manner. As a step towards decision foundation models,
our design can utilize scalable, albeit highly sub-optimal, demonstrations. Specifically, we
find that certain short subsequences of the demos, i.e., the chain-of-thought (CoT), reflect their …