DQN-TAMER: Human-in-the-loop reinforcement learning with intractable feedback
Exploration has been one of the greatest challenges in reinforcement learning (RL), which is
a large obstacle in the application of RL to robotics. Even with state-of-the-art RL algorithms,
building a well-learned agent often requires too many trials, mainly due to the difficulty of
matching its actions with rewards in the distant future. A remedy for this is to train an agent
with real-time feedback from a human observer who immediately gives rewards for some
actions. This study tackles a series of challenges for introducing such a human-in-the-loop …