Policy optimization with demonstrations

A minimalist approach to offline reinforcement learning

S Fujimoto, SS Gu - Advances in neural information …, 2021 - proceedings.neurips.cc

Offline reinforcement learning (RL) defines the task of learning from a fixed batch of data.
Due to errors in value estimation from out-of-distribution actions, most offline RL algorithms …

Cited by 723 - Related articles - All 6 versions

Cal-QL: Calibrated offline RL pre-training for efficient online fine-tuning

M Nakamoto, S Zhai, A Singh… - Advances in …, 2024 - proceedings.neurips.cc

A compelling use case of offline reinforcement learning (RL) is to obtain a policy initialization
from existing datasets followed by fast online fine-tuning with limited interaction. However …

Cited by 70 - Related articles - All 7 versions

Transfer learning in deep reinforcement learning: A survey

Z Zhu, K Lin, AK Jain, J Zhou - IEEE Transactions on Pattern …, 2023 - ieeexplore.ieee.org

Reinforcement learning is a learning paradigm for solving sequential decision-making
problems. Recent years have witnessed remarkable progress in reinforcement learning …

Cited by 590 - Related articles - All 12 versions

Deep reinforcement learning for autonomous driving: A survey

BR Kiran, I Sobh, V Talpaert, P Mannion… - IEEE Transactions …, 2021 - ieeexplore.ieee.org

With the development of deep representation learning, the domain of reinforcement learning
(RL) has become a powerful learning framework now capable of learning complex policies …

Cited by 1925 - Related articles - All 10 versions

Mildly conservative Q-learning for offline reinforcement learning

J Lyu, X Ma, X Li, Z Lu - Advances in Neural Information …, 2022 - proceedings.neurips.cc

Offline reinforcement learning (RL) defines the task of learning from a static logged dataset
without continually interacting with the environment. The distribution shift between the …

Cited by 90 - Related articles - All 5 versions

DexMV: Imitation learning for dexterous manipulation from human videos

Y Qin, YH Wu, S Liu, H Jiang, R Yang, Y Fu… - European Conference on …, 2022 - Springer

While significant progress has been made on understanding hand-object interactions in
computer vision, it is still very challenging for robots to perform complex dexterous …

Cited by 137 - Related articles - All 7 versions

Discriminator-actor-critic: Addressing sample inefficiency and reward bias in adversarial imitation learning

I Kostrikov, KK Agrawal, D Dwibedi, S Levine… - arXiv preprint arXiv …, 2018 - arxiv.org

We identify two issues with the family of algorithms based on the Adversarial Imitation
Learning framework. The first problem is implicit bias present in the reward functions used in …

Cited by 310 - Related articles - All 7 versions

ManiSkill: Generalizable manipulation skill benchmark with large-scale demonstrations

T Mu, Z Ling, F Xiang, D Yang, X Li, S Tao… - arXiv preprint arXiv …, 2021 - arxiv.org

Object manipulation from 3D visual inputs poses many challenges on building generalizable
perception and policy models. However, 3D assets in existing benchmarks mostly lack the …

Cited by 104 - Related articles - All 6 versions

Model-free reinforcement learning from expert demonstrations: a survey

J Ramírez, W Yu, A Perrusquía - Artificial Intelligence Review, 2022 - Springer

Reinforcement learning from expert demonstrations (RLED) is the intersection of imitation
learning with reinforcement learning that seeks to take advantage of these two learning …

Cited by 83 - Related articles - All 5 versions

Deep-reinforcement-learning-based autonomous UAV navigation with sparse rewards

C Wang, J Wang, J Wang… - IEEE Internet of Things …, 2020 - ieeexplore.ieee.org

Unmanned aerial vehicles (UAVs) have the potential in delivering Internet-of-Things (IoT)
services from a great height, creating an airborne domain of the IoT. In this article, we …

Cited by 136 - Related articles