A minimalist approach to offline reinforcement learning

S Fujimoto, SS Gu - Advances in neural information …, 2021 - proceedings.neurips.cc
Offline reinforcement learning (RL) defines the task of learning from a fixed batch of data.
Due to errors in value estimation from out-of-distribution actions, most offline RL algorithms …

Cal-QL: Calibrated offline RL pre-training for efficient online fine-tuning

M Nakamoto, S Zhai, A Singh… - Advances in …, 2024 - proceedings.neurips.cc
A compelling use case of offline reinforcement learning (RL) is to obtain a policy initialization
from existing datasets followed by fast online fine-tuning with limited interaction. However …

Transfer learning in deep reinforcement learning: A survey

Z Zhu, K Lin, AK Jain, J Zhou - IEEE Transactions on Pattern …, 2023 - ieeexplore.ieee.org
Reinforcement learning is a learning paradigm for solving sequential decision-making
problems. Recent years have witnessed remarkable progress in reinforcement learning …

Deep reinforcement learning for autonomous driving: A survey

BR Kiran, I Sobh, V Talpaert, P Mannion… - IEEE Transactions …, 2021 - ieeexplore.ieee.org
With the development of deep representation learning, the domain of reinforcement learning
(RL) has become a powerful learning framework now capable of learning complex policies …

Mildly conservative Q-learning for offline reinforcement learning

J Lyu, X Ma, X Li, Z Lu - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Offline reinforcement learning (RL) defines the task of learning from a static logged dataset
without continually interacting with the environment. The distribution shift between the …

DexMV: Imitation learning for dexterous manipulation from human videos

Y Qin, YH Wu, S Liu, H Jiang, R Yang, Y Fu… - European Conference on …, 2022 - Springer
While significant progress has been made on understanding hand-object interactions in
computer vision, it is still very challenging for robots to perform complex dexterous …

Discriminator-actor-critic: Addressing sample inefficiency and reward bias in adversarial imitation learning

I Kostrikov, KK Agrawal, D Dwibedi, S Levine… - arXiv preprint arXiv …, 2018 - arxiv.org
We identify two issues with the family of algorithms based on the Adversarial Imitation
Learning framework. The first problem is implicit bias present in the reward functions used in …

ManiSkill: Generalizable manipulation skill benchmark with large-scale demonstrations

T Mu, Z Ling, F Xiang, D Yang, X Li, S Tao… - arXiv preprint arXiv …, 2021 - arxiv.org
Object manipulation from 3D visual inputs poses many challenges on building generalizable
perception and policy models. However, 3D assets in existing benchmarks mostly lack the …

Model-free reinforcement learning from expert demonstrations: a survey

J Ramírez, W Yu, A Perrusquía - Artificial Intelligence Review, 2022 - Springer
Reinforcement learning from expert demonstrations (RLED) is the intersection of imitation
learning with reinforcement learning that seeks to take advantage of these two learning …

Deep-reinforcement-learning-based autonomous UAV navigation with sparse rewards

C Wang, J Wang, J Wang… - IEEE Internet of Things …, 2020 - ieeexplore.ieee.org
Unmanned aerial vehicles (UAVs) have the potential in delivering Internet-of-Things (IoT)
services from a great height, creating an airborne domain of the IoT. In this article, we …