Accelerating exploration with unlabeled prior data

Q Li, J Zhang, D Ghosh, A Zhang… - Advances in Neural …, 2024 - proceedings.neurips.cc
Learning to solve tasks from a sparse reward signal is a major challenge for standard
reinforcement learning (RL) algorithms. However, in the real world, agents rarely need to …

Sequential Decision Making with Expert Demonstrations under Unobserved Heterogeneity

V Balazadeh, K Chidambaram, V Nguyen… - arXiv preprint arXiv …, 2024 - arxiv.org
We study the problem of online sequential decision-making given auxiliary demonstrations
from experts who made their decisions based on unobserved contextual information. These …