Provable and practical: Efficient exploration in reinforcement learning via Langevin Monte Carlo

H Ishfaq, Q Lan, P Xu, AR Mahmood, D Precup… - arXiv preprint arXiv …, 2023 - arxiv.org
We present a scalable and effective exploration strategy based on Thompson sampling for
reinforcement learning (RL). One of the key shortcomings of existing Thompson sampling …

Robust exploration with adversary via Langevin Monte Carlo

HL Hsu, M Pajic - 6th Annual Learning for Dynamics & …, 2024 - proceedings.mlr.press
In the realm of Deep Q-Networks (DQNs), numerous exploration strategies have
demonstrated efficacy within controlled environments. However, these methods encounter …

Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning

HL Hsu, W Wang, M Pajic, P Xu - arXiv preprint arXiv:2404.10728, 2024 - arxiv.org
We present the first study on provably efficient randomized exploration in cooperative
multi-agent reinforcement learning (MARL). We propose a unified algorithm framework for …

On the Data Complexity of Problem-Adaptive Offline Reinforcement Learning

M Yin - 2023 - escholarship.org
Offline reinforcement learning, a field dedicated to optimizing sequential decision-making
strategies using historical data, has found widespread application in real-world scenarios …