Posterior sampling with delayed feedback for reinforcement learning with linear function...

4 results

Provable and practical: Efficient exploration in reinforcement learning via langevin monte carlo

H Ishfaq, Q Lan, P Xu, AR Mahmood, D Precup… - arXiv preprint arXiv …, 2023 - arxiv.org

We present a scalable and effective exploration strategy based on Thompson sampling for
reinforcement learning (RL). One of the key shortcomings of existing Thompson sampling …

Cited by: 12 (6 versions)

Robust exploration with adversary via Langevin Monte Carlo

HL Hsu, M Pajic - 6th Annual Learning for Dynamics & …, 2024 - proceedings.mlr.press

In the realm of Deep Q-Networks (DQNs), numerous exploration strategies have
demonstrated efficacy within controlled environments. However, these methods encounter …

Cited by: 1

Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning

HL Hsu, W Wang, M Pajic, P Xu - arXiv preprint arXiv:2404.10728, 2024 - arxiv.org

We present the first study on provably efficient randomized exploration in cooperative
multi-agent reinforcement learning (MARL). We propose a unified algorithm framework for …

Cited by: 1 (2 versions)

On the Data Complexity of Problem-Adaptive Offline Reinforcement Learning

M Yin - 2023 - escholarship.org

Offline reinforcement learning, a field dedicated to optimizing sequential decision-making
strategies using historical data, has found widespread application in real-world scenarios …