Thompson sampling with less exploration is fast and optimal

H Zheng, W Deng, C Moya… - … Conference on Artificial …, 2024 - proceedings.mlr.press

Approximate Thompson sampling with Langevin Monte Carlo broadens its reach
from Gaussian posterior sampling to encompass more general smooth posteriors. However …

Cited by 5 · Related articles · All 4 versions

Finite-time frequentist regret bounds of multi-agent Thompson sampling on sparse hypergraphs

T Jin, HL Hsu, W Chang, P Xu - … of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org

We study the multi-agent multi-armed bandit (MAMAB) problem, where agents are factored
into overlapping groups. Each group represents a hyperedge, forming a hypergraph over …

Cited by 3 · Related articles · All 4 versions

Epsilon-Greedy Thompson Sampling to Bayesian Optimization

B Do, T Adebiyi, R Zhang - … of Computing and …, 2024 - asmedigitalcollection.asme.org

Bayesian optimization (BO) has become a powerful tool for solving simulation-based
engineering optimization problems thanks to its ability to integrate physical and …

Cited by 4 · Related articles · All 2 versions

Efficient and robust sequential decision making algorithms

P Xu - AI Magazine, 2024 - Wiley Online Library

Sequential decision‐making involves making informed decisions based on continuous
interactions with a complex environment. This process is ubiquitous in various applications …

ε-Neural Thompson Sampling of Deep Brain Stimulation for Parkinson Disease Treatment

HL Hsu, Q Gao, M Pajic - arXiv preprint arXiv:2403.06814, 2024 - arxiv.org

Deep Brain Stimulation (DBS) stands as an effective intervention for alleviating the motor
symptoms of Parkinson's disease (PD). Traditional commercial DBS devices are only able to …

Cited by 1 · Related articles · All 5 versions

Only pay for what is uncertain: Variance-adaptive Thompson sampling

A Saha, B Kveton - arXiv preprint arXiv:2303.09033, 2023 - arxiv.org

Most bandit algorithms assume that the reward variances or their upper bounds are known,
and that they are the same for all arms. This naturally leads to suboptimal performance and …

Cited by 4 · Related articles · All 3 versions

Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning

HL Hsu, W Wang, M Pajic, P Xu - arXiv preprint arXiv:2404.10728, 2024 - arxiv.org

We present the first study on provably efficient randomized exploration in cooperative
multi-agent reinforcement learning (MARL). We propose a unified algorithm framework for …

Cited by 5 · Related articles · All 2 versions