Online bootstrap inference for policy evaluation in reinforcement learning

P Ramprasad, Y Li, Z Yang, Z Wang… - Journal of the …, 2023 - Taylor & Francis
The recent emergence of reinforcement learning (RL) has created a demand for robust
statistical inference methods for the parameter estimates computed using these algorithms …
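
The setting of this paper, confidence intervals for RL parameter estimates computed online, can be prototyped with a multiplier bootstrap: keep B perturbed copies of the estimator, each scaling its updates by i.i.d. mean-one random weights, so nothing needs to be stored or replayed. The sketch below is a minimal illustration on TD(0) for a toy two-state chain; the chain, the step-size exponent, and the exponential weights are illustrative assumptions, not the paper's exact estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-state Markov reward process (hypothetical, for illustration only).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # transition probabilities
r = np.array([1.0, -1.0])    # per-state rewards
gamma = 0.9
B = 200                      # number of bootstrap replicates

V = np.zeros(2)              # point-estimate value function
V_boot = np.zeros((B, 2))    # perturbed replicates, maintained online

s = 0
for t in range(1, 50_001):
    s_next = rng.choice(2, p=P[s])
    alpha = 1.0 / t ** 0.6   # polynomially decaying step size
    V[s] += alpha * (r[s] + gamma * V[s_next] - V[s])

    # Each replicate sees the same transition but scales its TD update by an
    # i.i.d. positive multiplier weight with mean 1 and variance 1.
    w = rng.exponential(1.0, size=B)
    V_boot[:, s] += alpha * w * (r[s] + gamma * V_boot[:, s_next] - V_boot[:, s])
    s = s_next

lo, hi = np.quantile(V_boot[:, 0], [0.025, 0.975])
print(f"V(0) = {V[0]:.3f}, 95% bootstrap interval = ({lo:.3f}, {hi:.3f})")
```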

BanditPAM: Almost linear time k-medoids clustering via multi-armed bandits

M Tiwari, MJ Zhang, J Mayclin… - Advances in …, 2020 - proceedings.neurips.cc
Clustering is a ubiquitous task in data science. Compared to the commonly used k-means
clustering, k-medoids clustering requires the cluster centers to be actual data points and …
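
The bandit reduction behind BanditPAM is easy to illustrate on a simplified sub-problem: selecting a single medoid. Each candidate point is an "arm" whose unknown mean is its average distance to the data; random reference points give cheap estimates, and confidence bounds eliminate candidates without computing all O(n^2) distances. The sketch below shows that idea only, with a crude shared Hoeffding-style radius; BanditPAM itself uses sharper estimators inside the full BUILD and SWAP steps of PAM, and the batch size and scale proxy here are illustrative choices.

```python
import numpy as np

def bandit_medoid(X, batch=32, delta=1e-2, rng=None):
    """Find the point minimizing total distance to all others, estimating each
    candidate's mean distance from shared random batches and eliminating
    candidates whose lower confidence bound exceeds the best upper bound."""
    rng = rng or np.random.default_rng(0)
    n = len(X)
    alive = np.arange(n)
    mu = np.zeros(n)              # running mean distance per candidate
    cnt = 0                       # reference points sampled so far (shared)
    while len(alive) > 1 and cnt < n:
        ref = rng.choice(n, size=batch)
        d = np.linalg.norm(X[alive][:, None, :] - X[ref][None, :, :], axis=2)
        mu[alive] = (mu[alive] * cnt + d.sum(axis=1)) / (cnt + batch)
        cnt += batch
        sigma = d.std() + 1e-12   # crude shared scale proxy
        ci = sigma * np.sqrt(2 * np.log(1 / delta) / cnt)
        best_ub = np.min(mu[alive]) + ci
        alive = alive[mu[alive] - ci <= best_ub]
    # Finish exactly among the few survivors.
    exact = [np.linalg.norm(X - X[i], axis=1).sum() for i in alive]
    return alive[int(np.argmin(exact))]

X = np.random.default_rng(1).normal(size=(500, 2))
print("medoid index:", bandit_medoid(X))
```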

Did we personalize? Assessing personalization by an online reinforcement learning algorithm using resampling

S Ghosh, R Kim, P Chhabria, R Dwivedi, P Klasnja… - Machine Learning, 2024 - Springer
There is a growing interest in using reinforcement learning (RL) to personalize sequences of
treatments in digital health to support users in adopting healthier behaviors. Such sequential …
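
The generic shape of a resampling-based check is easy to sketch: fit per-user quantities, then compare their between-user variation against a reference distribution obtained by resampling from a pooled, "no personalization" world. Everything below (the data, the pooled reference, and the stub standing in for the RL fit) is a hypothetical illustration, not the paper's procedure, which reruns their actual RL algorithm on resampled trial data.

```python
import numpy as np

rng = np.random.default_rng(0)

def learned_advantage(treat, ctrl):
    """Stub for 'fit the RL algorithm to one user's data' (hypothetical):
    here just the user's estimated advantage of treating."""
    return treat.mean() - ctrl.mean()

# Hypothetical data: 40 users x 50 decision points per condition, with
# genuinely user-specific treatment effects.
effects = rng.normal(0.3, 0.2, size=(40, 1))
treat = effects + rng.normal(size=(40, 50))
ctrl = rng.normal(size=(40, 50))

obs = np.array([learned_advantage(t, c) for t, c in zip(treat, ctrl)])
obs_spread = obs.std()          # between-user variation in learned policies

# Resampling reference: pool all users (a no-personalization world), redraw
# per-user datasets with replacement, and recompute the spread many times.
pool_t, pool_c = treat.ravel(), ctrl.ravel()
ref = []
for _ in range(500):
    t = rng.choice(pool_t, size=treat.shape)
    c = rng.choice(pool_c, size=ctrl.shape)
    ref.append(np.array([learned_advantage(ti, ci)
                         for ti, ci in zip(t, c)]).std())

print(f"observed spread: {obs_spread:.3f}; "
      f"resampled 95th percentile: {np.quantile(ref, 0.95):.3f}")
```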

Sub-sampling for efficient non-parametric bandit exploration

D Baudry, E Kaufmann… - Advances in Neural …, 2020 - proceedings.neurips.cc
In this paper we propose the first multi-armed bandit algorithm based on re-sampling that
achieves asymptotically optimal regret simultaneously for different families of arms (namely …
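
One way to read the sub-sampling idea: rather than building confidence bounds, let each challenger arm "duel" the current leader by comparing the challenger's empirical mean with the mean of an equally sized random sub-sample of the leader's history. A minimal sketch of that duel on Bernoulli arms follows; the paper's algorithm differs in the exact leader definition and sub-sample choice, so treat this as the core mechanism only.

```python
import numpy as np

rng = np.random.default_rng(0)
means = [0.40, 0.50, 0.45]              # hypothetical Bernoulli arms
hist = [[] for _ in means]              # observed rewards per arm

def pull(a):
    hist[a].append(float(rng.random() < means[a]))

for a in range(len(means)):             # one initial pull per arm
    pull(a)

for t in range(5000):
    counts = [len(h) for h in hist]
    leader = int(np.argmax(counts))     # most-pulled arm is the leader
    winners = []
    for a in range(len(means)):
        if a == leader:
            continue
        # Duel: challenger's full mean vs. the mean of a random sub-sample
        # of the leader's history with the same number of observations.
        sub = rng.choice(hist[leader], size=counts[a], replace=False)
        if np.mean(hist[a]) >= sub.mean():
            winners.append(a)
    for a in winners or [leader]:       # pull the winners, else the leader
        pull(a)

print("pull counts:", [len(h) for h in hist])
```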

Residual bootstrap exploration for stochastic linear bandit

S Wu, CH Wang, Y Li, G Cheng - Uncertainty in Artificial …, 2022 - proceedings.mlr.press
We propose a new bootstrap-based online algorithm for stochastic linear bandit problems.
The key idea is to adopt residual bootstrap exploration, in which the agent estimates the next …
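
The residual bootstrap step can be sketched directly: fit ridge regression, perturb the fitted residuals (sign flips below), rebuild the responses, refit, and act greedily with respect to the randomized estimate. This is a minimal illustration under assumed arm features, noise scale, and warm start; the paper's algorithm includes refinements beyond this core loop to guarantee sufficient exploration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, lam = 3, 1.0
theta_true = np.array([0.5, -0.3, 0.2])       # hypothetical true parameter
arms = rng.normal(size=(20, d))               # fixed arm feature vectors

X, y = [], []
for t in range(2000):
    if len(y) <= d:                           # warm start: forced pulls
        a = t % len(arms)
    else:
        Xm, ym = np.array(X), np.array(y)
        A = Xm.T @ Xm + lam * np.eye(d)
        theta_hat = np.linalg.solve(A, Xm.T @ ym)
        resid = ym - Xm @ theta_hat
        # Residual bootstrap: sign-flip the residuals, rebuild responses,
        # and refit to obtain a randomized parameter estimate.
        signs = rng.choice([-1.0, 1.0], size=len(resid))
        theta_tilde = np.linalg.solve(A, Xm.T @ (Xm @ theta_hat + signs * resid))
        a = int(np.argmax(arms @ theta_tilde))  # greedy on the perturbed fit
    x = arms[a]
    X.append(x)
    y.append(x @ theta_true + rng.normal(scale=0.1))

Xm, ym = np.array(X), np.array(y)
theta_hat = np.linalg.solve(Xm.T @ Xm + lam * np.eye(d), Xm.T @ ym)
print("final estimate:", np.round(theta_hat, 3), " true:", theta_true)
```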

Dynamic Online Recommendation for Two-Sided Market with Bayesian Incentive Compatibility

Y Li, G Cheng, X Dai - arXiv preprint arXiv:2406.04374, 2024 - arxiv.org
Recommender systems play a crucial role in internet economies by connecting users with
relevant products or services. However, designing effective recommender systems faces two …

Multiplier bootstrap-based exploration

R Wan, H Wei, B Kveton… - … Conference on Machine …, 2023 - proceedings.mlr.press
Despite the great interest in the bandit problem, designing efficient algorithms for complex
models remains challenging, as there is typically no analytical way to quantify uncertainty. In …
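
The core randomized index is simple to write down: reweight each arm's observed rewards with i.i.d. mean-one multiplier weights and act greedily on the weighted means. The sketch below shows only that index on Bernoulli arms; a naive bootstrap index by itself can under-explore, which is part of what the paper's full algorithm addresses, so treat this as the starting point rather than the method.

```python
import numpy as np

rng = np.random.default_rng(0)
means = [0.30, 0.50, 0.45]              # hypothetical Bernoulli arms
rewards = [[] for _ in means]

def pull(a):
    rewards[a].append(float(rng.random() < means[a]))

for a in range(len(means)):             # one initial pull per arm
    pull(a)

for t in range(5000):
    index = []
    for h in rewards:
        h = np.asarray(h)
        # Multiplier bootstrap: i.i.d. mean-one weights on each observation;
        # the weighted mean is a randomized surrogate for a UCB-style index.
        w = rng.exponential(1.0, size=len(h))
        index.append(np.dot(w, h) / w.sum())
    pull(int(np.argmax(index)))

print("pull counts:", [len(h) for h in rewards])
```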

Maximum Average Randomly Sampled: A Scale Free and Non-parametric Algorithm for Stochastic Bandits

M Moravej Khorasani, E Weyer - Advances in Neural …, 2024 - proceedings.neurips.cc
Upper Confidence Bound (UCB) methods are among the most effective approaches for
dealing with the exploration-exploitation trade-off in online decision-making problems. The …
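
The title suggests the index: an average over randomly sampled past rewards, maximized, which injects data-driven (and hence scale-free) optimism without any confidence-width parameter. The sketch below is one plausible reading of that idea, a max over a few averages of uniformly random-sized subsets; the paper's exact sampling rule may differ, so this is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
means, scales = [0.0, 0.2, 0.1], [1.0, 2.0, 0.5]   # hypothetical Gaussian arms
hist = [[] for _ in means]

def pull(a):
    hist[a].append(rng.normal(means[a], scales[a]))

for a in range(len(means)):             # one initial pull per arm
    pull(a)

for t in range(3000):
    index = []
    for h in hist:
        h = np.asarray(h)
        # Optimism from the data itself: average a uniformly random-sized
        # subset of past rewards, repeat a few times, keep the maximum.
        best = -np.inf
        for _ in range(3):
            m = int(rng.integers(1, len(h) + 1))
            best = max(best, float(rng.choice(h, size=m, replace=False).mean()))
        index.append(best)
    pull(int(np.argmax(index)))

print("pull counts:", [len(h) for h in hist])
```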

From optimality to robustness: Adaptive re-sampling strategies in stochastic bandits

D Baudry, P Saux, OA Maillard - Advances in Neural …, 2021 - proceedings.neurips.cc
The stochastic multi-armed bandit problem has been extensively studied under standard
assumptions on the arms' distributions (e.g., bounded with known support, exponential family …

Tight non-asymptotic inference via sub-Gaussian intrinsic moment norm

H Zhang, H Wei, G Cheng - arXiv preprint arXiv:2303.07287, 2023 - arxiv.org
In non-asymptotic learning, variance-type parameters of sub-Gaussian distributions are of
paramount importance. However, directly estimating these parameters using the empirical …
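
For context on what "directly estimating these parameters" means, here is the naive plug-in under one standard definition of the intrinsic moment norm, the largest even moment normalized by the matching Gaussian moment (an assumption to check against the paper). The paper's point is that such direct empirical estimates are delicate, so this sketch illustrates the definition rather than the proposed estimator.

```python
import numpy as np

def double_factorial(n):
    """n!! for odd n; (2k-1)!! equals the 2k-th moment of a standard normal."""
    out = 1
    while n > 1:
        out *= n
        n -= 2
    return out

def intrinsic_moment_norm(x, k_max=6):
    """Naive plug-in for max_k ( E[X^{2k}] / (2k-1)!! )^{1/(2k)}, assuming that
    definition of the intrinsic moment norm; for X ~ N(0, s^2) every term in
    the max equals s, since E[X^{2k}] = s^{2k} (2k-1)!!."""
    x = np.asarray(x, dtype=float)
    return max((np.mean(x ** (2 * k)) / double_factorial(2 * k - 1)) ** (1 / (2 * k))
               for k in range(1, k_max + 1))

rng = np.random.default_rng(0)
sample = rng.normal(0.0, 2.0, size=100_000)
print(f"plug-in estimate: {intrinsic_moment_norm(sample):.3f} (true scale 2.0)")
```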