Online bootstrap inference for policy evaluation in reinforcement learning

P Ramprasad, Y Li, Z Yang, Z Wang… - Journal of the …, 2023 - Taylor & Francis
The recent emergence of reinforcement learning (RL) has created a demand for robust
statistical inference methods for the parameter estimates computed using these algorithms …
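
The setting of this paper, confidence intervals for RL parameter estimates computed online, can be prototyped with a multiplier bootstrap: keep B perturbed copies of the estimator, each scaling its updates by i.i.d. mean-one random weights, so nothing needs to be stored or replayed. The sketch below is a minimal illustration on TD(0) for a toy two-state chain; the chain, the step-size exponent, and the exponential weights are illustrative assumptions, not the paper's exact estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-state Markov reward process (hypothetical, for illustration only).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # transition probabilities
r = np.array([1.0, -1.0])    # per-state rewards
gamma = 0.9
B = 200                      # number of bootstrap replicates

V = np.zeros(2)              # point-estimate value function
V_boot = np.zeros((B, 2))    # perturbed replicates, maintained online

s = 0
for t in range(1, 50_001):
    s_next = rng.choice(2, p=P[s])
    alpha = 1.0 / t ** 0.6   # polynomially decaying step size
    V[s] += alpha * (r[s] + gamma * V[s_next] - V[s])

    # Each replicate sees the same transition but scales its TD update by an
    # i.i.d. positive multiplier weight with mean 1 and variance 1.
    w = rng.exponential(1.0, size=B)
    V_boot[:, s] += alpha * w * (r[s] + gamma * V_boot[:, s_next] - V_boot[:, s])
    s = s_next

lo, hi = np.quantile(V_boot[:, 0], [0.025, 0.975])
print(f"V(0) = {V[0]:.3f}, 95% bootstrap interval = ({lo:.3f}, {hi:.3f})")
```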

BanditPAM: Almost linear time k-medoids clustering via multi-armed bandits

M Tiwari, MJ Zhang, J Mayclin… - Advances in …, 2020 - proceedings.neurips.cc
Clustering is a ubiquitous task in data science. Compared to the commonly used k-means
clustering, k-medoids clustering requires the cluster centers to be actual data points and …
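
The bandit reduction behind BanditPAM is easy to illustrate on a simplified sub-problem: selecting a single medoid. Each candidate point is an "arm" whose unknown mean is its average distance to the data; random reference points give cheap estimates, and confidence bounds eliminate candidates without computing all O(n^2) distances. The sketch below shows that idea only, with a crude shared Hoeffding-style radius; BanditPAM itself uses sharper estimators inside the full BUILD and SWAP steps of PAM, and the batch size and scale proxy here are illustrative choices.

```python
import numpy as np

def bandit_medoid(X, batch=32, delta=1e-2, rng=None):
    """Find the point minimizing total distance to all others, estimating each
    candidate's mean distance from shared random batches and eliminating
    candidates whose lower confidence bound exceeds the best upper bound."""
    rng = rng or np.random.default_rng(0)
    n = len(X)
    alive = np.arange(n)
    mu = np.zeros(n)              # running mean distance per candidate
    cnt = 0                       # reference points sampled so far (shared)
    while len(alive) > 1 and cnt < n:
        ref = rng.choice(n, size=batch)
        d = np.linalg.norm(X[alive][:, None, :] - X[ref][None, :, :], axis=2)
        mu[alive] = (mu[alive] * cnt + d.sum(axis=1)) / (cnt + batch)
        cnt += batch
        sigma = d.std() + 1e-12   # crude shared scale proxy
        ci = sigma * np.sqrt(2 * np.log(1 / delta) / cnt)
        best_ub = np.min(mu[alive]) + ci
        alive = alive[mu[alive] - ci <= best_ub]
    # Finish exactly among the few survivors.
    exact = [np.linalg.norm(X - X[i], axis=1).sum() for i in alive]
    return alive[int(np.argmin(exact))]

X = np.random.default_rng(1).normal(size=(500, 2))
print("medoid index:", bandit_medoid(X))
```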

Did we personalize? Assessing personalization by an online reinforcement learning algorithm using resampling

S Ghosh, R Kim, P Chhabria, R Dwivedi, P Klasnja… - Machine Learning, 2024 - Springer
There is a growing interest in using reinforcement learning (RL) to personalize sequences of
treatments in digital health to support users in adopting healthier behaviors. Such sequential …
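
The generic shape of a resampling-based check is easy to sketch: fit per-user quantities, then compare their between-user variation against a reference distribution obtained by resampling from a pooled, "no personalization" world. Everything below (the data, the pooled reference, and the stub standing in for the RL fit) is a hypothetical illustration, not the paper's procedure, which reruns their actual RL algorithm on resampled trial data.

```python
import numpy as np

rng = np.random.default_rng(0)

def learned_advantage(treat, ctrl):
    """Stub for 'fit the RL algorithm to one user's data' (hypothetical):
    here just the user's estimated advantage of treating."""
    return treat.mean() - ctrl.mean()

# Hypothetical data: 40 users x 50 decision points per condition, with
# genuinely user-specific treatment effects.
effects = rng.normal(0.3, 0.2, size=(40, 1))
treat = effects + rng.normal(size=(40, 50))
ctrl = rng.normal(size=(40, 50))

obs = np.array([learned_advantage(t, c) for t, c in zip(treat, ctrl)])
obs_spread = obs.std()          # between-user variation in learned policies

# Resampling reference: pool all users (a no-personalization world), redraw
# per-user datasets with replacement, and recompute the spread many times.
pool_t, pool_c = treat.ravel(), ctrl.ravel()
ref = []
for _ in range(500):
    t = rng.choice(pool_t, size=treat.shape)
    c = rng.choice(pool_c, size=ctrl.shape)
    ref.append(np.array([learned_advantage(ti, ci)
                         for ti, ci in zip(t, c)]).std())

print(f"observed spread: {obs_spread:.3f}; "
      f"resampled 95th percentile: {np.quantile(ref, 0.95):.3f}")
```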

Sub-sampling for efficient non-parametric bandit exploration

D Baudry, E Kaufmann… - Advances in Neural …, 2020 - proceedings.neurips.cc
In this paper we propose the first multi-armed bandit algorithm based on re-sampling that
achieves asymptotically optimal regret simultaneously for different families of arms (namely …
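
One way to read the sub-sampling idea: rather than building confidence bounds, let each challenger arm "duel" the current leader by comparing the challenger's empirical mean with the mean of an equally sized random sub-sample of the leader's history. A minimal sketch of that duel on Bernoulli arms follows; the paper's algorithm differs in the exact leader definition and sub-sample choice, so treat this as the core mechanism only.

```python
import numpy as np

rng = np.random.default_rng(0)
means = [0.40, 0.50, 0.45]              # hypothetical Bernoulli arms
hist = [[] for _ in means]              # observed rewards per arm

def pull(a):
    hist[a].append(float(rng.random() < means[a]))

for a in range(len(means)):             # one initial pull per arm
    pull(a)

for t in range(5000):
    counts = [len(h) for h in hist]
    leader = int(np.argmax(counts))     # most-pulled arm is the leader
    winners = []
    for a in range(len(means)):
        if a == leader:
            continue
        # Duel: challenger's full mean vs. the mean of a random sub-sample
        # of the leader's history with the same number of observations.
        sub = rng.choice(hist[leader], size=counts[a], replace=False)
        if np.mean(hist[a]) >= sub.mean():
            winners.append(a)
    for a in winners or [leader]:       # pull the winners, else the leader
        pull(a)

print("pull counts:", [len(h) for h in hist])
```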

Residual bootstrap exploration for stochastic linear bandit

S Wu, CH Wang, Y Li, G Cheng - Uncertainty in Artificial …, 2022 - proceedings.mlr.press
We propose a new bootstrap-based online algorithm for stochastic linear bandit problems.
The key idea is to adopt residual bootstrap exploration, in which the agent estimates the next …
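
The residual bootstrap step can be sketched directly: fit ridge regression, perturb the fitted residuals (sign flips below), rebuild the responses, refit, and act greedily with respect to the randomized estimate. This is a minimal illustration under assumed arm features, noise scale, and warm start; the paper's algorithm includes refinements beyond this core loop to guarantee sufficient exploration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, lam = 3, 1.0
theta_true = np.array([0.5, -0.3, 0.2])       # hypothetical true parameter
arms = rng.normal(size=(20, d))               # fixed arm feature vectors

X, y = [], []
for t in range(2000):
    if len(y) <= d:                           # warm start: forced pulls
        a = t % len(arms)
    else:
        Xm, ym = np.array(X), np.array(y)
        A = Xm.T @ Xm + lam * np.eye(d)
        theta_hat = np.linalg.solve(A, Xm.T @ ym)
        resid = ym - Xm @ theta_hat
        # Residual bootstrap: sign-flip the residuals, rebuild responses,
        # and refit to obtain a randomized parameter estimate.
        signs = rng.choice([-1.0, 1.0], size=len(resid))
        theta_tilde = np.linalg.solve(A, Xm.T @ (Xm @ theta_hat + signs * resid))
        a = int(np.argmax(arms @ theta_tilde))  # greedy on the perturbed fit
    x = arms[a]
    X.append(x)
    y.append(x @ theta_true + rng.normal(scale=0.1))

Xm, ym = np.array(X), np.array(y)
theta_hat = np.linalg.solve(Xm.T @ Xm + lam * np.eye(d), Xm.T @ ym)
print("final estimate:", np.round(theta_hat, 3), " true:", theta_true)
```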

Dynamic Online Recommendation for Two-Sided Market with Bayesian Incentive Compatibility

Y Li, G Cheng, X Dai - arXiv preprint arXiv:2406.04374, 2024 - arxiv.org
Recommender systems play a crucial role in internet economies by connecting users with
relevant products or services. However, designing effective recommender systems faces two …

Multiplier bootstrap-based exploration

R Wan, H Wei, B Kveton… - … Conference on Machine …, 2023 - proceedings.mlr.press
Despite the great interest in the bandit problem, designing efficient algorithms for complex
models remains challenging, as there is typically no analytical way to quantify uncertainty. In …
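
The core randomized index is simple to write down: reweight each arm's observed rewards with i.i.d. mean-one multiplier weights and act greedily on the weighted means. The sketch below shows only that index on Bernoulli arms; a naive bootstrap index by itself can under-explore, which is part of what the paper's full algorithm addresses, so treat this as the starting point rather than the method.

```python
import numpy as np

rng = np.random.default_rng(0)
means = [0.30, 0.50, 0.45]              # hypothetical Bernoulli arms
rewards = [[] for _ in means]

def pull(a):
    rewards[a].append(float(rng.random() < means[a]))

for a in range(len(means)):             # one initial pull per arm
    pull(a)

for t in range(5000):
    index = []
    for h in rewards:
        h = np.asarray(h)
        # Multiplier bootstrap: i.i.d. mean-one weights on each observation;
        # the weighted mean is a randomized surrogate for a UCB-style index.
        w = rng.exponential(1.0, size=len(h))
        index.append(np.dot(w, h) / w.sum())
    pull(int(np.argmax(index)))

print("pull counts:", [len(h) for h in rewards])
```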

Maximum Average Randomly Sampled: A Scale Free and Non-parametric Algorithm for Stochastic Bandits

M Moravej Khorasani, E Weyer - Advances in Neural …, 2024 - proceedings.neurips.cc
Upper Confidence Bound (UCB) methods are among the most effective approaches for
dealing with the exploration-exploitation trade-off in online decision-making problems. The …
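
The title suggests the index: an average over randomly sampled past rewards, maximized, which injects data-driven (and hence scale-free) optimism without any confidence-width parameter. The sketch below is one plausible reading of that idea, a max over a few averages of uniformly random-sized subsets; the paper's exact sampling rule may differ, so this is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
means, scales = [0.0, 0.2, 0.1], [1.0, 2.0, 0.5]   # hypothetical Gaussian arms
hist = [[] for _ in means]

def pull(a):
    hist[a].append(rng.normal(means[a], scales[a]))

for a in range(len(means)):             # one initial pull per arm
    pull(a)

for t in range(3000):
    index = []
    for h in hist:
        h = np.asarray(h)
        # Optimism from the data itself: average a uniformly random-sized
        # subset of past rewards, repeat a few times, keep the maximum.
        best = -np.inf
        for _ in range(3):
            m = int(rng.integers(1, len(h) + 1))
            best = max(best, float(rng.choice(h, size=m, replace=False).mean()))
        index.append(best)
    pull(int(np.argmax(index)))

print("pull counts:", [len(h) for h in hist])
```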

From optimality to robustness: Adaptive re-sampling strategies in stochastic bandits

D Baudry, P Saux, OA Maillard - Advances in Neural …, 2021 - proceedings.neurips.cc
The stochastic multi-armed bandit problem has been extensively studied under standard
assumptions on the arms' distributions (e.g., bounded with known support, exponential family …

Tight non-asymptotic inference via sub-Gaussian intrinsic moment norm

H Zhang, H Wei, G Cheng - arXiv preprint arXiv:2303.07287, 2023 - arxiv.org
In non-asymptotic learning, variance-type parameters of sub-Gaussian distributions are of
paramount importance. However, directly estimating these parameters using the empirical …
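
For context on what "directly estimating these parameters" means, here is the naive plug-in under one standard definition of the intrinsic moment norm, the largest even moment normalized by the matching Gaussian moment (an assumption to check against the paper). The paper's point is that such direct empirical estimates are delicate, so this sketch illustrates the definition rather than the proposed estimator.

```python
import numpy as np

def double_factorial(n):
    """n!! for odd n; (2k-1)!! equals the 2k-th moment of a standard normal."""
    out = 1
    while n > 1:
        out *= n
        n -= 2
    return out

def intrinsic_moment_norm(x, k_max=6):
    """Naive plug-in for max_k ( E[X^{2k}] / (2k-1)!! )^{1/(2k)}, assuming that
    definition of the intrinsic moment norm; for X ~ N(0, s^2) every term in
    the max equals s, since E[X^{2k}] = s^{2k} (2k-1)!!."""
    x = np.asarray(x, dtype=float)
    return max((np.mean(x ** (2 * k)) / double_factorial(2 * k - 1)) ** (1 / (2 * k))
               for k in range(1, k_max + 1))

rng = np.random.default_rng(0)
sample = rng.normal(0.0, 2.0, size=100_000)
print(f"plug-in estimate: {intrinsic_moment_norm(sample):.3f} (true scale 2.0)")
```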