Online bootstrap inference for policy evaluation in reinforcement learning
The recent emergence of reinforcement learning (RL) has created a demand for robust
statistical inference methods for the parameter estimates computed using these algorithms …
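The snippet stops before describing the estimator, so the following is only a minimal sketch of the general online-multiplier-bootstrap idea the title points to, not the authors' method: each incoming observation is reweighted with i.i.d. mean-one multipliers so that many bootstrap replicates of a running estimate can be maintained in a single pass. The function name and the choice of exponential multipliers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def online_bootstrap_mean(stream, n_boot=200):
    """One-pass multiplier bootstrap for a running mean (a stand-in for the
    value estimate in policy evaluation). Each replicate reweights every
    observation with an i.i.d. mean-one multiplier, so all replicates are
    updated online without storing the stream."""
    est = 0.0                    # point estimate (running mean)
    boot = np.zeros(n_boot)      # bootstrap replicates of the mean
    wsum = np.zeros(n_boot)      # running multiplier totals per replicate
    for t, x in enumerate(stream, start=1):
        est += (x - est) / t
        w = rng.exponential(1.0, size=n_boot)  # mean-one random multipliers
        wsum += w
        boot += w * (x - boot) / wsum          # weighted running means
    return est, boot

# Toy usage: rewards under a fixed policy; CI from replicate quantiles.
rewards = rng.normal(loc=1.0, scale=2.0, size=5000)
est, boot = online_bootstrap_mean(rewards)
lo, hi = np.quantile(boot, [0.025, 0.975])
print(f"estimate {est:.3f}, 95% bootstrap CI ({lo:.3f}, {hi:.3f})")
```

Because the replicates are updated recursively, memory stays at O(n_boot) regardless of stream length, which is what makes such schemes compatible with online RL updates.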
BanditPAM: Almost linear time k-medoids clustering via multi-armed bandits
Clustering is a ubiquitous task in data science. Compared to the commonly used k-means
clustering, k-medoids clustering requires the cluster centers to be actual data points and …
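As a rough illustration of how multi-armed bandits can accelerate medoid search (a sketch of the general idea only, not the BanditPAM implementation): treat each candidate medoid as an arm whose unknown loss is its mean distance to the dataset, estimate losses from randomly sampled reference points, and successively eliminate candidates whose confidence interval is dominated. The function name and confidence radius below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def best_medoid(X, batch=32, delta=0.01):
    """Bandit-style selection of a single medoid via successive elimination:
    each point is an arm whose loss is its mean distance to all points,
    estimated from randomly sampled reference points."""
    n = len(X)
    alive = np.arange(n)   # candidates still in contention
    mu = np.zeros(n)       # running mean distance per candidate
    cnt = np.zeros(n)      # reference samples drawn per candidate
    while len(alive) > 1 and cnt.max() < n:
        refs = rng.integers(0, n, size=batch)  # shared reference sample
        d = np.linalg.norm(X[alive][:, None, :] - X[refs][None, :, :], axis=2)
        mu[alive] = (mu[alive] * cnt[alive] + d.sum(axis=1)) / (cnt[alive] + batch)
        cnt[alive] += batch
        # Hoeffding-style radius; assumes distances roughly in [0, 1] (illustrative).
        rad = np.sqrt(np.log(2 * n / delta) / (2 * cnt[alive]))
        best_ucb = np.min(mu[alive] + rad)
        alive = alive[mu[alive] - rad <= best_ucb]  # drop dominated arms
    return alive[int(np.argmin(mu[alive]))]

X = rng.normal(size=(500, 2))
print("medoid index:", best_medoid(X))
```

The point of the bandit view is that most candidates are eliminated after a handful of sampled distances, rather than computing all O(n^2) pairwise distances.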
Did we personalize? Assessing personalization by an online reinforcement learning algorithm using resampling
There is a growing interest in using reinforcement learning (RL) to personalize sequences of
treatments in digital health to support users in adopting healthier behaviors. Such sequential …
Sub-sampling for efficient non-parametric bandit exploration
D Baudry, E Kaufmann… - Advances in Neural Information Processing Systems, 2020 - proceedings.neurips.cc
In this paper we propose the first multi-armed bandit algorithm based on re-sampling that
achieves asymptotically optimal regret simultaneously for different families of arms (namely …
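For context, here is a schematic sketch of one family of re-sampling bandit algorithms consistent with the snippet: sub-sampling duels, in which the current leader defends against each challenger on a sub-sample of its own history of matching size. The sub-sample law and tie-breaking below are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(2)

def subsampling_duel_bandit(arms, rounds):
    """Sub-sampling duel sketch: the leader (most-played arm) is compared with
    each challenger by drawing a random sub-sample of the leader's history of
    the same size as the challenger's history. Winning challengers are each
    played once; if none win, the leader is played."""
    hist = [[arm()] for arm in arms]  # one initial pull per arm
    for _ in range(rounds):
        leader = max(range(len(arms)), key=lambda k: len(hist[k]))
        winners = []
        for k in range(len(arms)):
            if k == leader:
                continue
            m = len(hist[k])
            sub = rng.choice(hist[leader], size=m, replace=False)
            if np.mean(hist[k]) >= np.mean(sub):  # challenger wins the duel
                winners.append(k)
        for k in (winners if winners else [leader]):
            hist[k].append(arms[k]())
    return [len(h) for h in hist]

arms = [lambda: rng.normal(0.0, 1), lambda: rng.normal(0.5, 1)]
print("pull counts:", subsampling_duel_bandit(arms, 2000))
```

The appeal of such duels is that no confidence width (and hence no known support or variance proxy) is needed: exploration comes entirely from the randomness of the sub-sample.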
Residual bootstrap exploration for stochastic linear bandit
We propose a new bootstrap-based online algorithm for stochastic linear bandit problems.
The key idea is to adopt residual bootstrap exploration, in which the agent estimates the next …
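The snippet is cut off mid-sentence, so the following sketch shows only the generic residual-bootstrap perturbation the title refers to, under assumptions of my own (ridge fitting, Gaussian noise, greedy action choice), not necessarily the paper's exact estimator: fit a linear model, resample its residuals to build a perturbed response vector, refit, and act greedily on the perturbed estimate.

```python
import numpy as np

rng = np.random.default_rng(3)

def residual_bootstrap_linear_bandit(arms, theta_star, horizon, lam=1.0):
    """Residual-bootstrap exploration sketch for a stochastic linear bandit."""
    d = arms.shape[1]
    X, y = [], []
    for t in range(horizon):
        if t < d:                                    # crude initialization: cycle arms
            a = t % len(arms)
        else:
            Xm, yv = np.array(X), np.array(y)
            A = Xm.T @ Xm + lam * np.eye(d)
            theta = np.linalg.solve(A, Xm.T @ yv)    # ridge point estimate
            res = yv - Xm @ theta                    # in-sample residuals
            y_star = Xm @ theta + rng.choice(res, size=len(res), replace=True)
            theta_b = np.linalg.solve(A, Xm.T @ y_star)  # bootstrap refit
            a = int(np.argmax(arms @ theta_b))       # greedy on perturbed model
        X.append(arms[a])
        y.append(arms[a] @ theta_star + rng.normal())
    return np.array(y).sum()

arms = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
print("total reward:", residual_bootstrap_linear_bandit(arms, np.array([0.2, 1.0]), 1000))
```

The resampled residuals inject just enough randomness into the refitted parameter to play the role that posterior sampling plays in Thompson sampling, without assuming a noise distribution.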
Dynamic Online Recommendation for Two-Sided Market with Bayesian Incentive Compatibility
Recommender systems play a crucial role in internet economies by connecting users with
relevant products or services. However, designing effective recommender systems faces two …
Multiplier bootstrap-based exploration
Despite the great interest in the bandit problem, designing efficient algorithms for complex
models remains challenging, as there is typically no analytical way to quantify uncertainty. In …
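Since the snippet breaks off before the method, the sketch below shows only the generic multiplier-bootstrap exploration idea named in the title, in the simplest K-armed setting; the exponential weights and greedy rule are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

def multiplier_bootstrap_bandit(arms, rounds):
    """Multiplier-bootstrap exploration sketch: each round, every arm's mean is
    recomputed with i.i.d. mean-one random weights on its past rewards, and the
    agent acts greedily on these randomized means."""
    hist = [[arm()] for arm in arms]  # one initial pull per arm
    for _ in range(rounds):
        idx = []
        for h in hist:
            w = rng.exponential(1.0, size=len(h))  # mean-one multipliers
            idx.append(np.dot(w, h) / w.sum())     # weighted bootstrap mean
        a = int(np.argmax(idx))
        hist[a].append(arms[a]())
    return [len(h) for h in hist]

arms = [lambda: rng.binomial(1, 0.4), lambda: rng.binomial(1, 0.6)]
print("pull counts:", multiplier_bootstrap_bandit(arms, 2000))
```

The randomized index quantifies uncertainty implicitly, through the spread of the reweighted means, which is exactly the appeal for complex models where no analytical confidence bound is available.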
Maximum Average Randomly Sampled: A Scale Free and Non-parametric Algorithm for Stochastic Bandits
M Moravej Khorasani, E Weyer - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
Upper Confidence Bound (UCB) methods are among the most effective approaches to the
exploration-exploitation trade-off in online decision-making problems. The …
From optimality to robustness: Adaptive re-sampling strategies in stochastic bandits
The stochastic multi-armed bandit problem has been extensively studied under standard
assumptions on the arms' distributions (e.g. bounded with known support, exponential family …
Tight non-asymptotic inference via sub-Gaussian intrinsic moment norm
In non-asymptotic learning, variance-type parameters of sub-Gaussian distributions are of
paramount importance. However, directly estimating these parameters using the empirical …
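The excerpt is cut off before the definition; for context, one standard way to formalize such a norm (stated here as an assumption about the paper's definition, since the snippet does not show it) is to normalize even moments by their Gaussian counterparts:

\[
\|X\|_{\mathcal{G}} \;=\; \max_{k \ge 1} \left( \frac{\mathbb{E}\,[X^{2k}]}{(2k-1)!!} \right)^{1/(2k)},
\]

so that for \(X \sim \mathcal{N}(0, \sigma^2)\), where \(\mathbb{E}[X^{2k}] = \sigma^{2k}(2k-1)!!\), the norm recovers \(\sigma\) exactly. Direct empirical estimation of such variance-type parameters is what the abstract flags as problematic in the non-asymptotic regime.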