An $\varepsilon$-Best-Arm Identification Algorithm for Fixed-Confidence and Beyond
We propose EB-TC$_{\varepsilon}$, a novel sampling rule for $\varepsilon$-best arm
identification in stochastic bandits. It is the first instance of a Top Two algorithm analyzed for …
Non-asymptotic analysis of a UCB-based Top Two algorithm
A Top Two sampling rule for bandit identification is a method which selects the next arm to
sample from among two candidate arms, a leader and a challenger. Due to their simplicity …
Thompson exploration with best challenger rule in best arm identification
This paper studies the fixed-confidence best arm identification (BAI) problem in the bandit
framework in the canonical single-parameter exponential models. For this problem, many …
On the complexity of differentially private best-arm identification with fixed confidence
Best Arm Identification (BAI) problems are progressively used for data-sensitive
applications, such as designing adaptive clinical trials, tuning hyper-parameters, and …
Fixed-budget best-arm identification with heterogeneous reward variances
AL Lalitha, K Kalantari, Y Ma… - Uncertainty in …, 2023 - proceedings.mlr.press
We study the problem of best-arm identification (BAI) in the fixed-budget setting with
heterogeneous reward variances. We propose two variance-adaptive BAI algorithms for this …
Only pay for what is uncertain: Variance-adaptive Thompson sampling
Most bandit algorithms assume that the reward variances or their upper bounds are known,
and that they are the same for all arms. This naturally leads to suboptimal performance and …
Locally Optimal Fixed-Budget Best Arm Identification in Two-Armed Gaussian Bandits with Unknown Variances
M Kato - arXiv preprint arXiv:2312.12741, 2023 - arxiv.org
We address the problem of best arm identification (BAI) with a fixed budget for two-armed
Gaussian bandits. In BAI, given multiple arms, we aim to find the best arm, an arm with the …
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
In this paper, we study reinforcement learning from human feedback (RLHF) under an
episodic Markov decision process with a general trajectory-wise reward model. We …
Differentially Private Best-Arm Identification
Best Arm Identification (BAI) problems are progressively used for data-sensitive applications,
such as designing adaptive clinical trials, tuning hyper-parameters, and conducting user …
Covariance Adaptive Best Arm Identification
GB El Mehdi Saad, N Verzelen - arXiv preprint arXiv:2306.02630, 2023 - hal.science
We consider the problem of best arm identification in the multi-armed bandit model, under
fixed confidence. Given a confidence input δ, the goal is to identify the arm with the highest …