An $\varepsilon$-Best-Arm Identification Algorithm for Fixed-Confidence and Beyond

M Jourdan, R Degenne… - Advances in Neural …, 2023 - proceedings.neurips.cc
We propose EB-TC$_{\varepsilon}$, a novel sampling rule for $\varepsilon$-best arm
identification in stochastic bandits. It is the first instance of a Top Two algorithm analyzed for …
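As a rough illustration of the leader/challenger pair such a rule computes, the sketch below assumes unit-variance Gaussian arms, takes the empirical best arm as the leader, and picks as challenger the arm with the smallest ε-relaxed transportation cost. The function name `eb_tc_eps_pair` is hypothetical, and the cost is written only up to a monotone transformation (the paper's exact form may square it and include constants), which leaves the argmin unchanged. It is a sketch under these assumptions, not the authors' implementation.

```python
import math

def eb_tc_eps_pair(means, counts, eps):
    """Return (leader, challenger) for a hypothetical EB-TC_eps-style step.

    Assumes unit-variance Gaussian arms and counts[i] >= 1 for every arm.
    """
    k = len(means)
    # Empirical Best (EB) leader: the arm with the largest empirical mean.
    leader = max(range(k), key=lambda i: means[i])

    def cost(j):
        # eps-relaxed transportation cost (TC) from the leader to arm j,
        # up to a monotone transformation that does not change the argmin.
        gap = means[leader] - means[j] + eps
        return max(gap, 0.0) / math.sqrt(1.0 / counts[leader] + 1.0 / counts[j])

    # Challenger: the non-leader arm that is cheapest to "flip" with the leader.
    challenger = min((j for j in range(k) if j != leader), key=cost)
    return leader, challenger
```

For example, with means [0.5, 0.4, 0.1], ten pulls per arm and ε = 0.05, the leader is arm 0 and the challenger is arm 1, the arm hardest to distinguish from the leader.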

Non-Asymptotic Analysis of a UCB-based Top Two Algorithm

M Jourdan, R Degenne - Advances in Neural Information …, 2024 - proceedings.neurips.cc
A Top Two sampling rule for bandit identification is a method which selects the next arm to
sample from among two candidate arms, a leader and a challenger. Due to their simplicity …
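A single Top Two decision can be sketched as follows, assuming unit-variance Gaussian arms, a UCB-style leader (suggested by the title), a transportation-cost challenger, and a fixed coin with probability β for choosing between them. The helper names and the exploration bonus sqrt(2 log t / N_i) are illustrative assumptions, and the paper may use a deterministic tracking rule instead of a random coin.

```python
import math
import random

def ucb_leader(means, counts, t):
    # Illustrative UCB index for unit-variance Gaussian arms (assumed bonus).
    return max(range(len(means)),
               key=lambda i: means[i] + math.sqrt(2.0 * math.log(t) / counts[i]))

def tc_challenger(means, counts, leader):
    # Transportation cost from the leader to arm j; zero if j already looks better.
    def cost(j):
        gap = max(means[leader] - means[j], 0.0)
        return gap / math.sqrt(1.0 / counts[leader] + 1.0 / counts[j])
    return min((j for j in range(len(means)) if j != leader), key=cost)

def top_two_step(means, counts, t, beta=0.5, rng=random):
    """Return the arm to pull next: the leader w.p. beta, else the challenger."""
    leader = ucb_leader(means, counts, t)
    challenger = tc_challenger(means, counts, leader)
    return leader if rng.random() < beta else challenger
```

With β = 1/2 the two candidates are sampled equally often in expectation; setting β = 1 or β = 0 makes the step always return the leader or the challenger, respectively.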

Thompson Exploration with Best Challenger Rule in Best Arm Identification

J Lee, J Honda, M Sugiyama - Asian Conference on Machine …, 2024 - proceedings.mlr.press
This paper studies the fixed-confidence best arm identification (BAI) problem in the bandit
framework in the canonical single-parameter exponential models. For this problem, many …

On the Complexity of Differentially Private Best-Arm Identification with Fixed Confidence

A Azize, M Jourdan, A Al Marjani… - Advances in Neural …, 2024 - proceedings.neurips.cc
Best Arm Identification (BAI) problems are progressively used for data-sensitive
applications, such as designing adaptive clinical trials, tuning hyper-parameters, and …

Fixed-Budget Best-Arm Identification with Heterogeneous Reward Variances

AL Lalitha, K Kalantari, Y Ma… - Uncertainty in …, 2023 - proceedings.mlr.press
We study the problem of best-arm identification (BAI) in the fixed-budget setting with
heterogeneous reward variances. We propose two variance-adaptive BAI algorithms for this …

Only Pay for What Is Uncertain: Variance-Adaptive Thompson Sampling

A Saha, B Kveton - arXiv preprint arXiv:2303.09033, 2023 - arxiv.org
Most bandit algorithms assume that the reward variances or their upper bounds are known,
and that they are the same for all arms. This naturally leads to suboptimal performance and …
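To make the idea of not assuming known, shared variances concrete, here is a generic Thompson sampling sketch for Gaussian arms with unknown, arm-specific variances, using the standard conjugate Normal-Gamma posterior. This is an assumed textbook construction, not necessarily the algorithm of Saha and Kveton; the prior hyperparameters `m0`, `kappa0`, `alpha0`, `beta0` and the function names are hypothetical.

```python
import random
from statistics import fmean

def sample_mean(rewards, rng, m0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
    """Draw one posterior sample of an arm's mean under a Normal-Gamma prior."""
    n = len(rewards)
    xbar = fmean(rewards) if n else 0.0
    ss = sum((x - xbar) ** 2 for x in rewards)  # sum of squared deviations
    kappa_n = kappa0 + n
    m_n = (kappa0 * m0 + n * xbar) / kappa_n
    alpha_n = alpha0 + n / 2.0
    beta_n = beta0 + 0.5 * ss + kappa0 * n * (xbar - m0) ** 2 / (2.0 * kappa_n)
    # random.gammavariate takes a *scale* parameter, so pass 1 / rate.
    tau = rng.gammavariate(alpha_n, 1.0 / beta_n)          # sampled precision
    return rng.gauss(m_n, (1.0 / (kappa_n * tau)) ** 0.5)  # sampled mean

def choose_arm(history, rng):
    # history[i] is the list of rewards observed so far from arm i.
    samples = [sample_mean(r, rng) for r in history]
    return max(range(len(samples)), key=samples.__getitem__)
```

Because each arm's precision is sampled from its own posterior, a noisy arm keeps a wide posterior over its mean and keeps being explored, while a low-variance arm is resolved quickly.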

Locally Optimal Fixed-Budget Best Arm Identification in Two-Armed Gaussian Bandits with Unknown Variances

M Kato - arXiv preprint arXiv:2312.12741, 2023 - arxiv.org
We address the problem of best arm identification (BAI) with a fixed budget for two-armed
Gaussian bandits. In BAI, given multiple arms, we aim to find the best arm, an arm with the …
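For two Gaussian arms, the variance of the difference of empirical means, σ₁²/n₁ + σ₂²/n₂, is minimized under n₁ + n₂ = T by allocating pulls in proportion to the standard deviations (the Neyman allocation). The sketch below assumes the standard deviations are known, which is a simplification: the paper's setting has unknown variances, which would have to be estimated during sampling. Function names are hypothetical.

```python
import random
from statistics import fmean

def neyman_allocation(budget, sigma1, sigma2):
    """Split a fixed budget between two arms in proportion to their std devs."""
    n1 = round(budget * sigma1 / (sigma1 + sigma2))
    n1 = min(max(n1, 1), budget - 1)  # keep at least one pull per arm
    return n1, budget - n1

def run_two_armed_bai(mu, sigma, budget, rng):
    # Pull each arm its allocated number of times, then recommend the arm
    # with the larger empirical mean (0 or 1).
    n1, n2 = neyman_allocation(budget, sigma[0], sigma[1])
    x1 = [rng.gauss(mu[0], sigma[0]) for _ in range(n1)]
    x2 = [rng.gauss(mu[1], sigma[1]) for _ in range(n2)]
    return 0 if fmean(x1) >= fmean(x2) else 1
```

For example, a budget of 100 with σ₁ = 1 and σ₂ = 3 gives 25 pulls to the first arm and 75 to the noisier second arm.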

Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis

Q Zhang, H Wei, L Ying - arXiv preprint arXiv:2406.07455, 2024 - arxiv.org
In this paper, we study reinforcement learning from human feedback (RLHF) under an
episodic Markov decision process with a general trajectory-wise reward model. We …

Differentially Private Best-Arm Identification

A Azize, M Jourdan, AA Marjani, D Basu - arXiv preprint arXiv:2406.06408, 2024 - arxiv.org
Best Arm Identification (BAI) problems are progressively used for data-sensitive applications,
such as designing adaptive clinical trials, tuning hyper-parameters, and conducting user …

Covariance Adaptive Best Arm Identification

EM Saad, G Blanchard, N Verzelen - arXiv preprint arXiv:2306.02630, 2023 - hal.science
We consider the problem of best arm identification in the multi-armed bandit model, under
fixed confidence. Given a confidence input δ, the goal is to identify the arm with the highest …
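A common way to decide when the confidence target δ has been reached is a generalized likelihood ratio (GLR) stopping rule. The sketch below assumes independent Gaussian arms with a known common σ and uses the heuristic threshold log((1 + log t)/δ) often seen in experiments; it is not δ-correct by construction and is not the covariance-adaptive method of this paper, which exploits correlations between arms. The function name is hypothetical.

```python
import math

def glr_should_stop(means, counts, delta, sigma=1.0):
    """Gaussian GLR stopping check; returns (stop, recommended_arm)."""
    t = sum(counts)
    leader = max(range(len(means)), key=lambda i: means[i])

    def z(j):
        # GLR statistic for "leader beats arm j" with independent Gaussian arms.
        gap = means[leader] - means[j]
        return gap * gap / (2.0 * sigma ** 2 * (1.0 / counts[leader] + 1.0 / counts[j]))

    stat = min(z(j) for j in range(len(means)) if j != leader)
    threshold = math.log((1.0 + math.log(t)) / delta)  # heuristic threshold
    return stat > threshold, leader
```

Sampling continues until the weakest pairwise evidence against the leader exceeds the threshold, at which point the leader is returned as the identified best arm.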