- Academic resource search

An $\varepsilon$-Best-Arm Identification Algorithm for Fixed-Confidence and Beyond

M Jourdan, R Degenne… - Advances in Neural …, 2023 - proceedings.neurips.cc

We propose EB-TC$_{\varepsilon}$, a novel sampling rule for $\varepsilon$-best arm
identification in stochastic bandits. It is the first instance of a Top Two algorithm analyzed for …

Cited by 6 · Related articles · All 11 versions

[PDF] neurips.cc

Non-asymptotic analysis of a UCB-based Top Two algorithm

M Jourdan, R Degenne - Advances in Neural Information …, 2024 - proceedings.neurips.cc

A Top Two sampling rule for bandit identification is a method which selects the next arm to
sample from among two candidate arms, a leader and a challenger. Due to their simplicity …

Cited by 7 · Related articles · All 9 versions

[PDF] mlr.press

Thompson exploration with best challenger rule in best arm identification

J Lee, J Honda, M Sugiyama - Asian Conference on Machine …, 2024 - proceedings.mlr.press

This paper studies the fixed-confidence best arm identification (BAI) problem in the bandit
framework in the canonical single-parameter exponential models. For this problem, many …

Cited by 4 · Related articles · All 3 versions

[PDF] neurips.cc

On the complexity of differentially private best-arm identification with fixed confidence

A Azize, M Jourdan, A Al Marjani… - Advances in Neural …, 2024 - proceedings.neurips.cc

Best Arm Identification (BAI) problems are progressively used for data-sensitive
applications, such as designing adaptive clinical trials, tuning hyper-parameters, and …

Cited by 2 · Related articles · All 10 versions

[PDF] mlr.press

Fixed-budget best-arm identification with heterogeneous reward variances

AL Lalitha, K Kalantari, Y Ma… - Uncertainty in …, 2023 - proceedings.mlr.press

We study the problem of best-arm identification (BAI) in the fixed-budget setting with
heterogeneous reward variances. We propose two variance-adaptive BAI algorithms for this …

Cited by 4 · Related articles · All 7 versions

[PDF] arxiv.org

Only pay for what is uncertain: Variance-adaptive Thompson sampling

A Saha, B Kveton - arXiv preprint arXiv:2303.09033, 2023 - arxiv.org

Most bandit algorithms assume that the reward variances or their upper bounds are known,
and that they are the same for all arms. This naturally leads to suboptimal performance and …

Cited by 3 · Related articles · All 3 versions

[PDF] arxiv.org

Locally Optimal Fixed-Budget Best Arm Identification in Two-Armed Gaussian Bandits with Unknown Variances

M Kato - arXiv preprint arXiv:2312.12741, 2023 - arxiv.org

We address the problem of best arm identification (BAI) with a fixed budget for two-armed
Gaussian bandits. In BAI, given multiple arms, we aim to find the best arm, an arm with the …

Cited by 1 · Related articles · All 4 versions

[PDF] arxiv.org

Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis

Q Zhang, H Wei, L Ying - arXiv preprint arXiv:2406.07455, 2024 - arxiv.org

In this paper, we study reinforcement learning from human feedback (RLHF) under an
episodic Markov decision process with a general trajectory-wise reward model. We …

Differentially Private Best-Arm Identification

A Azize, M Jourdan, AA Marjani, D Basu - arXiv preprint arXiv:2406.06408, 2024 - arxiv.org

Best Arm Identification (BAI) problems are progressively used for data-sensitive applications,
such as designing adaptive clinical trials, tuning hyper-parameters, and conducting user …

[PDF] Covariance Adaptive Best Arm Identification

EM Saad, G Blanchard, N Verzelen - arXiv preprint arXiv:2306.02630, 2023 - hal.science

We consider the problem of best arm identification in the multi-armed bandit model, under
fixed confidence. Given a confidence input δ, the goal is to identify the arm with the highest …

Cited by 1 · Related articles · All 2 versions