Best arm identification for contaminated bandits

MA Newton, AE Raftery - Journal of the Royal Statistical Society …, 1994 - academic.oup.com

We introduce the weighted likelihood bootstrap (WLB) as a way to simulate approximately
from a posterior distribution. This method is often easy to implement, requiring only an …

Cited by 2322 · Related articles · All 22 versions
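The snippet says the WLB simulates approximately from a posterior by reweighting the likelihood. The sketch below shows the idea for the mean of a normal model with known variance. For that model the weighted maximum-likelihood estimate is simply the weighted sample mean, so each draw has a closed form. It assumes uniform Dirichlet(1, ..., 1) weights, and the function names are hypothetical.

```python
import random


def dirichlet_weights(n, rng):
    # Uniform Dirichlet(1, ..., 1) weights via normalized Exp(1) draws (assumed weighting scheme).
    g = [rng.expovariate(1.0) for _ in range(n)]
    s = sum(g)
    return [x / s for x in g]


def wlb_normal_mean(data, draws=1000, seed=0):
    """Hypothetical WLB sketch for the mean of a normal model with known variance.

    Maximizing sum_i w_i * log N(x_i | mu, sigma^2) over mu gives the weighted
    mean, so every bootstrap draw is just a randomly weighted average.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        w = dirichlet_weights(len(data), rng)
        samples.append(sum(wi * xi for wi, xi in zip(w, data)))
    return samples


data = [1.2, 0.8, 1.5, 0.9, 1.1, 1.3]
samples = wlb_normal_mean(data)
print(sum(samples) / len(samples))  # approximate posterior mean of mu
```

The draws form an approximate posterior sample for the mean. For models without a closed-form weighted MLE, each draw would need a numerical optimizer instead.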

[PDF] ccs-labs.org

Empowering the 6G cellular architecture with Open RAN

M Polese, M Dohler, F Dressler… - IEEE Journal on …, 2023 - ieeexplore.ieee.org

Innovation and standardization in 5G have brought advancements to every facet of the
cellular architecture. This ranges from the introduction of new frequency bands and …

Cited by 37 · Related articles · All 9 versions

[PDF] mlr.press

Better algorithms for stochastic bandits with adversarial corruptions

A Gupta, T Koren, K Talwar - Conference on Learning …, 2019 - proceedings.mlr.press

We study the stochastic multi-armed bandits problem in the presence of adversarial
corruption. We present a new algorithm for this problem whose regret is nearly optimal …

Cited by 184 · Related articles · All 5 versions

[PDF] mlr.press

Adaptive reward-poisoning attacks against reinforcement learning

X Zhang, Y Ma, A Singla, X Zhu - … Conference on Machine …, 2020 - proceedings.mlr.press

In reward-poisoning attacks against reinforcement learning (RL), an attacker can perturb the
environment reward $ r_t $ into $ r_t+\delta_t $ at each step, with the goal of forcing the RL …

Cited by 152 · Related articles · All 10 versions
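The threat model in this snippet, where the attacker replaces $r_t$ with $r_t+\delta_t$, can be illustrated on a toy two-armed bandit. The sketch below is a hypothetical, non-adaptive attacker. It is not the adaptive attack from the paper. It lowers every non-target reward by a fixed bound and records the total perturbation cost $\sum_t |\delta_t|$. The learner, the bound `delta_max`, and all names are assumptions.

```python
def poison(reward, action, target, delta_max):
    # Hypothetical attacker: subtract delta_max from every non-target reward and
    # leave the target arm untouched; returns (r_t + delta_t, |delta_t|).
    delta = 0.0 if action == target else -delta_max
    return reward + delta, abs(delta)


def round_robin_then_commit(true_rewards, rounds, target, delta_max):
    """Toy learner: pull arms in turn on deterministic rewards (rounds >= number
    of arms), then commit to the arm with the highest observed, possibly
    poisoned, mean reward."""
    n = len(true_rewards)
    totals = [0.0] * n
    counts = [0] * n
    cost = 0.0
    for t in range(rounds):
        a = t % n
        r, c = poison(true_rewards[a], a, target, delta_max)
        totals[a] += r
        counts[a] += 1
        cost += c
    means = [totals[a] / counts[a] for a in range(n)]
    return max(range(n), key=lambda a: means[a]), cost


# Arm 0 is truly better; the attacker wants the learner to commit to arm 1.
print(round_robin_then_commit([1.0, 0.5], rounds=10, target=1, delta_max=0.6))
print(round_robin_then_commit([1.0, 0.5], rounds=10, target=1, delta_max=0.0))
```

With a per-step bound of 0.6, the observed mean of arm 0 drops to 0.4, below arm 1's 0.5, so the learner commits to the target arm. Without the perturbation it picks arm 0.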

[PDF] neurips.cc

Universal off-policy evaluation

Y Chandak, S Niekum, B da Silva… - Advances in …, 2021 - proceedings.neurips.cc

When faced with sequential decision-making problems, it is often useful to be able to predict
what would happen if decisions were made using a new policy. Those predictions must …

Cited by 56 · Related articles · All 11 versions
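Predicting what would happen under a new policy from data logged by an old one is usually done with importance weighting. The sketch below is the standard one-step inverse-propensity-scoring (IPS) estimate of a target policy's expected reward. It illustrates the general idea only and is not the paper's "universal" method. The log format and function names are hypothetical.

```python
def ips_estimate(logs, target_prob):
    """Inverse-propensity-scoring estimate of a target policy's expected reward.

    logs: list of (action, reward, behavior_prob) tuples, where behavior_prob is
    the probability the logging policy assigned to the logged action.
    target_prob: function giving the target policy's probability of an action.
    """
    total = 0.0
    for action, reward, behavior_prob in logs:
        # Reweight each logged reward by pi_target(a) / pi_behavior(a).
        total += target_prob(action) / behavior_prob * reward
    return total / len(logs)


# Logged with a uniform behavior policy over actions {0, 1}; action 0 pays 1, action 1 pays 0.
logs = [(0, 1.0, 0.5), (1, 0.0, 0.5), (0, 1.0, 0.5), (1, 0.0, 0.5)]
always_zero = lambda a: 1.0 if a == 0 else 0.0
uniform = lambda a: 0.5
print(ips_estimate(logs, always_zero), ips_estimate(logs, uniform))
```

The estimate is unbiased when the behavior policy gives positive probability to every action the target policy can take, but its variance grows as the two policies diverge.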

[PDF] arxiv.org

Reward poisoning in reinforcement learning: Attacks against unknown learners in unknown environments

A Rakhsha, X Zhang, X Zhu, A Singla - arXiv preprint arXiv:2102.08492, 2021 - arxiv.org

We study black-box reward poisoning attacks against reinforcement learning (RL), in which
an adversary aims to manipulate the rewards to mislead a sequence of RL agents with …

Cited by 42 · Related articles · All 4 versions

[PDF] neurips.cc

One more step towards reality: Cooperative bandits with imperfect communication

U Madhushani, A Dubey, N Leonard… - Advances in Neural …, 2021 - proceedings.neurips.cc

The cooperative bandit problem is increasingly becoming relevant due to its applications in
large-scale decision-making. However, most research for this problem focuses exclusively …

Cited by 25 · Related articles · All 10 versions

[PDF] neurips.cc

Conformal off-policy prediction in contextual bandits

MF Taufiq, JF Ton, R Cornish… - Advances in Neural …, 2022 - proceedings.neurips.cc

Most off-policy evaluation methods for contextual bandits have focused on the expected
outcome of a policy, which is estimated via methods that at best provide only asymptotic …

Cited by 19 · Related articles · All 5 versions

[PDF] arxiv.org

Online and distribution-free robustness: Regression and contextual bandits with Huber contamination

S Chen, F Koehler, A Moitra… - 2021 IEEE 62nd Annual …, 2022 - ieeexplore.ieee.org

In this work we revisit two classic high-dimensional online learning problems, namely linear
regression and contextual bandits, from the perspective of adversarial robustness. Existing …

Cited by 37 · Related articles · All 5 versions
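In the Huber contamination model, each observation comes from an adversarial distribution $Q$ with probability $\varepsilon$ and from the clean distribution $P$ otherwise. The observed data therefore follow $(1-\varepsilon)P + \varepsilon Q$. The sketch below simulates that mixture and shows why robust estimators matter. It assumes a hypothetical adversary that places all its mass at 100. Under that adversary the sample mean is dragged far from the clean mean of 0, while the median barely moves.

```python
import random
import statistics


def huber_contaminated_sample(n, eps, clean, outlier, rng):
    # Each point is drawn from Q (outlier) with probability eps, else from P (clean),
    # i.e. from the mixture (1 - eps) * P + eps * Q.
    return [outlier(rng) if rng.random() < eps else clean(rng) for _ in range(n)]


rng = random.Random(0)
xs = huber_contaminated_sample(
    n=1001,
    eps=0.1,
    clean=lambda r: r.gauss(0.0, 1.0),  # P = N(0, 1), clean mean 0
    outlier=lambda r: 100.0,            # Q = point mass at 100 (hypothetical adversary)
    rng=rng,
)
print(statistics.mean(xs), statistics.median(xs))
```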

[PDF] mlr.press

Best of both worlds: Stochastic & adversarial best-arm identification

Y Abbasi-Yadkori, P Bartlett… - … on learning theory, 2018 - proceedings.mlr.press

We study bandit best-arm identification with arbitrary and potentially adversarial rewards. A
simple random uniform learner obtains the optimal rate of error in the adversarial scenario …

Cited by 58 · Related articles · All 18 versions
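The snippet credits a simple random uniform learner with the optimal error rate in the adversarial scenario. The sketch below shows the basic version of that idea for a fixed budget: pull an arm chosen uniformly at random each round, then recommend the arm with the highest empirical mean. It assumes stochastic Bernoulli rewards for the demo. The paper's estimator for adversarial rewards may differ, and all names here are hypothetical.

```python
import random


def uniform_bai(pull, n_arms, budget, seed=0):
    """Fixed-budget best-arm identification with uniform random sampling.

    pull(arm, rng) returns a reward; arms never pulled get mean -inf.
    """
    rng = random.Random(seed)
    totals = [0.0] * n_arms
    counts = [0] * n_arms
    for _ in range(budget):
        a = rng.randrange(n_arms)  # choose an arm uniformly at random
        totals[a] += pull(a, rng)
        counts[a] += 1
    means = [t / c if c else float("-inf") for t, c in zip(totals, counts)]
    return max(range(n_arms), key=lambda a: means[a])


MEANS = [0.2, 0.5, 0.8]  # hypothetical Bernoulli arm means


def bernoulli_pull(arm, rng):
    return 1.0 if rng.random() < MEANS[arm] else 0.0


print(uniform_bai(bernoulli_pull, n_arms=3, budget=3000))
```

Because the sampling ignores past rewards, an adversary cannot steer which arms get explored. That is one intuition for why such a learner can remain competitive when rewards are arbitrary.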