Nearly optimal algorithms for linear contextual bandits with adversarial corruptions

J He, D Zhou, T Zhang, Q Gu - Advances in neural …, 2022 - proceedings.neurips.cc
We study the linear contextual bandit problem in the presence of adversarial corruption,
where the reward at each round is corrupted by an adversary, and the corruption level (i.e., …

Bypassing the monster: A faster and simpler optimal algorithm for contextual bandits under realizability

D Simchi-Levi, Y Xu - Mathematics of Operations Research, 2022 - pubsonline.informs.org
We consider the general (stochastic) contextual bandit problem under the realizability
assumption, that is, the expected reward, as a function of contexts and actions, belongs to a …

Provable model-based nonlinear bandit and reinforcement learning: Shelve optimism, embrace virtual curvature

K Dong, J Yang, T Ma - Advances in neural information …, 2021 - proceedings.neurips.cc
This paper studies model-based bandit and reinforcement learning (RL) with nonlinear
function approximations. We propose to study convergence to approximate local maxima …

Metadata-based multi-task bandits with bayesian hierarchical models

R Wan, L Ge, R Song - Advances in Neural Information …, 2021 - proceedings.neurips.cc
How to explore efficiently is a central problem in multi-armed bandits. In this paper, we
introduce the metadata-based multi-task bandit problem, where the agent needs to solve a …

Proportional response: Contextual bandits for simple and cumulative regret minimization

SK Krishnamurthy, R Zhan, S Athey… - Advances in Neural …, 2023 - proceedings.neurips.cc
In many applications, e.g., in healthcare and e-commerce, the goal of a contextual bandit may
be to learn an optimal treatment assignment policy at the end of the experiment. That is, to …

Corralling a larger band of bandits: A case study on switching regret for linear bandits

H Luo, M Zhang, P Zhao… - Conference on Learning …, 2022 - proceedings.mlr.press
We consider the problem of combining and learning over a set of adversarial bandit
algorithms with the goal of adaptively tracking the best one on the fly. The Corral algorithm of …

Contextual bandits in a survey experiment on charitable giving: Within-experiment outcomes versus policy learning

S Athey, U Byambadalai, V Hadad… - arXiv preprint arXiv …, 2022 - arxiv.org
We design and implement an adaptive experiment (a "contextual bandit") to learn a targeted
treatment assignment policy, where the goal is to use a participant's survey responses to …

Flexible and efficient contextual bandits with heterogeneous treatment effect oracles

AG Carranza, SK Krishnamurthy… - … Conference on Artificial …, 2023 - proceedings.mlr.press
Contextual bandit algorithms often estimate reward models to inform decision-making.
However, true rewards can contain action-independent redundancies that are not relevant …

Harnessing the Power of Federated Learning in Federated Contextual Bandits

C Shi, R Zhou, K Yang, C Shen - arXiv preprint arXiv:2312.16341, 2023 - arxiv.org
Federated learning (FL) has demonstrated great potential in revolutionizing distributed
machine learning, and tremendous efforts have been made to extend it beyond the original …

Robust causal bandits for linear models

Z Yan, A Mukherjee, B Varıcı… - IEEE Journal on Selected …, 2024 - ieeexplore.ieee.org
The sequential design of experiments for optimizing a reward function in causal systems can
be effectively modeled by the sequential design of interventions in causal bandits (CBs). In …