Nearly optimal algorithms for linear contextual bandits with adversarial corruptions

J He, D Zhou, T Zhang, Q Gu - Advances in neural …, 2022 - proceedings.neurips.cc
We study the linear contextual bandit problem in the presence of adversarial corruption,
where the reward at each round is corrupted by an adversary, and the corruption level (i.e., …

Bypassing the monster: A faster and simpler optimal algorithm for contextual bandits under realizability

D Simchi-Levi, Y Xu - Mathematics of Operations Research, 2022 - pubsonline.informs.org
We consider the general (stochastic) contextual bandit problem under the realizability
assumption, that is, the expected reward, as a function of contexts and actions, belongs to a …

Provable model-based nonlinear bandit and reinforcement learning: Shelve optimism, embrace virtual curvature

K Dong, J Yang, T Ma - Advances in neural information …, 2021 - proceedings.neurips.cc
This paper studies model-based bandit and reinforcement learning (RL) with nonlinear
function approximations. We propose to study convergence to approximate local maxima …

Metadata-based multi-task bandits with bayesian hierarchical models

R Wan, L Ge, R Song - Advances in Neural Information …, 2021 - proceedings.neurips.cc
How to explore efficiently is a central problem in multi-armed bandits. In this paper, we
introduce the metadata-based multi-task bandit problem, where the agent needs to solve a …

Proportional response: Contextual bandits for simple and cumulative regret minimization

SK Krishnamurthy, R Zhan, S Athey… - Advances in Neural …, 2023 - proceedings.neurips.cc
In many applications, e.g., in healthcare and e-commerce, the goal of a contextual bandit may
be to learn an optimal treatment assignment policy at the end of the experiment. That is, to …

Corralling a larger band of bandits: A case study on switching regret for linear bandits

H Luo, M Zhang, P Zhao… - Conference on Learning …, 2022 - proceedings.mlr.press
We consider the problem of combining and learning over a set of adversarial bandit
algorithms with the goal of adaptively tracking the best one on the fly. The Corral algorithm of …

Contextual bandits in a survey experiment on charitable giving: Within-experiment outcomes versus policy learning

S Athey, U Byambadalai, V Hadad… - arXiv preprint arXiv …, 2022 - arxiv.org
We design and implement an adaptive experiment (a "contextual bandit") to learn a targeted
treatment assignment policy, where the goal is to use a participant's survey responses to …

Flexible and efficient contextual bandits with heterogeneous treatment effect oracles

AG Carranza, SK Krishnamurthy… - … Conference on Artificial …, 2023 - proceedings.mlr.press
Contextual bandit algorithms often estimate reward models to inform decision-making.
However, true rewards can contain action-independent redundancies that are not relevant …

Harnessing the Power of Federated Learning in Federated Contextual Bandits

C Shi, R Zhou, K Yang, C Shen - arXiv preprint arXiv:2312.16341, 2023 - arxiv.org
Federated learning (FL) has demonstrated great potential in revolutionizing distributed
machine learning, and tremendous efforts have been made to extend it beyond the original …

Robust causal bandits for linear models

Z Yan, A Mukherjee, B Varıcı… - IEEE Journal on Selected …, 2024 - ieeexplore.ieee.org
The sequential design of experiments for optimizing a reward function in causal systems can
be effectively modeled by the sequential design of interventions in causal bandits (CBs). In …