Nearly optimal algorithms for linear contextual bandits with adversarial corruptions
We study the linear contextual bandit problem in the presence of adversarial corruption,
where the reward at each round is corrupted by an adversary, and the corruption level (i.e. …
Bypassing the monster: A faster and simpler optimal algorithm for contextual bandits under realizability
D Simchi-Levi, Y Xu - Mathematics of Operations Research, 2022 - pubsonline.informs.org
We consider the general (stochastic) contextual bandit problem under the realizability
assumption, that is, the expected reward, as a function of contexts and actions, belongs to a …
Provable model-based nonlinear bandit and reinforcement learning: Shelve optimism, embrace virtual curvature
This paper studies model-based bandit and reinforcement learning (RL) with nonlinear
function approximations. We propose to study convergence to approximate local maxima …
Metadata-based multi-task bandits with Bayesian hierarchical models
How to explore efficiently is a central problem in multi-armed bandits. In this paper, we
introduce the metadata-based multi-task bandit problem, where the agent needs to solve a …
Proportional response: Contextual bandits for simple and cumulative regret minimization
In many applications, e.g. in healthcare and e-commerce, the goal of a contextual bandit may
be to learn an optimal treatment assignment policy at the end of the experiment. That is, to …
Corralling a larger band of bandits: A case study on switching regret for linear bandits
We consider the problem of combining and learning over a set of adversarial bandit
algorithms with the goal of adaptively tracking the best one on the fly. The Corral algorithm of …
Contextual bandits in a survey experiment on charitable giving: Within-experiment outcomes versus policy learning
We design and implement an adaptive experiment (a "contextual bandit") to learn a targeted
treatment assignment policy, where the goal is to use a participant's survey responses to …
Flexible and efficient contextual bandits with heterogeneous treatment effect oracles
AG Carranza, SK Krishnamurthy… - … Conference on Artificial …, 2023 - proceedings.mlr.press
Contextual bandit algorithms often estimate reward models to inform decision-making.
However, true rewards can contain action-independent redundancies that are not relevant …
Harnessing the Power of Federated Learning in Federated Contextual Bandits
Federated learning (FL) has demonstrated great potential in revolutionizing distributed
machine learning, and tremendous efforts have been made to extend it beyond the original …
Robust causal bandits for linear models
The sequential design of experiments for optimizing a reward function in causal systems can
be effectively modeled by the sequential design of interventions in causal bandits (CBs). In …