Beyond UCB: Optimal and efficient contextual bandits with regression oracles
A fundamental challenge in contextual bandits is to develop flexible, general-purpose
algorithms with computational requirements no worse than classical supervised learning …
Bypassing the simulator: Near-optimal adversarial linear contextual bandits
We consider the adversarial linear contextual bandit problem, where the loss vectors are
selected fully adversarially and the per-round action set (i.e., the context) is drawn from a fixed …
Breaking the curse of multiagency: Provably efficient decentralized multi-agent RL with function approximation
A unique challenge in Multi-Agent Reinforcement Learning (MARL) is the \emph{curse of
multiagency}, where the description length of the game as well as the complexity of many …
Misspecified Gaussian process bandit optimization
I Bogunovic, A Krause - Advances in neural information …, 2021 - proceedings.neurips.cc
We consider the problem of optimizing a black-box function based on noisy bandit feedback.
Kernelized bandit algorithms have shown strong empirical and theoretical performance for …
Breaking the curse of multiagents in a large state space: RL in Markov games with independent linear function approximation
We propose a new model, \emph{independent linear Markov game}, for multi-agent
reinforcement learning with a large state space and a large number of agents. This is a class …
Stochastic linear bandits robust to adversarial attacks
We consider a stochastic linear bandit problem in which the rewards are not only subject to
random noise, but also adversarial attacks subject to a suitable budget $C$ (i.e., an upper …
Refined regret for adversarial MDPs with linear function approximation
We consider learning in an adversarial Markov Decision Process (MDP) where the loss
functions can change arbitrarily over $K$ episodes and the state space can be arbitrarily …
Improved regret for efficient online reinforcement learning with linear function approximation
We study reinforcement learning with linear function approximation and adversarially
changing cost functions, a setup that has mostly been considered under simplifying …
First- and second-order bounds for adversarial linear contextual bandits
We consider the adversarial linear contextual bandit setting, which allows for the loss
functions associated with each of $K$ arms to change over time without restriction …
Offline primal-dual reinforcement learning for linear MDPs
Abstract Offline Reinforcement Learning (RL) aims to learn a near-optimal policy from a fixed
dataset of transitions collected by another policy. This problem has attracted a lot of attention …