Beyond UCB: Optimal and efficient contextual bandits with regression oracles

D Foster, A Rakhlin - International Conference on Machine …, 2020 - proceedings.mlr.press
A fundamental challenge in contextual bandits is to develop flexible, general-purpose
algorithms with computational requirements no worse than classical supervised learning …

Bypassing the simulator: Near-optimal adversarial linear contextual bandits

H Liu, CY Wei, J Zimmert - Advances in Neural Information …, 2024 - proceedings.neurips.cc
We consider the adversarial linear contextual bandit problem, where the loss vectors are
selected fully adversarially and the per-round action set (i.e., the context) is drawn from a fixed …

Breaking the curse of multiagency: Provably efficient decentralized multi-agent RL with function approximation

Y Wang, Q Liu, Y Bai, C Jin - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
A unique challenge in Multi-Agent Reinforcement Learning (MARL) is the curse of
multiagency, where the description length of the game as well as the complexity of many …

Misspecified Gaussian process bandit optimization

I Bogunovic, A Krause - Advances in neural information …, 2021 - proceedings.neurips.cc
We consider the problem of optimizing a black-box function based on noisy bandit feedback.
Kernelized bandit algorithms have shown strong empirical and theoretical performance for …

Breaking the curse of multiagents in a large state space: RL in Markov games with independent linear function approximation

Q Cui, K Zhang, S Du - The Thirty Sixth Annual Conference …, 2023 - proceedings.mlr.press
We propose a new model, the independent linear Markov game, for multi-agent
reinforcement learning with a large state space and a large number of agents. This is a class …

Stochastic linear bandits robust to adversarial attacks

I Bogunovic, A Losalka, A Krause… - International …, 2021 - proceedings.mlr.press
We consider a stochastic linear bandit problem in which the rewards are not only subject to
random noise, but also adversarial attacks subject to a suitable budget $C$ (i.e., an upper …

Refined regret for adversarial MDPs with linear function approximation

Y Dai, H Luo, CY Wei, J Zimmert - … Conference on Machine …, 2023 - proceedings.mlr.press
We consider learning in an adversarial Markov Decision Process (MDP) where the loss
functions can change arbitrarily over $K$ episodes and the state space can be arbitrarily …

Improved regret for efficient online reinforcement learning with linear function approximation

U Sherman, T Koren… - … Conference on Machine …, 2023 - proceedings.mlr.press
We study reinforcement learning with linear function approximation and adversarially
changing cost functions, a setup that has mostly been considered under simplifying …

First- and second-order bounds for adversarial linear contextual bandits

J Olkhovskaya, J Mayo, T van Erven… - Advances in Neural …, 2024 - proceedings.neurips.cc
We consider the adversarial linear contextual bandit setting, which allows for the loss
functions associated with each of $K$ arms to change over time without restriction …

Offline primal-dual reinforcement learning for linear MDPs

G Gabbianelli, G Neu, M Papini… - … Conference on Artificial …, 2024 - proceedings.mlr.press
Offline Reinforcement Learning (RL) aims to learn a near-optimal policy from a fixed
dataset of transitions collected by another policy. This problem has attracted a lot of attention …