Near-optimal representation learning for linear bandits and linear RL

J Hu, X Chen, C Jin, L Li… - … Conference on Machine …, 2021 - proceedings.mlr.press
This paper studies representation learning for multi-task linear bandits and multi-task
episodic RL with linear value function approximation. We first consider the setting where we …

Provable model-based nonlinear bandit and reinforcement learning: Shelve optimism, embrace virtual curvature

K Dong, J Yang, T Ma - Advances in neural information …, 2021 - proceedings.neurips.cc
This paper studies model-based bandit and reinforcement learning (RL) with nonlinear
function approximations. We propose to study convergence to approximate local maxima …

Neural contextual bandits without regret

P Kassraie, A Krause - International Conference on Artificial …, 2022 - proceedings.mlr.press
Contextual bandits are a rich model for sequential decision making given side information,
with important applications, e.g., in recommender systems. We propose novel algorithms for …

PopArt: Efficient sparse regression and experimental design for optimal sparse linear bandits

K Jang, C Zhang, KS Jun - Advances in Neural Information …, 2022 - proceedings.neurips.cc
In sparse linear bandits, a learning agent sequentially selects an action from a fixed action
set and receives reward feedback, and the reward function depends linearly on a few …

Regret minimization via saddle point optimization

J Kirschner, A Bakhtiari, K Chandak… - Advances in …, 2024 - proceedings.neurips.cc
A long line of works characterizes the sample complexity of regret minimization in sequential
decision-making by min-max programs. In the corresponding saddle-point game, the min …

Contextual information-directed sampling

B Hao, T Lattimore, C Qin - International Conference on …, 2022 - proceedings.mlr.press
Information-directed sampling (IDS) has recently demonstrated its potential as a
data-efficient reinforcement learning algorithm. However, it is still unclear what is the right …

Multi-task representation learning with stochastic linear bandits

L Cella, K Lounici, G Pacreau… - … Conference on Artificial …, 2023 - proceedings.mlr.press
We study the problem of transfer-learning in the setting of stochastic linear contextual bandit
tasks. We consider that a low dimensional linear representation is shared across the tasks …

A Doubly Robust Approach to Sparse Reinforcement Learning

W Kim, G Iyengar, A Zeevi - International Conference on …, 2024 - proceedings.mlr.press
We propose a new regret minimization algorithm for episodic sparse linear Markov decision
process (SMDP) where the state-transition distribution is a linear function of observed …

A simple unified framework for high dimensional bandit problems

W Li, A Barik, J Honorio - International Conference on …, 2022 - proceedings.mlr.press
Stochastic high dimensional bandit problems with low dimensional structures are useful in
different applications such as online advertising and drug discovery. In this work, we …

Anytime model selection in linear bandits

P Kassraie, N Emmenegger… - Advances in Neural …, 2024 - proceedings.neurips.cc
Model selection in the context of bandit optimization is a challenging problem, as it
requires balancing exploration and exploitation not only for action selection, but also for …