Near-optimal representation learning for linear bandits and linear RL
This paper studies representation learning for multi-task linear bandits and multi-task
episodic RL with linear value function approximation. We first consider the setting where we …
Provable model-based nonlinear bandit and reinforcement learning: Shelve optimism, embrace virtual curvature
This paper studies model-based bandit and reinforcement learning (RL) with nonlinear
function approximations. We propose to study convergence to approximate local maxima …
Neural contextual bandits without regret
P Kassraie, A Krause - International Conference on Artificial …, 2022 - proceedings.mlr.press
Contextual bandits are a rich model for sequential decision making given side information,
with important applications, e.g., in recommender systems. We propose novel algorithms for …
PopArt: Efficient sparse regression and experimental design for optimal sparse linear bandits
In sparse linear bandits, a learning agent sequentially selects an action from a fixed action
set and receives reward feedback, and the reward function depends linearly on a few …
Regret minimization via saddle point optimization
A long line of works characterizes the sample complexity of regret minimization in sequential
decision-making by min-max programs. In the corresponding saddle-point game, the min …
Contextual information-directed sampling
Information-directed sampling (IDS) has recently demonstrated its potential as a
data-efficient reinforcement learning algorithm. However, it is still unclear what is the right …
Multi-task representation learning with stochastic linear bandits
We study the problem of transfer-learning in the setting of stochastic linear contextual bandit
tasks. We consider that a low dimensional linear representation is shared across the tasks …
A Doubly Robust Approach to Sparse Reinforcement Learning
We propose a new regret minimization algorithm for episodic sparse linear Markov decision
process (SMDP) where the state-transition distribution is a linear function of observed …
A simple unified framework for high dimensional bandit problems
Stochastic high dimensional bandit problems with low dimensional structures are useful in
different applications such as online advertising and drug discovery. In this work, we …
Anytime model selection in linear bandits
P Kassraie, N Emmenegger… - Advances in Neural …, 2024 - proceedings.neurips.cc
Model selection in the context of bandit optimization is a challenging problem, as it
requires balancing exploration and exploitation not only for action selection, but also for …