High-dimensional sparse linear bandits

J Hu, X Chen, C Jin, L Li… - … Conference on Machine …, 2021 - proceedings.mlr.press

This paper studies representation learning for multi-task linear bandits and multi-task
episodic RL with linear value function approximation. We first consider the setting where we …

Cited by 51 Related articles All 6 versions

[PDF] neurips.cc

Provable model-based nonlinear bandit and reinforcement learning: Shelve optimism, embrace virtual curvature

K Dong, J Yang, T Ma - Advances in neural information …, 2021 - proceedings.neurips.cc

This paper studies model-based bandit and reinforcement learning (RL) with nonlinear
function approximations. We propose to study convergence to approximate local maxima …

Cited by 42 Related articles All 8 versions

[PDF] mlr.press

Neural contextual bandits without regret

P Kassraie, A Krause - International Conference on Artificial …, 2022 - proceedings.mlr.press

Contextual bandits are a rich model for sequential decision making given side information,
with important applications, e.g., in recommender systems. We propose novel algorithms for …

Cited by 35 Related articles All 8 versions

[PDF] neurips.cc

Popart: Efficient sparse regression and experimental design for optimal sparse linear bandits

K Jang, C Zhang, KS Jun - Advances in Neural Information …, 2022 - proceedings.neurips.cc

In sparse linear bandits, a learning agent sequentially selects an action from a fixed action
set and receives reward feedback, and the reward function depends linearly on a few …

Cited by 14 Related articles All 7 versions

[PDF] neurips.cc

Regret minimization via saddle point optimization

J Kirschner, A Bakhtiari, K Chandak… - Advances in …, 2024 - proceedings.neurips.cc

A long line of works characterizes the sample complexity of regret minimization in sequential
decision-making by min-max programs. In the corresponding saddle-point game, the min …

Cited by 2 Related articles All 6 versions

[PDF] mlr.press

Contextual information-directed sampling

B Hao, T Lattimore, C Qin - International Conference on …, 2022 - proceedings.mlr.press

Information-directed sampling (IDS) has recently demonstrated its potential as a
data-efficient reinforcement learning algorithm. However, it is still unclear what is the right …

Cited by 17 Related articles All 4 versions

[PDF] mlr.press

Multi-task representation learning with stochastic linear bandits

L Cella, K Lounici, G Pacreau… - … Conference on Artificial …, 2023 - proceedings.mlr.press

We study the problem of transfer-learning in the setting of stochastic linear contextual bandit
tasks. We consider that a low dimensional linear representation is shared across the tasks …

Cited by 21 Related articles All 6 versions

[PDF] mlr.press

A Doubly Robust Approach to Sparse Reinforcement Learning

W Kim, G Iyengar, A Zeevi - International Conference on …, 2024 - proceedings.mlr.press

We propose a new regret minimization algorithm for episodic sparse linear Markov decision
process (SMDP) where the state-transition distribution is a linear function of observed …

Cited by 2 Related articles All 3 versions

[PDF] mlr.press

A simple unified framework for high dimensional bandit problems

W Li, A Barik, J Honorio - International Conference on …, 2022 - proceedings.mlr.press

Stochastic high dimensional bandit problems with low dimensional structures are useful in
different applications such as online advertising and drug discovery. In this work, we …

Cited by 27 Related articles All 4 versions

[PDF] neurips.cc

Anytime model selection in linear bandits

P Kassraie, N Emmenegger… - Advances in Neural …, 2024 - proceedings.neurips.cc

Model selection in the context of bandit optimization is a challenging problem, as it
requires balancing exploration and exploitation not only for action selection, but also for …

Cited by 4 Related articles All 10 versions