Non-stationary representation learning in sequential linear bandits
In this paper, we study representation learning for multi-task decision-making in non-
stationary environments. We consider the framework of sequential linear bandits, where the …
stationary environments. We consider the framework of sequential linear bandits, where the …
Stochastic contextual bandits with long horizon rewards
The growing interest in complex decision-making and language modeling problems
highlights the importance of sample-efficient learning over very long horizons. This work …
highlights the importance of sample-efficient learning over very long horizons. This work …
An Adaptive Method for Non-Stationary Stochastic Multi-armed Bandits with Rewards Generated by a Linear Dynamical System
Online decision-making can be formulated as the popular stochastic multi-armed bandit
problem where a learner makes decisions (or takes actions) to maximize cumulative …
problem where a learner makes decisions (or takes actions) to maximize cumulative …