Tracking most significant shifts in nonparametric contextual bandits

J Suk, S Kpotufe - Advances in Neural Information …, 2023 - proceedings.neurips.cc
We study nonparametric contextual bandits where Lipschitz mean reward functions may
change over time. We first establish the minimax dynamic regret rate in this less understood …

A robust test for the stationarity assumption in sequential decision making

J Wang, C Shi, Z Wu - International Conference on Machine …, 2023 - proceedings.mlr.press
Reinforcement learning (RL) is a powerful technique that allows an autonomous agent to
learn an optimal policy to maximize the expected return. The optimality of various RL …

Reaching goals is hard: Settling the sample complexity of the stochastic shortest path

L Chen, A Tirinzoni, M Pirotta… - … on Algorithmic Learning …, 2023 - proceedings.mlr.press
We study the sample complexity of learning an $\epsilon$-optimal policy in the Stochastic
Shortest Path (SSP) problem. We first derive sample complexity bounds when the learner …

Layered state discovery for incremental autonomous exploration

L Chen, A Tirinzoni, A Lazaric… - … Conference on Machine …, 2023 - proceedings.mlr.press
We study the autonomous exploration (AX) problem proposed by Lim & Auer (2012). In this
setting, the objective is to discover a set of $\epsilon$-optimal policies reaching a set …

Hi-Core: Hierarchical Knowledge Transfer for Continual Reinforcement Learning

C Pan, X Yang, H Wang, W Wei, T Li - arXiv preprint arXiv:2401.15098, 2024 - arxiv.org
Continual reinforcement learning (CRL) empowers RL agents with the ability to learn from a
sequence of tasks, preserving previous knowledge and leveraging it to facilitate future …

A Unified Algorithm for Stochastic Path Problems

C Dann, CY Wei, J Zimmert - International Conference on …, 2023 - proceedings.mlr.press
We study reinforcement learning in stochastic path (SP) problems. The goal in these
problems is to maximize the expected sum of rewards until the agent reaches a terminal …

Advances in Non-stationary Sequential Decision-Making

J Suk - 2024 - search.proquest.com
We study the problem of sequential decision-making (e.g., multi-armed bandits, contextual
bandits, reinforcement learning) under changing environments, or distribution shifts. Ideally …