Achieving Near-Optimal Regret for Bandit Algorithms with Uniform Last-Iterate Guarantee

J Liu, Y Li, L Yang - arXiv preprint arXiv:2402.12711, 2024 - arxiv.org
Existing performance measures for bandit algorithms such as regret, PAC bounds, or
uniform-PAC (Dann et al., 2017), typically evaluate the cumulative performance, while …

Sequential causal inference in a single world of connected units

A Bibaut, M Petersen, N Vlassis… - arXiv preprint arXiv …, 2021 - arxiv.org
We consider adaptive designs for a trial involving N individuals that we follow along T time
steps. We allow for the variables of one individual to depend on its past and on the past of …

Uniform Last-Iterate Guarantee for Bandits and Reinforcement Learning

J Liu, Y Li, R Wang, L Yang - The Thirty-eighth Annual Conference on … - openreview.net
Existing metrics for reinforcement learning (RL) such as regret, PAC bounds, or uniform-PAC
(Dann et al., 2017), typically evaluate the cumulative performance, while allowing the play of …