Achieving Near-Optimal Regret for Bandit Algorithms with Uniform Last-Iterate Guarantee
Existing performance measures for bandit algorithms, such as regret, PAC bounds, or
uniform-PAC (Dann et al., 2017), typically evaluate the cumulative performance, while …
Sequential causal inference in a single world of connected units
We consider adaptive designs for a trial involving N individuals that we follow along T time
steps. We allow for the variables of one individual to depend on its past and on the past of …
Uniform Last-Iterate Guarantee for Bandits and Reinforcement Learning
Existing metrics for reinforcement learning (RL), such as regret, PAC bounds, or uniform-PAC
(Dann et al., 2017), typically evaluate the cumulative performance, while allowing the play of …