Safe exploration for efficient policy evaluation and comparison

R Wan, B Kveton, R Song - International Conference on …, 2022 - proceedings.mlr.press
High-quality data plays a central role in ensuring the accuracy of policy evaluation. This
paper initiates the study of efficient and safe data collection for bandit policy evaluation. We …

Safe optimal design with applications in off-policy learning

R Zhu, B Kveton - International Conference on Artificial …, 2022 - proceedings.mlr.press
Motivated by practical needs in online experimentation and off-policy learning, we study the
problem of safe optimal design, where we develop a data logging policy that efficiently …