Safe exploration for efficient policy evaluation and comparison
High-quality data plays a central role in ensuring the accuracy of policy evaluation. This
paper initiates the study of efficient and safe data collection for bandit policy evaluation. We …
Safe optimal design with applications in off-policy learning
Motivated by practical needs in online experimentation and off-policy learning, we study the
problem of safe optimal design, where we develop a data logging policy that efficiently …