Exploration-exploitation trade-off for continuous-time episodic reinforcement learning with linear-convex models L Szpruch, T Treetanthiploet, Y Zhang arXiv preprint arXiv:2112.10264, 2021 | 20 | 2021 |
Optimal Scheduling of Entropy Regularizer for Continuous-Time Linear-Quadratic Reinforcement Learning L Szpruch, T Treetanthiploet, Y Zhang SIAM Journal on Control and Optimization 62 (1), 135-166, 2024 | 9 | 2024 |
Asymptotic Randomised Control with applications to bandits SN Cohen, T Treetanthiploet arXiv preprint arXiv:2010.07252, 2020 | 8 | 2020 |
Gittins’ theorem under uncertainty SN Cohen, T Treetanthiploet Electronic Journal of Probability 27, 1-48, 2022 | 5 | 2022 |
Correlated bandits for dynamic pricing via the ARC algorithm SN Cohen, T Treetanthiploet arXiv preprint arXiv:2102.04263, 2021 | 5 | 2021 |
Insurance pricing on price comparison websites via reinforcement learning T Treetanthiploet, Y Zhang, L Szpruch, I Bowers-Barnard, H Ridley, ... arXiv preprint arXiv:2308.06935, 2023 | 1 | 2023 |
Generalised correlated batched bandits via the ARC algorithm with application to dynamic pricing S Cohen, T Treetanthiploet arXiv preprint arXiv:2102.04263, 2021 | 1 | 2021 |
ε-Policy Gradient for Online Pricing L Szpruch, T Treetanthiploet, Y Zhang arXiv preprint arXiv:2405.03624, 2024 | | 2024 |
Competitive Insurance Pricing Using Model-Based Bandits L Sliwinski, T Treetanthiploet, D Siska, L Szpruch Available at SSRN 4755027, 2024 | | 2024 |
Correlated Bandits for Dynamic Pricing via the ARC Algorithm T Treetanthiploet, SN Cohen Available at SSRN 3781766, 2021 | | 2021 |
Stochastic control approach to the multi-armed bandit problems T Treetanthiploet University of Oxford, 2021 | | 2021 |