Behaviour policy estimation in off-policy policy evaluation: Calibration matters

C Yu, J Liu, S Nemati, G Yin - ACM Computing Surveys (CSUR), 2021 - dl.acm.org

As a subfield of machine learning, reinforcement learning (RL) aims at optimizing decision
making by using interaction samples of an agent with its environment and the potentially …

Cited by 608 Related articles All 5 versions

[PDF] nature.com

Development and validation of a reinforcement learning algorithm to dynamically optimize mechanical ventilation in critical care

A Peine, A Hallawa, J Bickenbach, G Dartmann… - NPJ digital …, 2021 - nature.com

The aim of this work was to develop and evaluate the reinforcement learning algorithm
VentAI, which is able to suggest a dynamically optimized mechanical ventilation regime for …

Cited by 69 Related articles All 11 versions

[PDF] neurips.cc

Counterfactual data augmentation using locally factored dynamics

S Pitis, E Creager, A Garg - Advances in Neural Information …, 2020 - proceedings.neurips.cc

Many dynamic processes, including common scenarios in robotic control and reinforcement
learning (RL), involve a set of interacting subprocesses. Though the subprocesses are not …

Cited by 81 Related articles All 6 versions

[PDF] mlr.press

Near-optimal provable uniform convergence in offline policy evaluation for reinforcement learning

M Yin, Y Bai, YX Wang - International Conference on …, 2021 - proceedings.mlr.press

The problem of Offline Policy Evaluation (OPE) in Reinforcement Learning (RL) is a
critical step towards applying RL in real life applications. Existing work on OPE mostly focus …

Cited by 68 Related articles All 6 versions

[PDF] arxiv.org

Evaluating the robustness of off-policy evaluation

Y Saito, T Udagawa, H Kiyohara, K Mogi… - Proceedings of the 15th …, 2021 - dl.acm.org

Off-policy Evaluation (OPE), or offline evaluation in general, evaluates the performance of
hypothetical policies leveraging only offline log data. It is particularly useful in applications …

Cited by 33 Related articles All 10 versions

[PDF] arxiv.org

Popcorn: Partially observed prediction constrained reinforcement learning

J Futoma, MC Hughes, F Doshi-Velez - arXiv preprint arXiv:2001.04032, 2020 - arxiv.org

Many medical decision-making tasks can be framed as partially observed Markov decision
processes (POMDPs). However, prevailing two-stage approaches that first learn a POMDP …

Cited by 51 Related articles All 6 versions

[PDF] springer.com

Importance sampling in reinforcement learning with an estimated behavior policy

JP Hanna, S Niekum, P Stone - Machine Learning, 2021 - Springer

In reinforcement learning, importance sampling is a widely used method for evaluating an
expectation under the distribution of data of one policy when the data has in fact been …

Cited by 31 Related articles All 13 versions

Reinforcement learning in medical diagnosis: An overview

R Khajuria, A Sarwar - Recent Innovations in Computing: Proceedings of …, 2022 - Springer

The paper provides the readers with the knowledge of how reinforcement learning (RL)
applications can be applied in medical diagnosis and healthcare. RL is a powerful and …

Cited by 8 Related articles All 3 versions

[PDF] arxiv.org

Model-based reinforcement learning for sepsis treatment

A Raghu, M Komorowski, S Singh - arXiv preprint arXiv:1811.09602, 2018 - arxiv.org

Sepsis is a dangerous condition that is a leading cause of patient mortality. Treating sepsis
is highly challenging, because individual patients respond very differently to medical …

Cited by 50 Related articles All 2 versions

[HTML] sciencedirect.com

[HTML] Transatlantic transferability of a new reinforcement learning model for optimizing haemodynamic treatment for critically ill patients with sepsis

L Roggeveen, A El Hassouni, J Ahrendt, T Guo… - Artificial Intelligence in …, 2021 - Elsevier

In recent years, reinforcement learning (RL) has gained traction in the
healthcare domain. In particular, RL methods have been explored for haemodynamic …

Cited by 28 Related articles All 5 versions