Universal off-policy evaluation
When faced with sequential decision-making problems, it is often useful to be able to predict
what would happen if decisions were made using a new policy. Those predictions must …
Subgaussian and differentiable importance sampling for off-policy evaluation and learning
Importance Sampling (IS) is a widely used building block for a large variety of off-policy
estimation and learning algorithms. However, empirical and theoretical studies have …
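The snippet above describes Importance Sampling as the basic building block of off-policy estimation. As a rough illustration (not taken from the paper), a minimal ordinary-IS estimator of a target policy's value reweights each logged trajectory by the product of per-step probability ratios; all names and the toy data below are hypothetical:

```python
import numpy as np

def is_estimate(rewards, behavior_probs, target_probs):
    """Ordinary importance-sampling estimate of the target policy's value.

    Each row is one logged trajectory; columns are time steps.
    behavior_probs/target_probs hold the probability each policy assigns
    to the action actually taken at that step.
    """
    # Per-trajectory importance weight: product of per-step ratios.
    weights = np.prod(target_probs / behavior_probs, axis=1)
    # Per-trajectory return (undiscounted here for simplicity).
    returns = rewards.sum(axis=1)
    return float(np.mean(weights * returns))

# Toy logged data: 3 trajectories of length 2.
rewards = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
behavior = np.array([[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]])
target = np.array([[0.8, 0.8], [0.2, 0.8], [0.8, 0.2]])
print(is_estimate(rewards, behavior, target))
```

The product of ratios is exactly what makes the estimator heavy-tailed, which is the failure mode the subgaussian-IS line of work above targets.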
Offline reinforcement learning with closed-form policy improvement operators
Behavior-constrained policy optimization has been demonstrated to be a successful
paradigm for tackling Offline Reinforcement Learning. By exploiting historical transitions, a …
Off-policy evaluation with deficient support using side information
N Felicioni, M Ferrari Dacrema… - Advances in …, 2022 - proceedings.neurips.cc
The Off-Policy Evaluation (OPE) problem consists of evaluating the performance of
new policies from the data collected by another one. OPE is crucial when evaluating a new …
Inferring smooth control: Monte Carlo posterior policy iteration with Gaussian processes
Monte Carlo methods have become increasingly relevant for control of non-differentiable
systems, approximate dynamics models, and learning from data. These methods scale to …
On the relation between policy improvement and off-policy minimum-variance policy evaluation
AM Metelli, S Meta, M Restelli - Uncertainty in Artificial …, 2023 - proceedings.mlr.press
Off-policy methods are the basis of a large number of effective Policy Optimization (PO)
algorithms. In this setting, Importance Sampling (IS) is typically employed for off-policy …
Identification of efficient sampling techniques for probabilistic voltage stability analysis of renewable-rich power systems
This paper presents a comparative analysis of six sampling techniques to identify an efficient
and accurate sampling technique to be applied to probabilistic voltage stability assessment …
Training Recommenders Over Large Item Corpus With Importance Sampling
By predicting a personalized ranking on a set of items, item recommendation helps users
determine the information they need. While optimizing a ranking-focused loss is more in line …
Lifelong hyper-policy optimization with multiple importance sampling regularization
Learning in a lifelong setting, where the dynamics continually evolve, is a hard challenge for
current reinforcement learning algorithms. Yet this would be a much needed feature for …
IWDA: Importance weighting for drift adaptation in streaming supervised learning problems
Distribution drift is an important issue for practical applications of machine learning (ML). In
particular, in streaming ML, the data distribution may change over time, yielding the problem …
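The drift-adaptation entry above reweights old samples rather than discarding them once the data distribution shifts. As a hedged sketch (not the paper's method), a self-normalized importance-weighted estimate under a known Gaussian drift looks like this; the densities and the shift amount are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def gauss_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Samples collected before the drift: x ~ N(0, 1).
x = rng.normal(0.0, 1.0, size=100_000)

# After the drift, inputs follow N(0.5, 1); reweight the stale samples
# by the density ratio new/old instead of throwing them away.
w = gauss_pdf(x, 0.5, 1.0) / gauss_pdf(x, 0.0, 1.0)

# Self-normalized importance-weighted estimate of E[x] under the new
# distribution, computed from pre-drift data only.
est = float(np.sum(w * x) / np.sum(w))
print(est)  # close to 0.5
```

In practice the post-drift density is unknown and the ratio must itself be estimated, which is the hard part the streaming setting adds.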