Universal off-policy evaluation
When faced with sequential decision-making problems, it is often useful to be able to predict
what would happen if decisions were made using a new policy. Those predictions must …
Subgaussian and differentiable importance sampling for off-policy evaluation and learning
Importance Sampling (IS) is a widely used building block for a large variety of off-policy
estimation and learning algorithms. However, empirical and theoretical studies have …
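The snippet above describes Importance Sampling as the basic building block of off-policy estimation. As a rough illustration (not taken from the paper), a minimal ordinary-IS estimator of a target policy's value reweights each logged trajectory by the product of per-step probability ratios; all names and the toy data below are hypothetical:

```python
import numpy as np

def is_estimate(rewards, behavior_probs, target_probs):
    """Ordinary importance-sampling estimate of the target policy's value.

    Each row is one logged trajectory; columns are time steps.
    behavior_probs/target_probs hold the probability each policy assigns
    to the action actually taken at that step.
    """
    # Per-trajectory importance weight: product of per-step ratios.
    weights = np.prod(target_probs / behavior_probs, axis=1)
    # Per-trajectory return (undiscounted here for simplicity).
    returns = rewards.sum(axis=1)
    return float(np.mean(weights * returns))

# Toy logged data: 3 trajectories of length 2.
rewards = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
behavior = np.array([[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]])
target = np.array([[0.8, 0.8], [0.2, 0.8], [0.8, 0.2]])
print(is_estimate(rewards, behavior, target))
```

The product of ratios is exactly what makes the estimator heavy-tailed, which is the failure mode the subgaussian-IS line of work above targets.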
Offline reinforcement learning with closed-form policy improvement operators
Behavior-constrained policy optimization has been demonstrated to be a successful
paradigm for tackling Offline Reinforcement Learning. By exploiting historical transitions, a …
Off-policy evaluation with deficient support using side information
N Felicioni, M Ferrari Dacrema… - Advances in …, 2022 - proceedings.neurips.cc
The Off-Policy Evaluation (OPE) problem consists of evaluating the performance of
new policies from the data collected by another one. OPE is crucial when evaluating a new …
Inferring smooth control: Monte Carlo posterior policy iteration with Gaussian processes
Monte Carlo methods have become increasingly relevant for control of non-differentiable
systems, approximate dynamics models, and learning from data. These methods scale to …
On the relation between policy improvement and off-policy minimum-variance policy evaluation
AM Metelli, S Meta, M Restelli - Uncertainty in Artificial …, 2023 - proceedings.mlr.press
Off-policy methods are the basis of a large number of effective Policy Optimization (PO)
algorithms. In this setting, Importance Sampling (IS) is typically employed for off-policy …
Identification of efficient sampling techniques for probabilistic voltage stability analysis of renewable-rich power systems
This paper presents a comparative analysis of six sampling techniques to identify an efficient
and accurate sampling technique to be applied to probabilistic voltage stability assessment …
Training Recommenders Over Large Item Corpus With Importance Sampling
By predicting a personalized ranking on a set of items, item recommendation helps users
determine the information they need. While optimizing a ranking-focused loss is more in line …
Lifelong hyper-policy optimization with multiple importance sampling regularization
Learning in a lifelong setting, where the dynamics continually evolve, is a hard challenge for
current reinforcement learning algorithms. Yet this would be a much needed feature for …
IWDA: Importance weighting for drift adaptation in streaming supervised learning problems
Distribution drift is an important issue for practical applications of machine learning (ML). In
particular, in streaming ML, the data distribution may change over time, yielding the problem …
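The drift-adaptation entry above reweights old samples rather than discarding them once the data distribution shifts. As a hedged sketch (not the paper's method), a self-normalized importance-weighted estimate under a known Gaussian drift looks like this; the densities and the shift amount are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def gauss_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Samples collected before the drift: x ~ N(0, 1).
x = rng.normal(0.0, 1.0, size=100_000)

# After the drift, inputs follow N(0.5, 1); reweight the stale samples
# by the density ratio new/old instead of throwing them away.
w = gauss_pdf(x, 0.5, 1.0) / gauss_pdf(x, 0.0, 1.0)

# Self-normalized importance-weighted estimate of E[x] under the new
# distribution, computed from pre-drift data only.
est = float(np.sum(w * x) / np.sum(w))
print(est)  # close to 0.5
```

In practice the post-drift density is unknown and the ratio must itself be estimated, which is the hard part the streaming setting adds.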