Online learning with off-policy feedback

4 results (0.02 seconds)

Offline primal-dual reinforcement learning for linear mdps

G Gabbianelli, G Neu, M Papini… - … Conference on Artificial …, 2024 - proceedings.mlr.press

Offline Reinforcement Learning (RL) aims to learn a near-optimal policy from a fixed
dataset of transitions collected by another policy. This problem has attracted a lot of attention …

Cited by 8 · Related articles · All 6 versions

Importance-weighted offline learning done right

G Gabbianelli, G Neu, M Papini - … Conference on Algorithmic …, 2024 - proceedings.mlr.press

We study the problem of offline policy optimization in stochastic contextual bandit problems,
where the goal is to learn a near-optimal policy based on a dataset of decision data …

Cited by 5 · Related articles · All 4 versions

Pure Exploration under Mediators' Feedback

R Poiani, AM Metelli, M Restelli - arXiv preprint arXiv:2308.15552, 2023 - arxiv.org

Stochastic multi-armed bandits are a sequential-decision-making framework, where, at each
interaction step, the learner selects an arm and observes a stochastic reward. Within the …

Cited by 1 · Related articles · All 3 versions

Online Learning with Off-Policy Feedback in Adversarial MDPs

F Bacchiocchi, FE Stradi, M Papini, AM Metelli, N Gatti - ijcai.org

In this paper, we face the challenge of online learning in adversarial Markov decision
processes with off-policy feedback. In this setting, the learner chooses a policy, but …