Scalable representation learning in linear contextual bandits with constant regret guarantees


3 results (0.01 sec)


[PDF] mlr.press

On the complexity of representation learning in contextual linear bandits

A Tirinzoni, M Pirotta, A Lazaric - … Conference on Artificial …, 2023 - proceedings.mlr.press

In contextual linear bandits, the reward function is assumed to be a linear combination of an
unknown reward vector and a given embedding of context-arm pairs. In practice, the …

Cited by 1 · Related articles · All 3 versions

[PDF] arxiv.org

Representation Abstractions as Incentives for Reinforcement Learning Agents: A Robotic Grasping Case Study

P Petropoulakis, L Gräf, J Josifovski, M Malmir… - arXiv preprint arXiv …, 2023 - arxiv.org

Choosing an appropriate representation of the environment for the underlying
decision-making process of the RL agent is not always straightforward. The state representation …

Bounded (O(1)) regret recommendation learning via synthetic controls oracle

EH Kang, PR Kumar - 2023 59th Annual Allerton Conference …, 2023 - ieeexplore.ieee.org

In online exploration systems where users with fixed preferences repeatedly arrive, it has
recently been shown that O(1), i.e., bounded regret, can be achieved when the system is …