On the complexity of representation learning in contextual linear bandits
In contextual linear bandits, the reward function is assumed to be a linear combination of an
unknown reward vector and a given embedding of context-arm pairs. In practice, the …
Representation Abstractions as Incentives for Reinforcement Learning Agents: A Robotic Grasping Case Study
Choosing an appropriate representation of the environment for the underlying
decision-making process of the RL agent is not always straightforward. The state representation …
Bounded (O(1)) regret recommendation learning via synthetic controls oracle
In online exploration systems where users with fixed preferences repeatedly arrive, it has
recently been shown that O(1), i.e., bounded regret, can be achieved when the system is …