Provable benefit of multitask representation learning in reinforcement learning

Y Cheng, S Feng, J Yang, H Zhang… - Advances in Neural …, 2022 - proceedings.neurips.cc
As representation learning becomes a powerful technique to reduce sample complexity in
reinforcement learning (RL) in practice, theoretical understanding of its advantage is still …

Meta Learning in Bandits within shared affine Subspaces

S Bilaj, S Dhouib, S Maghsudi - International Conference on …, 2024 - proceedings.mlr.press
We study the problem of meta-learning several contextual stochastic bandit tasks by
leveraging their concentration around a low dimensional affine subspace, which we learn …

Meta-learning adversarial bandits

MF Balcan, K Harris, M Khodak, ZS Wu - arXiv preprint arXiv:2205.14128, 2022 - arxiv.org
We study online learning with bandit feedback across multiple tasks, with the goal of
improving average performance across tasks if they are similar according to some natural …

Lifelong Best-Arm Identification with Misspecified Priors

N Nguyen, C Vernade - Sixteenth European Workshop on …, 2023 - openreview.net
We address the problem of lifelong fixed-budget best-arm identification (BAI), which arises in
realistic sequential A/B testing scenarios where the value of each arm is correlated across …

Online meta-learning in adversarial multi-armed bandits

I Osadchiy, KY Levy, R Meir - arXiv preprint arXiv:2205.15921, 2022 - arxiv.org
We study meta-learning for adversarial multi-armed bandits. We consider the online-within-online
setup, in which a player (learner) encounters a sequence of multi-armed bandit …

Transfer learning in bandits with latent continuity

H Park, S Shin, KS Jun, J Ok - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
A continuity structure of correlations among arms in multi-armed bandit can bring a
significant acceleration of exploration and reduction of regret, in particular, when there are …

Beyond task diversity: provable representation transfer for sequential multitask linear bandits

T Duong, Z Wang, C Zhang - The Thirty-eighth Annual Conference on … - openreview.net
We study lifelong learning in linear bandits, where a learner interacts with a sequence of
linear bandit tasks whose parameters lie in an $m$-dimensional subspace of $\mathbb …