Non-stationary bandits and meta-learning with a small set of optimal arms

Y Cheng, S Feng, J Yang, H Zhang… - Advances in Neural …, 2022 - proceedings.neurips.cc

As representation learning becomes a powerful technique to reduce sample complexity in
reinforcement learning (RL) in practice, theoretical understanding of its advantage is still …

Cited by 21 Related articles All 6 versions

[PDF] mlr.press

Meta Learning in Bandits within shared affine Subspaces

S Bilaj, S Dhouib, S Maghsudi - International Conference on …, 2024 - proceedings.mlr.press

We study the problem of meta-learning several contextual stochastic bandits tasks by
leveraging their concentration around a low dimensional affine subspace, which we learn …

Meta-learning adversarial bandits

MF Balcan, K Harris, M Khodak, ZS Wu - arXiv preprint arXiv:2205.14128, 2022 - arxiv.org

We study online learning with bandit feedback across multiple tasks, with the goal of
improving average performance across tasks if they are similar according to some natural …

Cited by 4 Related articles All 3 versions

[PDF] openreview.net

Lifelong Best-Arm Identification with Misspecified Priors

N Nguyen, C Vernade - Sixteenth European Workshop on …, 2023 - openreview.net

We address the problem of lifelong fixed-budget best-arm identification (BAI), which arises in
realistic sequential A/B testing scenarios where the value of each arm is correlated across …

Cited by 1 Related articles

[PDF] arxiv.org

Online meta-learning in adversarial multi-armed bandits

I Osadchiy, KY Levy, R Meir - arXiv preprint arXiv:2205.15921, 2022 - arxiv.org

We study meta-learning for adversarial multi-armed bandits. We consider the
online-within-online setup, in which a player (learner) encounters a sequence of multi-armed bandit …

Cited by 2 Related articles All 2 versions

[PDF] arxiv.org

Transfer learning in bandits with latent continuity

H Park, S Shin, KS Jun, J Ok - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

A continuity structure of correlations among arms in multi-armed bandit can bring a
significant acceleration of exploration and reduction of regret, in particular, when there are …

Cited by 1 Related articles All 4 versions

[PDF] openreview.net

Beyond task diversity: provable representation transfer for sequential multitask linear bandits

T Duong, Z Wang, C Zhang - The Thirty-eighth Annual Conference on … - openreview.net

We study lifelong learning in linear bandits, where a learner interacts with a sequence of
linear bandit tasks whose parameters lie in an $ m $-dimensional subspace of $\mathbb …