[BOOK][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Introduction to multi-armed bandits

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …
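
The protocol this survey formalizes is short enough to write out. Below is a minimal sketch of the bandit loop with the classic UCB1 rule; the Bernoulli arms, their means, and the horizon are illustrative assumptions, not anything taken from the survey.

```python
import math
import random

def ucb1(reward_fns, horizon):
    """Pull one arm per round; observe only that arm's reward.

    reward_fns: zero-argument callables returning rewards in [0, 1].
    Returns the total reward collected over `horizon` rounds.
    """
    k = len(reward_fns)
    counts = [0] * k   # pulls per arm
    means = [0.0] * k  # empirical mean reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialize by pulling each arm once
        else:
            # UCB1 index: empirical mean plus an exploration bonus
            arm = max(range(k),
                      key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        r = reward_fns[arm]()
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean update
        total += r
    return total

random.seed(0)
# Illustrative Bernoulli arms with unknown means 0.3, 0.5, 0.7.
arms = [lambda p=p: float(random.random() < p) for p in (0.3, 0.5, 0.7)]
print(ucb1(arms, horizon=10_000))  # approaches 0.7 * 10_000 as exploration pays off
```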

Cascading bandits: Learning to rank in the cascade model

B Kveton, C Szepesvari, Z Wen… - … conference on machine …, 2015 - proceedings.mlr.press
A search engine usually outputs a list of K web pages. The user examines this list, from the
first web page to the last, and chooses the first attractive page. This model of user behavior …
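
The cascade model in this abstract is easy to simulate directly, which makes the feedback structure concrete: a click at position i certifies that every item above i was examined and found unattractive. A minimal sketch, assuming made-up item ids and attraction probabilities:

```python
import random

def cascade_click(ranked_items, attraction_prob, rng):
    """Simulate one user session under the cascade model.

    The user examines positions 0, 1, ... in order and clicks the first
    attractive item; everything below the click is never examined.
    Returns the clicked position, or None if the user abandons the list.
    """
    for pos, item in enumerate(ranked_items):
        if rng.random() < attraction_prob[item]:
            return pos  # first attractive item is clicked; the scan stops
    return None

# Illustrative: five items with assumed attraction probabilities.
probs = {0: 0.05, 1: 0.2, 2: 0.5, 3: 0.1, 4: 0.3}
clicks = [cascade_click([2, 4, 1, 3, 0], probs, random.Random(s)) for s in range(5)]
print(clicks)  # clicked position (or None) in five simulated sessions
```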

Off-policy evaluation for slate recommendation

A Swaminathan, A Krishnamurthy… - Advances in …, 2017 - proceedings.neurips.cc
This paper studies the evaluation of policies that recommend an ordered set of items (e.g., a
ranking) based on some context, a common scenario in web search, ads, and …
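
For intuition about why this problem is hard, the sketch below implements the naive baseline that treats the whole slate as one atomic action and reweights by importance sampling; this is not the estimator the paper proposes, just the high-variance starting point such work improves on. The log format and the toy probabilities are assumptions for illustration.

```python
def slate_ips(logs, target_prob):
    """Naive whole-slate importance-sampling estimate of a policy's value.

    logs: (context, slate, logging_prob, reward) tuples, where logging_prob
    is the probability the logging policy showed exactly that slate.
    Because the slate is treated as a single action, the weights (and hence
    the variance) grow combinatorially with slate size.
    """
    total = 0.0
    for context, slate, logging_prob, reward in logs:
        weight = target_prob(context, slate) / logging_prob
        total += weight * reward
    return total / len(logs)

# Illustrative two-record log; slates are tuples of item ids.
logs = [("query", (1, 2), 0.25, 1.0), ("query", (2, 1), 0.25, 0.0)]
print(slate_ips(logs, lambda c, s: 0.5 if s == (1, 2) else 0.1))  # 1.0
```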

Combinatorial multi-armed bandit and its extension to probabilistically triggered arms

W Chen, Y Wang, Y Yuan, Q Wang - Journal of Machine Learning …, 2016 - jmlr.org
We define a general framework for a large class of combinatorial multi-armed bandit (CMAB)
problems, where subsets of base arms with unknown distributions form super arms. In each round, a super arm is played and the outcomes of its base arms are observed …
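
A minimal CUCB-style sketch of this framework, assuming Bernoulli base arms, semi-bandit feedback, and a trivial "top-k" offline oracle; probabilistic triggering is omitted, and all constants are illustrative.

```python
import math
import random

def cucb_topk(base_means, k, horizon, rng=random.Random(0)):
    """CUCB-style loop: the super arm is the top-k base arms by UCB index.

    base_means holds the true (unknown) Bernoulli means and is used only to
    simulate semi-bandit feedback. Returns the last super arm played.
    """
    n = len(base_means)
    counts, means = [0] * n, [0.0] * n
    for t in range(1, horizon + 1):
        # Optimistic index per base arm; unplayed arms get +inf to force exploration.
        ucb = [means[i] + math.sqrt(1.5 * math.log(t) / counts[i]) if counts[i]
               else float("inf") for i in range(n)]
        super_arm = sorted(range(n), key=lambda i: ucb[i], reverse=True)[:k]
        for i in super_arm:  # semi-bandit feedback: every played base arm is observed
            x = float(rng.random() < base_means[i])
            counts[i] += 1
            means[i] += (x - means[i]) / counts[i]
    return sorted(super_arm)

print(cucb_topk([0.2, 0.8, 0.5, 0.9, 0.1], k=2, horizon=5_000))  # typically [1, 3]
```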

Reinforcement learning for non-stationary Markov decision processes: The blessing of (more) optimism

WC Cheung, D Simchi-Levi… - … conference on machine …, 2020 - proceedings.mlr.press
We consider undiscounted reinforcement learning (RL) in Markov decision processes
(MDPs) under drifting non-stationarity, i.e., both the reward and state transition distributions …

Combinatorial bandits revisited

R Combes… - Advances in neural …, 2015 - proceedings.neurips.cc
This paper investigates stochastic and adversarial combinatorial multi-armed bandit
problems. In the stochastic setting under semi-bandit feedback, we derive a problem-specific …

Thompson sampling for combinatorial semi-bandits

S Wang, W Chen - International Conference on Machine …, 2018 - proceedings.mlr.press
We study the application of the Thompson sampling (TS) methodology to the stochastic
combinatorial multi-armed bandit (CMAB) framework. We analyze the standard TS algorithm …
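
A minimal sketch of TS in this setting, assuming independent Bernoulli base arms with Beta posteriors and a top-k offline oracle; the means and horizon are made up for illustration.

```python
import random

def ts_topk(base_means, k, horizon, rng=random.Random(0)):
    """Thompson sampling for a top-k combinatorial semi-bandit.

    One Beta posterior per Bernoulli base arm; each round we sample a mean
    for every arm and play the super arm maximizing the sampled values
    (here just the k largest). base_means is the unknown ground truth,
    used only to simulate feedback.
    """
    n = len(base_means)
    wins, losses = [1] * n, [1] * n  # Beta(1, 1) priors
    for _ in range(horizon):
        theta = [rng.betavariate(wins[i], losses[i]) for i in range(n)]
        super_arm = sorted(range(n), key=lambda i: theta[i], reverse=True)[:k]
        for i in super_arm:  # semi-bandit feedback on every played base arm
            if rng.random() < base_means[i]:
                wins[i] += 1
            else:
                losses[i] += 1
    return sorted(super_arm)

print(ts_topk([0.2, 0.8, 0.5, 0.9, 0.1], k=2, horizon=5_000))  # typically [1, 3]
```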

Hierarchical Bayesian bandits

J Hong, B Kveton, M Zaheer… - International …, 2022 - proceedings.mlr.press
Meta-, multi-task, and federated learning can all be viewed as solving similar tasks,
drawn from a distribution that reflects task similarities. We provide a unified view of all these …
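
A toy illustration of the hierarchical idea: a shared hyper-mean ties the tasks together, so tasks with little data borrow strength from the others. Everything here is an assumption for illustration (Gaussian model, known variances, and a deliberately crude pooled hyper-posterior update).

```python
import random

rng = random.Random(0)

def gaussian_posterior(prior_mean, prior_var, obs, obs_var):
    """Conjugate Gaussian update given i.i.d. observations with known noise."""
    if not obs:
        return prior_mean, prior_var
    var = 1.0 / (1.0 / prior_var + len(obs) / obs_var)
    return var * (prior_mean / prior_var + sum(obs) / obs_var), var

# Illustrative hierarchy: task means scatter around a shared hyper-mean mu.
mu0, tau0 = 0.0, 1.0  # hyper-prior over mu
tau_task = 0.25       # spread of task means around mu
noise = 1.0           # observation noise within a task

# Pretend rewards were already observed from three related tasks.
task_obs = {"task_a": [0.9, 1.1], "task_b": [0.7], "task_c": []}

# Hierarchical Thompson step: sample mu from its (approximate) posterior,
# then sample each task's mean conditioned on mu and the task's own data.
pooled = [x for obs in task_obs.values() for x in obs]
# Crude approximation: treat every observation as independent evidence about
# mu with variance tau_task^2 + noise^2 (ignores within-task correlation).
hyper_mean, hyper_var = gaussian_posterior(mu0, tau0**2, pooled,
                                           tau_task**2 + noise**2)
mu_sample = rng.gauss(hyper_mean, hyper_var**0.5)

theta = {}
for task, obs in task_obs.items():
    mean, var = gaussian_posterior(mu_sample, tau_task**2, obs, noise**2)
    theta[task] = rng.gauss(mean, var**0.5)  # data-poor tasks shrink toward mu
print(theta)
```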

Hedging the drift: Learning to optimize under nonstationarity

WC Cheung, D Simchi-Levi, R Zhu - Management Science, 2022 - pubsonline.informs.org
We introduce data-driven decision-making algorithms that achieve state-of-the-art dynamic
regret bounds for a collection of nonstationary stochastic bandit settings. These settings …
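
The simplest forgetting mechanism in this family is a sliding window: compute UCB statistics only over recent pulls so that stale rewards are discarded. The sketch below shows the idea in the basic K-armed case; the paper covers richer settings and how to set the window, so the drift pattern, window length, and constants here are made up for illustration.

```python
import math
import random
from collections import deque

def sw_ucb(reward_fns, horizon, window):
    """Sliding-window UCB sketch for arms whose means drift over time.

    All statistics come from the last `window` pulls only, so stale rewards
    are forgotten; the log term uses the window length instead of t.
    reward_fns: callables taking the round index, returning rewards in [0, 1].
    Returns the arm pulled in the final round.
    """
    k = len(reward_fns)
    history = deque()  # (arm, reward) pairs inside the window
    for t in range(1, horizon + 1):
        counts, sums = [0] * k, [0.0] * k
        for arm, r in history:  # recompute stats over the window (fine for a sketch)
            counts[arm] += 1
            sums[arm] += r
        def index(a):
            if counts[a] == 0:
                return float("inf")  # re-explore arms that fell out of the window
            return sums[a] / counts[a] + math.sqrt(2 * math.log(min(t, window)) / counts[a])
        arm = max(range(k), key=index)
        history.append((arm, reward_fns[arm](t)))
        if len(history) > window:
            history.popleft()  # forget the oldest observation
    return arm

random.seed(1)
# Illustrative drift: arm 0 is best before round 2500, arm 1 after.
arms = [lambda t: float(random.random() < (0.8 if t < 2500 else 0.2)),
        lambda t: float(random.random() < (0.2 if t < 2500 else 0.8))]
print(sw_ucb(arms, horizon=5_000, window=500))  # usually 1: the window tracked the change
```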