Exploration–exploitation tradeoff using variance estimates in multi-armed bandits

SCH Hoi, D Sahoo, J Lu, P Zhao - Neurocomputing, 2021 - Elsevier

Online learning represents a family of machine learning methods, where a learner attempts
to tackle some predictive (or any type of decision-making) task by learning from a sequence …

Cited by 867 · Related articles · All 6 versions

[PDF] hal.science

Intrinsic motivation, curiosity, and learning: Theory and applications in educational technologies

PY Oudeyer, J Gottlieb, M Lopes - Progress in brain research, 2016 - Elsevier

This chapter studies the bidirectional causal interactions between curiosity and learning and
discusses how understanding these interactions can be leveraged in educational …

Cited by 512 · Related articles · All 13 versions

[PDF] nsf.gov

[PDF] International conference on machine learning

W Li, C Wang, G Cheng, Q Song - Transactions on machine learning …, 2023 - par.nsf.gov

In this paper, we make the key delineation on the roles of resolution and statistical
uncertainty in hierarchical bandits-based black-box optimization algorithms, guiding a more …

Cited by 1688 · Related articles

[PDF] neurips.cc

Uncertainty-based offline reinforcement learning with diversified q-ensemble

G An, S Moon, JH Kim… - Advances in neural …, 2021 - proceedings.neurips.cc

Offline reinforcement learning (offline RL), which aims to find an optimal policy from a
previously collected static dataset, bears algorithmic difficulties due to function …

Cited by 293 · Related articles · All 7 versions

[PDF] mlr.press

Nearly minimax optimal reinforcement learning for linear mixture markov decision processes

D Zhou, Q Gu, C Szepesvari - Conference on Learning …, 2021 - proceedings.mlr.press

We study reinforcement learning (RL) with linear function approximation where the
underlying transition probability kernel of the Markov decision process (MDP) is a linear …

Cited by 239 · Related articles · All 7 versions

[PDF] mlr.press

Sunrise: A simple unified framework for ensemble learning in deep reinforcement learning

K Lee, M Laskin, A Srinivas… - … Conference on Machine …, 2021 - proceedings.mlr.press

Off-policy deep reinforcement learning (RL) has been successful in a range of challenging
domains. However, standard off-policy RL algorithms can suffer from several issues, such as …

Cited by 255 · Related articles · All 6 versions

[PDF] tor-lattimore.com

[BOOK] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com

Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Cited by 3226 · Related articles · All 9 versions

[PDF] nowpublishers.com

Introduction to multi-armed bandits

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com

Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …

Cited by 1220 · Related articles · All 7 versions

[PDF] smu.edu.sg

Hawkeye: Towards a desired directed grey-box fuzzer

H Chen, Y Xue, Y Li, B Chen, X Xie, X Wu… - Proceedings of the 2018 …, 2018 - dl.acm.org

Grey-box fuzzing is a practically effective approach to test real-world programs. However,
most existing grey-box fuzzers lack directedness, i.e., the capability of executing towards user …

Cited by 324 · Related articles · All 7 versions

[PDF] projecteuclid.org

Time-uniform, nonparametric, nonasymptotic confidence sequences

SR Howard, A Ramdas, J McAuliffe, J Sekhon - 2021 - projecteuclid.org

The Annals of Statistics 2021, Vol. 49, No. 2, 1055–1080. https://doi.org/10.1214/20-AOS1991 © Institute of …

Cited by 316 · Related articles · All 7 versions