Online learning: A comprehensive survey
Online learning represents a family of machine learning methods, where a learner attempts
to tackle some predictive (or any type of decision-making) task by learning from a sequence …
Intrinsic motivation, curiosity, and learning: Theory and applications in educational technologies
This chapter studies the bidirectional causal interactions between curiosity and learning and
discusses how understanding these interactions can be leveraged in educational …
International conference on machine learning
W Li, C Wang, G Cheng, Q Song - Transactions on machine learning …, 2023 - par.nsf.gov
In this paper, we make the key delineation on the roles of resolution and statistical
uncertainty in hierarchical bandits-based black-box optimization algorithms, guiding a more …
Uncertainty-based offline reinforcement learning with diversified Q-ensemble
Offline reinforcement learning (offline RL), which aims to find an optimal policy from a
previously collected static dataset, bears algorithmic difficulties due to function …
Nearly minimax optimal reinforcement learning for linear mixture Markov decision processes
We study reinforcement learning (RL) with linear function approximation where the
underlying transition probability kernel of the Markov decision process (MDP) is a linear …
Sunrise: A simple unified framework for ensemble learning in deep reinforcement learning
Off-policy deep reinforcement learning (RL) has been successful in a range of challenging
domains. However, standard off-policy RL algorithms can suffer from several issues, such as …
[Book] Bandit algorithms
T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
Introduction to multi-armed bandits
A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …
Hawkeye: Towards a desired directed grey-box fuzzer
Grey-box fuzzing is a practically effective approach to test real-world programs. However,
most existing grey-box fuzzers lack directedness, i.e., the capability of executing towards user …
Time-uniform, nonparametric, nonasymptotic confidence sequences
The Annals of Statistics 2021, Vol. 49, No. 2, 1055–1080 https://doi.org/10.1214/20-AOS1991 © Institute of …