Methods for improving the availability of spot instances: A survey

L Lin, L Pan, S Liu - Computers in Industry, 2022 - Elsevier
The burgeoning development of the cloud market has promoted the expansion of resources
held by cloud providers, but the resulting underutilization caused by the over-provisioned …

Fast active learning for pure exploration in reinforcement learning

P Ménard, OD Domingues, A Jonsson… - International …, 2021 - proceedings.mlr.press
Realistic environments often provide agents with very limited feedback. When the
environment is initially unknown, the feedback, in the beginning, can be completely absent …

Instance-dependent near-optimal policy identification in linear MDPs via online experiment design

A Wagenmaker, KG Jamieson - Advances in Neural …, 2022 - proceedings.neurips.cc
While much progress has been made in understanding the minimax sample complexity of
reinforcement learning (RL), that is, the complexity of learning on the "worst-case" instance, such …

Adaptive reward-free exploration

E Kaufmann, P Ménard… - Algorithmic …, 2021 - proceedings.mlr.press
Reward-free exploration is a reinforcement learning setting recently studied by (Jin et al.
2020), who address it by running several algorithms with regret guarantees in parallel. In our …

Policy finetuning in reinforcement learning via design of experiments using offline data

R Zhang, A Zanette - Advances in Neural Information …, 2024 - proceedings.neurips.cc
In some applications of reinforcement learning, a dataset of pre-collected experience is
already available, but it is also possible to acquire some additional online data to help …

Mixture martingales revisited with applications to sequential tests and confidence intervals

E Kaufmann, WM Koolen - Journal of Machine Learning Research, 2021 - jmlr.org
This paper presents new deviation inequalities that are valid uniformly in time under
adaptive sampling in a multi-armed bandit model. The deviations are measured using the …

Towards theoretical understanding of inverse reinforcement learning

AM Metelli, F Lazzati, M Restelli - … Conference on Machine …, 2023 - proceedings.mlr.press
Inverse reinforcement learning (IRL) denotes a powerful family of algorithms for recovering a
reward function justifying the behavior demonstrated by an expert agent. A well-known …

Beyond no regret: Instance-dependent PAC reinforcement learning

AJ Wagenmaker, M Simchowitz… - … on Learning Theory, 2022 - proceedings.mlr.press
The theory of reinforcement learning has focused on two fundamental problems: achieving
low regret, and identifying $\epsilon$-optimal policies. While a simple reduction allows one …

Fast rates for maximum entropy exploration

D Tiapkin, D Belomestny… - International …, 2023 - proceedings.mlr.press
We address the challenge of exploration in reinforcement learning (RL) when the agent
operates in an unknown environment with sparse or no rewards. In this work, we study the …

Optimistic posterior sampling for reinforcement learning with few samples and tight guarantees

D Tiapkin, D Belomestny… - Advances in …, 2022 - proceedings.neurips.cc
We consider reinforcement learning in an environment modeled by an episodic, tabular,
step-dependent Markov decision process of horizon $H$ with $S$ states, and $A …