The statistical complexity of interactive decision making
A fundamental challenge in interactive learning and decision making, ranging from bandit
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …
The role of coverage in online reinforcement learning
Coverage conditions--which assert that the data logging distribution adequately covers the
state space--play a fundamental role in determining the sample complexity of offline …
Instance-optimality in interactive decision making: Toward a non-asymptotic theory
AJ Wagenmaker, DJ Foster - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We consider the development of adaptive, instance-dependent algorithms for interactive
decision making (bandits, reinforcement learning, and beyond) that, rather than only …
Regret minimization via saddle point optimization
A long line of works characterizes the sample complexity of regret minimization in sequential
decision-making by min-max programs. In the corresponding saddle-point game, the min …
Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions
Recent advancements in general-purpose AI have highlighted the importance of guiding AI
systems towards the intended goals, ethical principles, and values of individuals and …
On the complexity of representation learning in contextual linear bandits
In contextual linear bandits, the reward function is assumed to be a linear combination of an
unknown reward vector and a given embedding of context-arm pairs. In practice, the …
Provably efficient exploration in quantum reinforcement learning with logarithmic worst-case regret
While quantum reinforcement learning (RL) has attracted a surge of attention recently, its
theoretical understanding is limited. In particular, it remains elusive how to design provably …
Variance-aware robust reinforcement learning with linear function approximation under heavy-tailed rewards
X Li, Q Sun - arXiv preprint arXiv:2303.05606, 2023 - arxiv.org
This paper presents two algorithms, AdaOFUL and VARA, for online sequential decision-
making in the presence of heavy-tailed rewards with only finite variances. For linear …
Bandits with Multimodal Structure
H Saber, OA Maillard - Reinforcement Learning Conference, 2024 - inria.hal.science
We consider a multi-armed bandit problem specified by a set of one-dimensional
exponential family distributions endowed with a multimodal structure. The multimodal …
A Theory of Active Learning in Dynamic Environments
A Wagenmaker - 2024 - search.proquest.com
How should an agent interact with an unknown, dynamic environment in order to collect the
information necessary to accomplish its goals? This question is central to the operation of …