The statistical complexity of interactive decision making

DJ Foster, SM Kakade, J Qian, A Rakhlin - arXiv preprint arXiv:2112.13487, 2021 - arxiv.org
A fundamental challenge in interactive learning and decision making, ranging from bandit
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …

The role of coverage in online reinforcement learning

T Xie, DJ Foster, Y Bai, N Jiang, SM Kakade - arXiv preprint arXiv …, 2022 - arxiv.org
Coverage conditions, which assert that the data logging distribution adequately covers the
state space, play a fundamental role in determining the sample complexity of offline …

Instance-optimality in interactive decision making: Toward a non-asymptotic theory

AJ Wagenmaker, DJ Foster - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We consider the development of adaptive, instance-dependent algorithms for interactive
decision making (bandits, reinforcement learning, and beyond) that, rather than only …

Regret minimization via saddle point optimization

J Kirschner, A Bakhtiari, K Chandak… - Advances in …, 2024 - proceedings.neurips.cc
A long line of works characterizes the sample complexity of regret minimization in sequential
decision-making by min-max programs. In the corresponding saddle-point game, the min …

Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions

H Shen, T Knearem, R Ghosh, K Alkiek… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in general-purpose AI have highlighted the importance of guiding AI
systems towards the intended goals, ethical principles, and values of individuals and …

On the complexity of representation learning in contextual linear bandits

A Tirinzoni, M Pirotta, A Lazaric - … Conference on Artificial …, 2023 - proceedings.mlr.press
In contextual linear bandits, the reward function is assumed to be a linear combination of an
unknown reward vector and a given embedding of context-arm pairs. In practice, the …

Provably efficient exploration in quantum reinforcement learning with logarithmic worst-case regret

H Zhong, J Hu, Y Xue, T Li, L Wang - arXiv preprint arXiv:2302.10796, 2023 - arxiv.org
While quantum reinforcement learning (RL) has attracted a surge of attention recently, its
theoretical understanding is limited. In particular, it remains elusive how to design provably …

Variance-aware robust reinforcement learning with linear function approximation under heavy-tailed rewards

X Li, Q Sun - arXiv preprint arXiv:2303.05606, 2023 - arxiv.org
This paper presents two algorithms, AdaOFUL and VARA, for online sequential
decision-making in the presence of heavy-tailed rewards with only finite variances. For linear …

Bandits with Multimodal Structure

H Saber, OA Maillard - Reinforcement Learning Conference, 2024 - inria.hal.science
We consider a multi-armed bandit problem specified by a set of one-dimensional
exponential family distributions endowed with a multimodal structure. The multimodal …

A Theory of Active Learning in Dynamic Environments

A Wagenmaker - 2024 - search.proquest.com
How should an agent interact with an unknown, dynamic environment in order to collect the
information necessary to accomplish its goals? This question is central to the operation of …