The statistical complexity of interactive decision making

DJ Foster, SM Kakade, J Qian, A Rakhlin - arXiv preprint arXiv:2112.13487, 2021 - arxiv.org

A fundamental challenge in interactive learning and decision making, ranging from bandit
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …

Cited by 181 · Related articles · All 6 versions

The role of coverage in online reinforcement learning

T Xie, DJ Foster, Y Bai, N Jiang, SM Kakade - arXiv preprint arXiv …, 2022 - arxiv.org

Coverage conditions--which assert that the data logging distribution adequately covers the
state space--play a fundamental role in determining the sample complexity of offline …

Cited by 63 · Related articles · All 4 versions

Instance-optimality in interactive decision making: Toward a non-asymptotic theory

AJ Wagenmaker, DJ Foster - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press

We consider the development of adaptive, instance-dependent algorithms for interactive
decision making (bandits, reinforcement learning, and beyond) that, rather than only …

Cited by 16 · Related articles · All 3 versions

Regret minimization via saddle point optimization

J Kirschner, A Bakhtiari, K Chandak… - Advances in …, 2024 - proceedings.neurips.cc

A long line of works characterizes the sample complexity of regret minimization in sequential
decision-making by min-max programs. In the corresponding saddle-point game, the min …

Cited by 2 · Related articles · All 6 versions

Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions

H Shen, T Knearem, R Ghosh, K Alkiek… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent advancements in general-purpose AI have highlighted the importance of guiding AI
systems towards the intended goals, ethical principles, and values of individuals and …

Cited by 7 · Related articles · All 3 versions

On the complexity of representation learning in contextual linear bandits

A Tirinzoni, M Pirotta, A Lazaric - … Conference on Artificial …, 2023 - proceedings.mlr.press

In contextual linear bandits, the reward function is assumed to be a linear combination of an
unknown reward vector and a given embedding of context-arm pairs. In practice, the …

Cited by 1 · Related articles · All 3 versions

Provably efficient exploration in quantum reinforcement learning with logarithmic worst-case regret

H Zhong, J Hu, Y Xue, T Li, L Wang - arXiv preprint arXiv:2302.10796, 2023 - arxiv.org

While quantum reinforcement learning (RL) has attracted a surge of attention recently, its
theoretical understanding is limited. In particular, it remains elusive how to design provably …

Cited by 1 · Related articles · All 5 versions
