Blackwell approachability and no-regret learning are equivalent

E Hazan - Foundations and Trends® in Optimization, 2016 - nowpublishers.com

This monograph portrays optimization as a process. In many practical applications the
environment is so complex that it is infeasible to lay out a comprehensive theoretical model …

Cited by 2002 · Related articles · All 17 versions

[PDF] mdpi.com

From cyber–physical convergence to digital twins: A review on edge computing use case designs

MC Hlophe, BT Maharaj - Applied Sciences, 2023 - mdpi.com

As a result of the new telecommunication ecosystem landscape, wireless communication
has become an interdisciplinary field whose future is shaped by several interacting …

Cited by 6 · Related articles · All 3 versions

[PDF] jmlr.org

[PDF] Batch learning from logged bandit feedback through counterfactual risk minimization

A Swaminathan, T Joachims - The Journal of Machine Learning Research, 2015 - jmlr.org

We develop a learning principle and an efficient algorithm for batch learning from logged
bandit feedback. This learning setting is ubiquitous in online systems (e.g., ad placement …

Cited by 506 · Related articles · All 10 versions

[PDF] mlr.press

On last-iterate convergence beyond zero-sum games

I Anagnostides, I Panageas, G Farina… - International …, 2022 - proceedings.mlr.press

Most existing results about last-iterate convergence of learning dynamics are limited to
two-player zero-sum games, and only apply under rigid assumptions about what dynamics the …

Cited by 40 · Related articles · All 8 versions

[PDF] acm.org

Near-optimal no-regret learning for correlated equilibria in multi-player general-sum games

I Anagnostides, C Daskalakis, G Farina… - Proceedings of the 54th …, 2022 - dl.acm.org

Recently, Daskalakis, Fishelson, and Golowich (DFG) (NeurIPS '21) showed that if all agents
in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights …

Cited by 59 · Related articles · All 7 versions

[PDF] arxiv.org

Bandits with concave rewards and convex knapsacks

S Agrawal, NR Devanur - Proceedings of the fifteenth ACM conference …, 2014 - dl.acm.org

In this paper, we consider a very general model for exploration-exploitation tradeoff which
allows arbitrary concave rewards and convex constraints on the decisions across time, in …

Cited by 225 · Related articles · All 6 versions

[PDF] siam.org

Fast algorithms for online stochastic convex programming

S Agrawal, NR Devanur - Proceedings of the twenty-sixth annual ACM-SIAM …, 2014 - SIAM

We introduce the online stochastic Convex Programming (CP) problem, a very general
version of stochastic online problems which allows arbitrary concave objectives and convex …

Cited by 193 · Related articles · All 8 versions

[PDF] neurips.cc

Reinforcement learning with convex constraints

S Miryoosefi, K Brantley, H Daume III… - Advances in neural …, 2019 - proceedings.neurips.cc

In standard reinforcement learning (RL), a learning agent seeks to optimize the overall
reward. However, many key aspects of a desired behavior are more naturally expressed as …

Cited by 98 · Related articles · All 18 versions

[PDF] aaai.org

Faster game solving via predictive Blackwell approachability: Connecting regret matching and mirror descent

G Farina, C Kroer, T Sandholm - … of the AAAI Conference on Artificial …, 2021 - ojs.aaai.org

Blackwell approachability is a framework for reasoning about repeated games with
vector-valued payoffs. We introduce predictive Blackwell approachability, where an estimate of the …

Cited by 62 · Related articles · All 8 versions

[PDF] mlr.press

A simple reward-free approach to constrained reinforcement learning

S Miryoosefi, C Jin - International Conference on Machine …, 2022 - proceedings.mlr.press

In constrained reinforcement learning (RL), a learning agent seeks to not only optimize the
overall reward but also satisfy the additional safety, diversity, or budget constraints …

Cited by 32 · Related articles · All 8 versions