Introduction to online convex optimization

E Hazan - Foundations and Trends® in Optimization, 2016 - nowpublishers.com
This monograph portrays optimization as a process. In many practical applications the
environment is so complex that it is infeasible to lay out a comprehensive theoretical model …

From cyber–physical convergence to digital twins: A review on edge computing use case designs

MC Hlophe, BT Maharaj - Applied Sciences, 2023 - mdpi.com
As a result of the new telecommunication ecosystem landscape, wireless communication
has become an interdisciplinary field whose future is shaped by several interacting …

Batch learning from logged bandit feedback through counterfactual risk minimization

A Swaminathan, T Joachims - The Journal of Machine Learning Research, 2015 - jmlr.org
We develop a learning principle and an efficient algorithm for batch learning from logged
bandit feedback. This learning setting is ubiquitous in online systems (e.g., ad placement …
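Logged bandit feedback records only the loss of the action the logging policy actually took, so the risk of a new policy has to be estimated by reweighting those logged losses. Below is a minimal sketch of the clipped inverse propensity scoring (IPS) estimator commonly used in this setting. It assumes each log record holds (context, action, loss, logging propensity) and that `clip` caps the importance weight. The function and parameter names are hypothetical, and the sketch omits the variance regularization that counterfactual risk minimization adds on top of such an estimate.

```python
from typing import Callable, Sequence, Tuple

# One logged record (hypothetical layout): (context x, action y, observed loss, logging propensity p)
Record = Tuple[object, int, float, float]


def clipped_ips_risk(
    policy_prob: Callable[[object, int], float],
    logs: Sequence[Record],
    clip: float = 10.0,
) -> float:
    """Estimate the expected loss of a new policy from logged bandit feedback.

    Each logged loss is reweighted by min(clip, pi(y|x) / p), where pi is the
    new policy and p is the probability the logging policy assigned to y.
    """
    if not logs:
        raise ValueError("need at least one logged record")
    total = 0.0
    for x, y, loss, propensity in logs:
        # Importance weight corrects for the logging policy's action choice.
        weight = policy_prob(x, y) / propensity
        # Clipping bounds the weight, trading a little bias for lower variance.
        total += loss * min(clip, weight)
    return total / len(logs)
```

For example, with two logged actions each taken with propensity 0.5 (losses 1.0 and 0.0), a policy that always picks the zero-loss action gets an estimated risk of 0.0, while the uniform policy gets 0.5.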

On last-iterate convergence beyond zero-sum games

I Anagnostides, I Panageas, G Farina… - International …, 2022 - proceedings.mlr.press
Most existing results about last-iterate convergence of learning dynamics are limited to two-player
zero-sum games, and only apply under rigid assumptions about what dynamics the …

Near-optimal no-regret learning for correlated equilibria in multi-player general-sum games

I Anagnostides, C Daskalakis, G Farina… - Proceedings of the 54th …, 2022 - dl.acm.org
Recently, Daskalakis, Fishelson, and Golowich (DFG) (NeurIPS '21) showed that if all agents
in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights …
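As background for the DFG result cited above, here is a minimal sketch of optimistic multiplicative weights for a single player, written in the common utility form in which the most recently observed utility vector serves as the prediction of the next one. The class name `OptimisticHedge` and the step size `eta` are illustrative assumptions; the sketch omits the game loop and any tuning of `eta`, on which the regret bounds in these papers depend.

```python
import math
from typing import List, Sequence


class OptimisticHedge:
    """Optimistic multiplicative weights over a finite action set (utility form).

    Update: x_{t+1}(a) is proportional to x_t(a) * exp(eta * (2*u_t(a) - u_{t-1}(a))),
    so the last utility vector u_t is counted once as feedback and once as the
    prediction of u_{t+1}, with the previous prediction u_{t-1} removed.
    """

    def __init__(self, num_actions: int, eta: float) -> None:
        self.eta = eta
        self.weights = [1.0 / num_actions] * num_actions  # start uniform
        self.prev_utility = [0.0] * num_actions  # u_0 = 0 by convention

    def strategy(self) -> List[float]:
        return list(self.weights)

    def update(self, utility: Sequence[float]) -> None:
        step = [
            w * math.exp(self.eta * (2.0 * u - u_prev))
            for w, u, u_prev in zip(self.weights, utility, self.prev_utility)
        ]
        total = sum(step)
        self.weights = [s / total for s in step]  # renormalize onto the simplex
        self.prev_utility = list(utility)
```

In a game, each player would hold one instance, play `strategy()`, and call `update()` with the expected utility of each of its actions against the others' current strategies.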

Bandits with concave rewards and convex knapsacks

S Agrawal, NR Devanur - Proceedings of the fifteenth ACM conference …, 2014 - dl.acm.org
In this paper, we consider a very general model for exploration-exploitation tradeoff which
allows arbitrary concave rewards and convex constraints on the decisions across time, in …

Fast algorithms for online stochastic convex programming

S Agrawal, NR Devanur - Proceedings of the twenty-sixth annual ACM-SIAM …, 2014 - SIAM
We introduce the online stochastic Convex Programming (CP) problem, a very general
version of stochastic online problems which allows arbitrary concave objectives and convex …

Reinforcement learning with convex constraints

S Miryoosefi, K Brantley, H Daume III… - Advances in neural …, 2019 - proceedings.neurips.cc
In standard reinforcement learning (RL), a learning agent seeks to optimize the overall
reward. However, many key aspects of a desired behavior are more naturally expressed as …

Faster game solving via predictive Blackwell approachability: Connecting regret matching and mirror descent

G Farina, C Kroer, T Sandholm - … of the AAAI Conference on Artificial …, 2021 - ojs.aaai.org
Blackwell approachability is a framework for reasoning about repeated games with vector-valued
payoffs. We introduce predictive Blackwell approachability, where an estimate of the …
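Regret matching, named in the title above, can be derived as Blackwell's approachability strategy for driving the vector of cumulative regrets into the nonpositive orthant. The sketch below is plain (non-predictive) regret matching over a finite action set, not the predictive variants the paper introduces; the class name `RegretMatching` is illustrative.

```python
from typing import List, Sequence


class RegretMatching:
    """Plain regret matching over a finite action set."""

    def __init__(self, num_actions: int) -> None:
        self.cum_regret = [0.0] * num_actions

    def strategy(self) -> List[float]:
        # Play proportionally to the positive part of the cumulative regrets.
        positive = [max(r, 0.0) for r in self.cum_regret]
        total = sum(positive)
        n = len(positive)
        if total <= 0.0:
            return [1.0 / n] * n  # no positive regret: fall back to uniform
        return [p / total for p in positive]

    def update(self, utility: Sequence[float]) -> None:
        x = self.strategy()
        expected = sum(p * u for p, u in zip(x, utility))
        # Regret of action a: how much better a would have done than the mix played.
        self.cum_regret = [r + u - expected for r, u in zip(self.cum_regret, utility)]
```

After one round with utilities `[1.0, 0.0]` from the uniform start, the cumulative regrets are `[0.5, -0.5]`, so the next strategy puts all mass on the first action.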

A simple reward-free approach to constrained reinforcement learning

S Miryoosefi, C Jin - International Conference on Machine …, 2022 - proceedings.mlr.press
In constrained reinforcement learning (RL), a learning agent seeks to not only optimize the
overall reward but also satisfy the additional safety, diversity, or budget constraints …