Introduction to online convex optimization
E Hazan - Foundations and Trends® in Optimization, 2016 - nowpublishers.com
This monograph portrays optimization as a process. In many practical applications the
environment is so complex that it is infeasible to lay out a comprehensive theoretical model …
From cyber–physical convergence to digital twins: A review on edge computing use case designs
MC Hlophe, BT Maharaj - Applied Sciences, 2023 - mdpi.com
As a result of the new telecommunication ecosystem landscape, wireless communication
has become an interdisciplinary field whose future is shaped by several interacting …
Batch learning from logged bandit feedback through counterfactual risk minimization
A Swaminathan, T Joachims - The Journal of Machine Learning Research, 2015 - jmlr.org
We develop a learning principle and an efficient algorithm for batch learning from logged
bandit feedback. This learning setting is ubiquitous in online systems (e.g., ad placement …
On last-iterate convergence beyond zero-sum games
Most existing results about last-iterate convergence of learning dynamics are limited to two-
player zero-sum games, and only apply under rigid assumptions about what dynamics the …
Near-optimal no-regret learning for correlated equilibria in multi-player general-sum games
Recently, Daskalakis, Fishelson, and Golowich (DFG) (NeurIPS '21) showed that if all agents
in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights …
Bandits with concave rewards and convex knapsacks
S Agrawal, NR Devanur - Proceedings of the fifteenth ACM conference …, 2014 - dl.acm.org
In this paper, we consider a very general model for exploration-exploitation tradeoff which
allows arbitrary concave rewards and convex constraints on the decisions across time, in …
Fast algorithms for online stochastic convex programming
S Agrawal, NR Devanur - Proceedings of the twenty-sixth annual ACM-SIAM …, 2014 - SIAM
We introduce the online stochastic Convex Programming (CP) problem, a very general
version of stochastic online problems which allows arbitrary concave objectives and convex …
Reinforcement learning with convex constraints
In standard reinforcement learning (RL), a learning agent seeks to optimize the overall
reward. However, many key aspects of a desired behavior are more naturally expressed as …
Faster game solving via predictive Blackwell approachability: Connecting regret matching and mirror descent
Blackwell approachability is a framework for reasoning about repeated games with vector-
valued payoffs. We introduce predictive Blackwell approachability, where an estimate of the …
A simple reward-free approach to constrained reinforcement learning
S Miryoosefi, C Jin - International Conference on Machine …, 2022 - proceedings.mlr.press
In constrained reinforcement learning (RL), a learning agent seeks to not only optimize the
overall reward but also satisfy the additional safety, diversity, or budget constraints …