Multi-agent reinforcement learning: A selective overview of theories and algorithms

K Zhang, Z Yang, T Başar - Handbook of reinforcement learning and …, 2021 - Springer
Recent years have witnessed significant advances in reinforcement learning (RL), which
has registered tremendous success in solving various sequential decision-making problems …

An overview of multi-agent reinforcement learning from game theoretical perspective

Y Yang, J Wang - arXiv preprint arXiv:2011.00583, 2020 - arxiv.org
Following the remarkable success of the AlphaGo series, 2019 was a booming year that
witnessed significant advances in multi-agent reinforcement learning (MARL) techniques …

A modern introduction to online learning

F Orabona - arXiv preprint arXiv:1912.13213, 2019 - arxiv.org
In this monograph, I introduce the basic concepts of Online Learning through a modern view
of Online Convex Optimization. Here, online learning refers to the framework of regret …

[BOOK][B] Partially observed Markov decision processes

V Krishnamurthy - 2016 - books.google.com
Covering formulation, algorithms, and structural results, and linking theory to real-world
applications in controlled sensing (including social learning, adaptive radars and sequential …

Boosting: Foundations and algorithms

RE Schapire, Y Freund - Kybernetes, 2013 - emerald.com
The term “boosting” denotes a powerful means of facilitating machine learning that was
invented by the book's authors 20 years ago and intensively developed since. Despite this …

Potential games

D Monderer, LS Shapley - Games and economic behavior, 1996 - Elsevier
Potential Games. Games and Economic Behavior 14, 124–143 (1996), Article No. 0044.
Dov Monderer, Faculty of Industrial Engineering and …

[BOOK][B] Prediction, learning, and games

N Cesa-Bianchi, G Lugosi - 2006 - books.google.com
This important text and reference for researchers and students in machine learning, game
theory, statistics and information theory offers a comprehensive treatment of the problem of …

The nonstochastic multiarmed bandit problem

P Auer, N Cesa-Bianchi, Y Freund, RE Schapire - SIAM journal on computing, 2002 - SIAM
In the multiarmed bandit problem, a gambler must decide which arm of K nonidentical slot
machines to play in a sequence of trials so as to maximize his reward. This classical …

[PDF][PDF] Online convex programming and generalized infinitesimal gradient ascent

M Zinkevich - Proceedings of the 20th international conference on …, 2003 - cdn.aaai.org
Convex programming involves a convex set F ⊆ R^n and a convex cost function c: F → R. The
goal of convex programming is to find a point in F which minimizes c. In online convex …

[BOOK][B] Robustness

LP Hansen, TJ Sargent - 2008 - degruyter.com
The standard theory of decision making under uncertainty advises the decision maker to
form a statistical model linking outcomes to decisions and then to choose the optimal …