Multi-agent reinforcement learning: A selective overview of theories and algorithms
Recent years have witnessed significant advances in reinforcement learning (RL), which
has registered tremendous success in solving various sequential decision-making problems …
Generalize a small pre-trained model to arbitrarily large TSP instances
For the traveling salesman problem (TSP), the existing supervised learning based
algorithms suffer seriously from the lack of generalization ability. To overcome this …
A Monte Carlo Neural Fictitious Self-Play approach to approximate Nash Equilibrium in imperfect-information dynamic games
L Zhang, Y Chen, W Wang, Z Han, S Li, Z Pan… - Frontiers of Computer …, 2021 - Springer
Solving the optimization problem to approach a Nash Equilibrium point plays an important
role in imperfect-information games, e.g., StarCraft and poker. Neural Fictitious Self-Play …
POLY-HOOT: Monte-Carlo planning in continuous space MDPs with non-asymptotic analysis
Monte-Carlo planning, as exemplified by Monte-Carlo Tree Search (MCTS), has
demonstrated remarkable performance in applications with finite spaces. In this paper, we …
On reinforcement learning for turn-based zero-sum Markov games
We consider the problem of finding Nash equilibrium for two-player turn-based zero-sum
games. Inspired by the AlphaGo Zero (AGZ) algorithm, we develop a Reinforcement …
Real-time tree search with pessimistic scenarios
T Osogami, T Takahashi - arXiv preprint arXiv:1902.10870, 2019 - arxiv.org
Autonomous agents need to make decisions in a sequential manner, under partially
observable environment, and in consideration of how other agents behave. In critical …
Scale-free adaptive planning for deterministic dynamics & discounted rewards
P Bartlett, V Gabillon, J Healey… - … on Machine Learning, 2019 - proceedings.mlr.press
We address the problem of planning in an environment with deterministic dynamics and
stochastic discounted rewards under a limited numerical budget where the ranges of both …
Real-time tree search with pessimistic scenarios: Winning the NeurIPS 2018 Pommerman Competition
T Osogami, T Takahashi - Asian Conference on Machine …, 2019 - proceedings.mlr.press
Autonomous agents need to make decisions in a sequential manner, under partially
observable environment, and in consideration of how other agents behave. In critical …
POLY-HOOT: Monte-Carlo Planning in Continuous Space MDPs with Non-Asymptotic Analysis
W Mao, K Zhang, Q Xie, T Basar - academia.edu
Monte-Carlo planning, as exemplified by Monte-Carlo Tree Search (MCTS), has
demonstrated remarkable performance in applications with finite spaces. In this paper, we …
On exploiting structures for deep learning algorithms with matrix estimation
Y Yang - 2020 - dspace.mit.edu
Despite recent breakthroughs of deep learning, the intrinsic structures within tasks have not
yet been fully explored and exploited for better performance. This thesis proposes to …