On reinforcement learning using Monte Carlo tree search with supervised learning: Non-asymptotic...

K Zhang, Z Yang, T Başar - Handbook of reinforcement learning and …, 2021 - Springer

Recent years have witnessed significant advances in reinforcement learning (RL), which
has registered tremendous success in solving various sequential decision-making problems …

Cited by 1638 · Related articles · All 8 versions

[PDF] aaai.org

Generalize a small pre-trained model to arbitrarily large tsp instances

ZH Fu, KB Qiu, H Zha - Proceedings of the AAAI conference on artificial …, 2021 - ojs.aaai.org

For the traveling salesman problem (TSP), the existing supervised learning based
algorithms suffer seriously from the lack of generalization ability. To overcome this …

Cited by 186 · Related articles · All 6 versions

[PDF] researchgate.net

A Monte Carlo Neural Fictitious Self-Play approach to approximate Nash Equilibrium in imperfect-information dynamic games

L Zhang, Y Chen, W Wang, Z Han, S Li, Z Pan… - Frontiers of Computer …, 2021 - Springer

Solving the optimization problem to approach a Nash Equilibrium point plays an important
role in imperfect information games, e.g., StarCraft and poker. Neural Fictitious Self-Play …

Cited by 20 · Related articles · All 6 versions

[PDF] neurips.cc

Poly-hoot: Monte-carlo planning in continuous space mdps with non-asymptotic analysis

W Mao, K Zhang, Q Xie, T Basar - Advances in Neural …, 2020 - proceedings.neurips.cc

Monte-Carlo planning, as exemplified by Monte-Carlo Tree Search (MCTS), has
demonstrated remarkable performance in applications with finite spaces. In this paper, we …

Cited by 24 · Related articles · All 8 versions

[PDF] acm.org

On reinforcement learning for turn-based zero-sum Markov games

D Shah, V Somani, Q Xie, Z Xu - Proceedings of the 2020 ACM-IMS on …, 2020 - dl.acm.org

We consider the problem of finding Nash equilibrium for two-player turn-based zero-sum
games. Inspired by the AlphaGo Zero (AGZ) algorithm, we develop a Reinforcement …

Cited by 12 · Related articles · All 4 versions

[PDF] arxiv.org

Real-time tree search with pessimistic scenarios

T Osogami, T Takahashi - arXiv preprint arXiv:1902.10870, 2019 - arxiv.org

Autonomous agents need to make decisions in a sequential manner, under partially
observable environment, and in consideration of how other agents behave. In critical …

Cited by 12 · Related articles · All 4 versions

[PDF] mlr.press

Scale-free adaptive planning for deterministic dynamics & discounted rewards

P Bartlett, V Gabillon, J Healey… - … on Machine Learning, 2019 - proceedings.mlr.press

We address the problem of planning in an environment with deterministic dynamics and
stochastic discounted rewards under a limited numerical budget where the ranges of both …

Cited by 7 · Related articles · All 6 versions

[PDF] mlr.press

Real-time tree search with pessimistic scenarios: Winning the NeurIPS 2018 Pommerman Competition

T Osogami, T Takahashi - Asian Conference on Machine …, 2019 - proceedings.mlr.press

Autonomous agents need to make decisions in a sequential manner, under partially
observable environment, and in consideration of how other agents behave. In critical …

Cited by 4 · Related articles · All 3 versions

[PDF] academia.edu

[PDF] POLY-HOOT: Monte-Carlo Planning in Continuous Space MDPs with Non-Asymptotic Analysis

W Mao, K Zhang, Q Xie, T Basar - academia.edu

Monte-Carlo planning, as exemplified by Monte-Carlo Tree Search (MCTS), has
demonstrated remarkable performance in applications with finite spaces. In this paper, we …

[PDF] mit.edu

On exploiting structures for deep learning algorithms with matrix estimation

Y Yang - 2020 - dspace.mit.edu

Despite recent breakthroughs of deep learning, the intrinsic structures within tasks have not
yet been fully explored and exploited for better performance. This thesis proposes to …