Modification of UCT with Patterns in Monte-Carlo Go

V François-Lavet, P Henderson, R Islam… - … and Trends® in …, 2018 - nowpublishers.com

Deep reinforcement learning is the combination of reinforcement learning (RL) and deep
learning. This field of research has been able to solve a wide range of complex …

Cited by 1922 Related articles All 16 versions

[PDF] ucl.ac.uk

Mastering the game of Go without human knowledge

D Silver, J Schrittwieser, K Simonyan, I Antonoglou… - nature, 2017 - nature.com

A long-standing goal of artificial intelligence is an algorithm that learns, tabula rasa,
superhuman proficiency in challenging domains. Recently, AlphaGo became the first …

Cited by 11779 Related articles All 42 versions

[PDF] essex.ac.uk

A survey of Monte Carlo tree search methods

CB Browne, E Powley, D Whitehouse… - … Intelligence and AI …, 2012 - ieeexplore.ieee.org

Monte Carlo tree search (MCTS) is a recently proposed search method that combines the
precision of tree search with the generality of random sampling. It has received considerable …

Cited by 4056 Related articles All 55 versions

[PDF] nowpublishers.com

Regret analysis of stochastic and nonstochastic multi-armed bandit problems

S Bubeck, N Cesa-Bianchi - Foundations and Trends® in …, 2012 - nowpublishers.com

Multi-armed bandit problems are the most basic examples of sequential decision problems
with an exploration-exploitation trade-off. This is the balance between staying with the option …

Cited by 3264 Related articles All 26 versions

[PDF] pasky.or.cz

Pachi: State of the art open source Go program

P Baudiš, J Gailly - Advances in computer games, 2011 - Springer

We present a state of the art implementation of the Monte Carlo Tree Search algorithm for
the game of Go. Our Pachi software is currently one of the strongest open source Go …

Cited by 134 Related articles All 8 versions

[PDF] academia.edu

Mastering the game of Go with deep neural networks and tree search

D Silver, A Huang, CJ Maddison, A Guez, L Sifre… - nature, 2016 - nature.com

The game of Go has long been viewed as the most challenging of classic games for artificial
intelligence owing to its enormous search space and the difficulty of evaluating board …

Cited by 20773 Related articles All 95 versions

[PDF] sciencedirect.com

Exploration–exploitation tradeoff using variance estimates in multi-armed bandits

JY Audibert, R Munos, C Szepesvári - Theoretical Computer Science, 2009 - Elsevier

Algorithms based on upper confidence bounds for balancing exploration and exploitation
are gaining popularity since they are easy to implement, efficient and effective. This paper …

Cited by 807 Related articles All 29 versions

[PDF] sciencedirect.com

Monte-Carlo tree search and rapid action value estimation in computer Go

S Gelly, D Silver - Artificial Intelligence, 2011 - Elsevier

A new paradigm for search, based on Monte-Carlo simulation, has revolutionised the
performance of computer Go programs. In this article we describe two extensions to the …

Cited by 518 Related articles All 19 versions

[PDF] jmlr.org

[PDF] X-Armed Bandits.

S Bubeck, R Munos, G Stoltz, C Szepesvári - Journal of Machine Learning …, 2011 - jmlr.org

We consider a generalization of stochastic bandits where the set of arms, X, is allowed to be
a generic measurable space and the mean-payoff function is “locally Lipschitz” with respect …

Cited by 520 Related articles All 32 versions

[PDF] arxiv.org

Pure exploration in multi-armed bandits problems

S Bubeck, R Munos, G Stoltz - … , ALT 2009, Porto, Portugal, October 3-5 …, 2009 - Springer

We consider the framework of stochastic multi-armed bandit problems and study the
possibilities and limitations of strategies that perform an online exploration of the arms. The …

Cited by 642 Related articles All 31 versions