An introduction to deep reinforcement learning

V François-Lavet, P Henderson, R Islam… - … and Trends® in …, 2018 - nowpublishers.com
Deep reinforcement learning is the combination of reinforcement learning (RL) and deep
learning. This field of research has been able to solve a wide range of complex …

Mastering the game of Go without human knowledge

D Silver, J Schrittwieser, K Simonyan, I Antonoglou… - Nature, 2017 - nature.com
A long-standing goal of artificial intelligence is an algorithm that learns, tabula rasa,
superhuman proficiency in challenging domains. Recently, AlphaGo became the first …

A survey of Monte Carlo tree search methods

CB Browne, E Powley, D Whitehouse… - … Intelligence and AI …, 2012 - ieeexplore.ieee.org
Monte Carlo tree search (MCTS) is a recently proposed search method that combines the
precision of tree search with the generality of random sampling. It has received considerable …

Regret analysis of stochastic and nonstochastic multi-armed bandit problems

S Bubeck, N Cesa-Bianchi - Foundations and Trends® in …, 2012 - nowpublishers.com
Multi-armed bandit problems are the most basic examples of sequential decision problems
with an exploration-exploitation trade-off. This is the balance between staying with the option …

Pachi: State of the art open source Go program

P Baudiš, J Gailly - Advances in computer games, 2011 - Springer
We present a state of the art implementation of the Monte Carlo Tree Search algorithm for
the game of Go. Our Pachi software is currently one of the strongest open source Go …

Mastering the game of Go with deep neural networks and tree search

D Silver, A Huang, CJ Maddison, A Guez, L Sifre… - Nature, 2016 - nature.com
The game of Go has long been viewed as the most challenging of classic games for artificial
intelligence owing to its enormous search space and the difficulty of evaluating board …

Exploration–exploitation tradeoff using variance estimates in multi-armed bandits

JY Audibert, R Munos, C Szepesvári - Theoretical Computer Science, 2009 - Elsevier
Algorithms based on upper confidence bounds for balancing exploration and exploitation
are gaining popularity since they are easy to implement, efficient and effective. This paper …

Monte-Carlo tree search and rapid action value estimation in computer Go

S Gelly, D Silver - Artificial Intelligence, 2011 - Elsevier
A new paradigm for search, based on Monte-Carlo simulation, has revolutionised the
performance of computer Go programs. In this article we describe two extensions to the …

X-Armed Bandits.

S Bubeck, R Munos, G Stoltz, C Szepesvári - Journal of Machine Learning …, 2011 - jmlr.org
We consider a generalization of stochastic bandits where the set of arms, X, is allowed to be
a generic measurable space and the mean-payoff function is “locally Lipschitz” with respect …

Pure exploration in multi-armed bandits problems

S Bubeck, R Munos, G Stoltz - … , ALT 2009, Porto, Portugal, October 3-5 …, 2009 - Springer
We consider the framework of stochastic multi-armed bandit problems and study the
possibilities and limitations of strategies that perform an online exploration of the arms. The …