Authors
Noah Arthurs, Sawyer Birnbaum
Abstract
We created a machine learning algorithm that, given the rules of a two-player adversarial game, outputs a function Q such that Q(s, a) approximates the expected utility of action a in state s. Our Q-function is a neural network trained using reinforcement learning on data generated by self-play. We then create an agent for the game using the policy determined by the learned Q-function. Our algorithm is capable of creating very strong agents for small- to medium-sized games. It is scalable, in that larger networks yield stronger play on a given game, and it generalizes readily across the games we have tested. We have taken first steps toward characterizing the relationship between the complexity of a game and the complexity of the model required to play it at a given level. We have also found promising results using aggregate features to reduce input size and using weight sharing to jumpstart training.
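To make the setup concrete, below is a minimal sketch of this kind of agent, written in PyTorch purely for illustration; the names (QNetwork, greedy_policy, state_dim, hidden) and the architecture are assumptions, not the paper's actual implementation. A small network maps a state encoding to one Q-value per action, and the agent's policy simply plays the legal action with the highest predicted value.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Hypothetical Q-function: maps a state encoding to one Q-value per action."""
    def __init__(self, state_dim: int, num_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def greedy_policy(q_net: QNetwork, state: torch.Tensor, legal_actions: list[int]) -> int:
    """Policy determined by the learned Q-function: pick the legal action
    with the highest predicted expected utility."""
    with torch.no_grad():
        q_values = q_net(state)
    return max(legal_actions, key=lambda a: q_values[a].item())

# Example usage on a hypothetical 3x3 game encoding (e.g. tic-tac-toe):
q_net = QNetwork(state_dim=9, num_actions=9)
state = torch.zeros(9)  # placeholder encoding of an empty board
action = greedy_policy(q_net, state, legal_actions=list(range(9)))
```

In a full training loop, the network's parameters would be updated by reinforcement learning on transitions generated by self-play, as described above; the sketch shows only the inference-time policy.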