Infinite-horizon gradient-based policy search: II. Gradient ascent algorithms and experiments J Baxter, PL Bartlett, L Weaver Journal of Artificial Intelligence Research 15, 351-381, 2001 | 1260* | 2001 |
Experiments with infinite-horizon, policy-gradient estimation J Baxter, PL Bartlett, L Weaver Journal of Artificial Intelligence Research, 351-381, 2001 | 1260* | 2001 |
The optimal reward baseline for gradient-based reinforcement learning L Weaver, N Tao Proceedings of the Seventeenth conference on Uncertainty in artificial …, 2001 | 324* | 2001 |
Learning to play chess using temporal differences J Baxter, A Tridgell, L Weaver Machine Learning 40 (3), 243-263, 2000 | 201 | 2000 |
KnightCap: A chess program that learns by combining TD (lambda) with game-tree search J Baxter, A Tridgell, L Weaver arXiv preprint cs/9901002, 1999 | 189 | 1999 |
KnightCap: A chess program that learns by combining TD (λ) with game-tree search J Baxter, A Tridgell, L Weaver | 189* | |
A Multi-Agent Policy-Gradient Approach to Network Routing. N Tao, J Baxter, L Weaver ICML 1, 553-560, 2001 | 99 | 2001 |
Experiments in parameter learning using temporal differences J Baxter, A Tridgell, L Weaver International Computer Chess Association Journal 21 (2), 84-99, 1998 | 67 | 1998 |
TDLeaf (lambda): Combining temporal difference learning with game-tree search J Baxter, A Tridgell, L Weaver arXiv preprint cs/9901001, 1999 | 60 | 1999 |
Direct gradient-based reinforcement learning: II. Gradient ascent algorithms and experiments J Baxter, L Weaver, P Bartlett National University, 1999 | 44 | 1999 |
Reinforcement learning and chess J Baxter, A Tridgell, L Weaver Machines that learn to play games, 91-116, 2001 | 27 | 2001 |
Reinforcement learning from state and temporal differences L Weaver, J Baxter Technical report, Department of Computer Science, Australian National University, 1999 | 17 | 1999 |
Evolution of neural networks to play the game of dots-and-boxes L Weaver, T Bossomaier arXiv preprint cs/9809111, 1998 | 16 | 1998 |
KnightCap: A chess program that learns by combining TD (λ) with minimax search J Baxter, A Tridgell, L Weaver Technical report, Australian National University, Canberra, 1997 | 10 | 1997 |
The variance minimizing constant reward baseline for gradient-based reinforcement learning L Weaver, N Tao Australian National University, Department of Computer Science, 2001 | 7 | 2001 |
Sorting Integers on the AP1000 L Weaver, A Lynes arXiv preprint cs/0004013, 2000 | 2 | 2000 |
Learning From State Differences: STD (λ) L Weaver, J Baxter Technical report, Department of Computer Science, Australian National University, 1999 | 2 | 1999 |
Pre-fetching tree-structured data in distributed memory L Weaver, C Johnson arXiv preprint cs/9810002, 1998 | 2 | 1998 |
STD (λ): learning state differences with TD (λ) L Weaver, J Baxter University of New South Wales, 2001 | 1 | 2001 |