Lex Weaver
Title
Cited by
Year
Infinite-horizon gradient-based policy search: II. Gradient ascent algorithms and experiments
J Baxter, PL Bartlett, L Weaver
Journal of Artificial Intelligence Research 15, 351-381, 2001
Cited by: 1260*; Year: 2001
Experiments with infinite-horizon, policy-gradient estimation
J Baxter, PL Bartlett, L Weaver
Journal of Artificial Intelligence Research 15, 351-381, 2001
Cited by: 1260*; Year: 2001
The optimal reward baseline for gradient-based reinforcement learning
L Weaver, N Tao
Proceedings of the Seventeenth conference on Uncertainty in artificial …, 2001
Cited by: 324*; Year: 2001
Learning to play chess using temporal differences
J Baxter, A Tridgell, L Weaver
Machine Learning 40 (3), 243-263, 2000
Cited by: 201; Year: 2000
KnightCap: A chess program that learns by combining TD (λ) with game-tree search
J Baxter, A Tridgell, L Weaver
arXiv preprint cs/9901002, 1999
Cited by: 189; Year: 1999
KnightCap: A chess program that learns by combining TD (λ) with game-tree search
J Baxter, A Tridgell, L Weaver
Cited by: 189*
A Multi-Agent Policy-Gradient Approach to Network Routing.
N Tao, J Baxter, L Weaver
ICML 1, 553-560, 2001
Cited by: 99; Year: 2001
Experiments in parameter learning using temporal differences
J Baxter, A Tridgell, L Weaver
International Computer Chess Association Journal 21 (2), 84-99, 1998
Cited by: 67; Year: 1998
TDLeaf (lambda): Combining temporal difference learning with game-tree search
J Baxter, A Tridgell, L Weaver
arXiv preprint cs/9901001, 1999
Cited by: 60; Year: 1999
Direct gradient-based reinforcement learning: II. Gradient ascent algorithms and experiments
J Baxter, L Weaver, P Bartlett
National University, 1999
Cited by: 44; Year: 1999
Reinforcement learning and chess
J Baxter, A Tridgell, L Weaver
Machines that learn to play games, 91-116, 2001
Cited by: 27; Year: 2001
Reinforcement learning from state and temporal differences
L Weaver, J Baxter
Technical report, Department of Computer Science, Australian National University, 1999
Cited by: 17; Year: 1999
Evolution of neural networks to play the game of dots-and-boxes
L Weaver, T Bossomaier
arXiv preprint cs/9809111, 1998
Cited by: 16; Year: 1998
KnightCap: A chess program that learns by combining TD (λ) with minimax search
J Baxter, A Tridgell, L Weaver
Technical report, Australian National University, Canberra, 1997
Cited by: 10; Year: 1997
The variance minimizing constant reward baseline for gradient-based reinforcement learning
L Weaver, N Tao
Australian National University, Department of Computer Science, 2001
Cited by: 7; Year: 2001
Sorting Integers on the AP1000
L Weaver, A Lynes
arXiv preprint cs/0004013, 2000
Cited by: 2; Year: 2000
Learning From State Differences: STD (λ)
L Weaver, J Baxter
Technical report, Department of Computer Science, Australian National University, 1999
Cited by: 2; Year: 1999
Pre-fetching tree-structured data in distributed memory
L Weaver, C Johnson
arXiv preprint cs/9810002, 1998
Cited by: 2; Year: 1998
STD (λ): learning state differences with TD (λ)
L Weaver, J Baxter
University of New South Wales, 2001
Cited by: 1; Year: 2001