All versions - Academic resource search

A finite time analysis of temporal difference learning with linear function approximation

J Bhandari, D Russo, R Singal - Conference on learning …, 2018 - proceedings.mlr.press

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value
function corresponding to a given policy in a Markov decision process. Although TD is one of …

Cited by 432 Related articles
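The abstract snippet only names the algorithm. Below is a minimal Python sketch of TD(0) with linear function approximation for a fixed policy, which reduces the MDP to a Markov reward process. The function name `td0_linear`, the two-state chain, the rewards, the one-hot features, and the step-size schedule `step_c / (step_t0 + t)` are all illustrative assumptions. They are not taken from the paper, and the sketch does not reproduce the paper's step-size choices or its analysis.

```python
import random


def td0_linear(P, rewards, features, gamma, num_steps,
               step_c=40.0, step_t0=200.0, seed=0):
    """Hypothetical sketch of TD(0) with a linear estimate V(s) = phi(s) . theta.

    P[s][s2]    : probability of moving from s to s2 under the fixed policy
    rewards[s]  : reward received when leaving state s
    features[s] : feature vector phi(s)
    The step size at iteration t is step_c / (step_t0 + t), an assumed schedule.
    """
    rng = random.Random(seed)
    states = range(len(P))
    theta = [0.0] * len(features[0])

    def value(s):
        # Linear value estimate under the current parameters theta.
        return sum(f * w for f, w in zip(features[s], theta))

    s = 0  # arbitrary start state
    for t in range(num_steps):
        # Sample the next state from the policy's transition probabilities.
        s_next = rng.choices(states, weights=P[s])[0]
        # TD error: r(s) + gamma * V(s') - V(s).
        delta = rewards[s] + gamma * value(s_next) - value(s)
        alpha = step_c / (step_t0 + t)
        # Semi-gradient update: move theta along phi(s), scaled by the TD error.
        theta = [w + alpha * delta * f for w, f in zip(theta, features[s])]
        s = s_next
    return theta


# Illustrative two-state chain; one-hot features make this the tabular special case.
P = [[0.5, 0.5], [0.5, 0.5]]
rewards = [1.0, 0.0]
features = [[1.0, 0.0], [0.0, 1.0]]
theta_hat = td0_linear(P, rewards, features, gamma=0.9, num_steps=100_000)
print(theta_hat)
```

For this chain, the exact values solve V = r + γPV. Summing both equations gives V(0) + V(1) = 10, so V(0) = 5.5 and V(1) = 4.5. With one-hot features, the TD fixed point coincides with these values, so `theta_hat` should land near them. With fewer features than states, TD instead converges toward a projected fixed point that generally differs from the true value function.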

[PDF] A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation

J Bhandari, D Russo, R Singal - arXiv preprint arXiv:1806.02450, 2018 - columbia.edu

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value
function corresponding to a given policy in a Markov decision process. Although TD is one of …

A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation

J Bhandari, D Russo, R Singal - arXiv preprint arXiv:1806.02450, 2018 - arxiv.org

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value
function corresponding to a given policy in a Markov decision process. Although TD is one of …

A Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation

J Bhandari, D Russo, R Singal - Operations Research, 2021 - pubsonline.informs.org

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value
function corresponding to a given policy in a Markov decision process. Although TD is one of …

A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation

J Bhandari, D Russo, R Singal - Conference On Learning …, 2018 - proceedings.mlr.press

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value
function corresponding to a given policy in a Markov decision process. Although TD is one of …

A Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation

J Bhandari, D Russo, R Singal - Operations Research, 2021 - econpapers.repec.org

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value
function corresponding to a given policy in a Markov decision process. Although TD is one of …

A Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation

J Bhandari, D Russo, R Singal - Operations Research, 2021 - dl.acm.org

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value
function corresponding to a given policy in a Markov decision process. Although TD is one of …

[CITATION][C] A Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation

J Bhandari, D Russo, R Singal - Operations Research, 2021 - dialnet.unirioja.es

A Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation

J Bhandari, D Russo, R Singal - Operations Research, 2021 - ideas.repec.org

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value
function corresponding to a given policy in a Markov decision process. Although TD is one of …

A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation

J Bhandari, D Russo, R Singal - arXiv e-prints, 2018 - ui.adsabs.harvard.edu

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value
function corresponding to a given policy in a Markov decision process. Although TD is one of …