Gaussian Processes for Fast Policy Optimisation of POMDP-Based Dialogue Managers
Proceedings of the SIGDIAL 2010 Conference, 2010 (aclanthology.org)
Abstract
Modelling dialogue as a Partially Observable Markov Decision Process (POMDP) enables a dialogue policy that is robust to speech-understanding errors to be learnt. However, a major challenge in POMDP policy learning is to maintain tractability, so the use of approximation is inevitable. We propose applying Gaussian processes in reinforcement learning of optimal POMDP dialogue policies, in order (1) to make the learning process faster and (2) to obtain an estimate of the uncertainty of the approximation. We first demonstrate the idea on a simple voice mail dialogue task and then apply this method to a real-world tourist information dialogue task.
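The core idea the abstract describes, a Gaussian process giving both a value estimate and an uncertainty estimate, can be sketched with plain GP regression over belief-state features. This is an illustrative sketch only, not the paper's GP-SARSA implementation: the squared-exponential kernel, the 1-D belief feature, and the toy targets are all assumptions chosen for brevity.

```python
import numpy as np

def gp_posterior(X_train, y_train, X_test, length_scale=1.0, noise=0.1):
    """GP regression: posterior mean and standard deviation at test points.

    A squared-exponential kernel stands in for whatever kernel a real
    dialogue-policy learner would use; `noise` models return variability.
    """
    def kernel(A, B):
        # Pairwise squared distances, then the RBF kernel.
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / length_scale ** 2)

    K = kernel(X_train, X_train) + noise ** 2 * np.eye(len(X_train))
    K_s = kernel(X_test, X_train)
    K_ss = kernel(X_test, X_test)
    alpha = np.linalg.solve(K, y_train)
    mean = K_s @ alpha                                   # posterior mean
    cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)         # posterior covariance
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))

# Toy use: Q-value estimates over a 1-D belief feature, with uncertainty.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(8, 1))       # observed belief points (hypothetical)
y = np.sin(2 * np.pi * X[:, 0])          # stand-in for observed returns
Xq = np.linspace(0, 1, 5)[:, None]       # query points
mu, sd = gp_posterior(X, y, Xq)          # mean Q-estimate and its uncertainty
```

The returned standard deviation is what makes the approach attractive for fast policy optimisation: regions of the belief space the learner has not visited report high uncertainty, which can guide exploration.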