Sample-efficient actor-critic reinforcement learning with supervised data for dialogue management

PH Su, P Budzianowski, S Ultes, M Gasic… - arXiv preprint arXiv …, 2017 - arxiv.org
Deep reinforcement learning (RL) methods have significant potential for dialogue policy
optimisation. However, they suffer from poor performance in the early stages of learning …

Sample efficient deep reinforcement learning for dialogue systems with large action spaces

G Weisz, P Budzianowski, PH Su… - IEEE/ACM Transactions …, 2018 - ieeexplore.ieee.org
In spoken dialogue systems, we aim to deploy artificial intelligence to build automated
dialogue agents that can converse with humans. A part of this effort is the policy optimization …

Flow: Deep reinforcement learning for control in SUMO

N Kheterpal, K Parvate, C Wu, A Kreidieh… - EPiC Series in …, 2018 - kanaad.me
We detail the motivation and design decisions underpinning Flow, a computational
framework integrating SUMO with the deep reinforcement learning libraries rllab and RLlib …

Generalized off-policy actor-critic

S Zhang, W Boehmer… - Advances in neural …, 2019 - proceedings.neurips.cc
We propose a new objective, the counterfactual objective, unifying existing objectives for
off-policy policy gradient algorithms in the continuing reinforcement learning (RL) setting …

Dynamic planning in open-ended dialogue using reinforcement learning

D Cohen, M Ryu, Y Chow, O Keller, I Greenberg… - arXiv preprint arXiv …, 2022 - arxiv.org
Despite recent advances in natural language understanding and generation, and decades
of research on the development of conversational bots, building automated agents that can …

A mixture-of-expert approach to RL-based dialogue management

Y Chow, A Tulepbergenov, O Nachum, MK Ryu… - arXiv preprint arXiv …, 2022 - arxiv.org
Despite recent advancements in language models (LMs), their application to dialogue
management (DM) problems and ability to carry on rich conversations remain a challenge …

Indoor scene change captioning based on multimodality data

Y Qiu, Y Satoh, R Suzuki, K Iwata, H Kataoka - Sensors, 2020 - mdpi.com
This study proposes a framework for describing a scene change using natural language text
based on indoor scene observations conducted before and after a scene change. The …

Reduced robust random cut forest for out-of-distribution detection in machine learning models

H Vardhan, J Sztipanovits - arXiv preprint arXiv:2206.09247, 2022 - arxiv.org
Most machine learning-based regressors extract information from data collected via past
observations of limited length to make predictions in the future. Consequently, when input to …

Mean actor critic

K Asadi, C Allen, M Roderick, A Mohamed, G Konidaris… - stat, 2017 - cl.uni-heidelberg.de
We propose a new algorithm, Mean Actor-Critic (MAC), for discrete-action continuous-state
reinforcement learning. MAC is a policy gradient algorithm that uses the agent's explicit …

Towards solving text-based games by producing adaptive action spaces

RY Tao, MA Côté, X Yuan, LE Asri - arXiv preprint arXiv:1812.00855, 2018 - arxiv.org
To solve a text-based game, an agent needs to formulate valid text commands for a given
context and find the ones that lead to success. Recent attempts at solving text-based games …