Actor-critic sequence generation for relative difference captioning
Z Fei - Proceedings of the 2020 International Conference on Multimedia Retrieval, 2020 - dl.acm.org
This paper investigates a new task named relative difference captioning, which aims to generate a sentence describing the difference between a given image pair. Difference description is a crucial task for developing intelligent machines that can understand and handle changing visual scenes and applications. Towards that end, we propose a reinforcement learning-based model that utilizes a policy network and a value network in a decision procedure to collaboratively produce a difference caption. Specifically, the policy network works as an actor to estimate the probability of the next word based on the current state, and the value network serves as a critic to predict the values of all possible extensions according to the current action and state. To encourage generating correct and meaningful descriptions, we leverage a visual-linguistic similarity-based reward function as feedback. Empirical results on a recently released dataset demonstrate the effectiveness of our method in comparison with various baselines and model variants.
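To illustrate the actor-critic decoding procedure the abstract describes, here is a minimal sketch, not the authors' code: a policy network (actor) proposes a distribution over the next word from the current decoder state, and a value network (critic) scores candidate extensions; the two scores are combined to pick the next word. All module names, dimensions, and the weighted combination rule are assumptions for illustration only.

```python
# Hypothetical sketch of one actor-critic decoding step (PyTorch).
# PolicyNet, ValueNet, sizes, and the beta-weighted combination are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE, HIDDEN = 1000, 256  # hypothetical sizes

class PolicyNet(nn.Module):
    """Actor: predicts a distribution over the next word given the current state."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN)
        self.rnn = nn.GRUCell(HIDDEN, HIDDEN)
        self.out = nn.Linear(HIDDEN, VOCAB_SIZE)

    def forward(self, prev_word, state):
        state = self.rnn(self.embed(prev_word), state)
        return F.log_softmax(self.out(state), dim=-1), state

class ValueNet(nn.Module):
    """Critic: scores how promising a candidate action is given the current state."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN)
        self.score = nn.Linear(2 * HIDDEN, 1)

    def forward(self, state, action):
        return self.score(torch.cat([state, self.embed(action)], dim=-1)).squeeze(-1)

def decode_step(policy, critic, prev_word, state, beta=0.5, topk=5):
    """Combine actor log-probabilities and critic values to choose the next word."""
    log_probs, state = policy(prev_word, state)           # actor proposal
    cand_scores, cand_words = log_probs.topk(topk, dim=-1)
    # critic evaluates each candidate extension of the partial caption
    values = torch.stack([critic(state, cand_words[:, k]) for k in range(topk)], dim=-1)
    combined = beta * cand_scores + (1 - beta) * values   # assumed weighting scheme
    next_word = cand_words.gather(-1, combined.argmax(-1, keepdim=True)).squeeze(-1)
    return next_word, state

# Usage: one decoding step for a batch of two image pairs (image features omitted;
# in the paper the state would be initialised from the image-pair representation).
policy, critic = PolicyNet(), ValueNet()
prev = torch.zeros(2, dtype=torch.long)   # <BOS> token id assumed to be 0
state = torch.zeros(2, HIDDEN)
word, state = decode_step(policy, critic, prev, state)
print(word.shape)  # torch.Size([2])
```

In the paper, training of both networks is driven by a visual-linguistic similarity-based reward; the sketch above only shows the inference-time interaction between actor and critic.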