Semantic attribute enriched storytelling from a sequence of images
ZM Malakan, GM Hassan… - 2021 Digital Image …, 2021 - ieeexplore.ieee.org
2021 Digital Image Computing: Techniques and Applications (DICTA), 2021•ieeexplore.ieee.org
Visual storytelling (VST) is the task of generating story-based sentences from an ordered sequence of images. Contemporary techniques suffer from several limitations, such as inadequate encapsulation of visual variance and poor capture of context across the input sequence. Consequently, stories generated by such techniques often lack coherence, context, and semantic information. In this research, we devise a ‘Semantic Attribute Enriched Storytelling’ (SAES) framework to mitigate these issues. To that end, we first extract the visual features of the input image sequence and the noun entities present in the visual input by employing an off-the-shelf object detector. The two features are concatenated to encapsulate the visual variance of the input sequence. The concatenated features are then passed through a bidirectional LSTM sequence encoder to capture the past and future context of the input image sequence, followed by an attention mechanism to enhance the discriminability of the input to the language model, i.e., a Mogrifier-LSTM. Additionally, we incorporate semantic attributes, e.g., nouns, to complement the semantic context in the generated story. Detailed experimental and human evaluations are performed to establish the competitive performance of the proposed technique. We achieve up to a 1.4% improvement on the BLEU metric over recent state-of-the-art methods.
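The feature concatenation and attention steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`concat_features`, `attend`), the toy vectors, and the choice of scaled dot-product attention are all assumptions made for illustration, since the abstract does not specify the attention form. The bidirectional LSTM encoder and the Mogrifier-LSTM language model are omitted, so attention here is applied directly to the concatenated features rather than to encoder states.

```python
import math

def concat_features(visual, nouns):
    """Join each image's visual vector with its noun-attribute vector."""
    if len(visual) != len(nouns):
        raise ValueError("need one noun vector per image")
    return [v + n for v, n in zip(visual, nouns)]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, states):
    """Scaled dot-product attention of one query over a sequence of vectors.

    Assumption: the paper's attention mechanism may differ from this form.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * s for q, s in zip(query, state)) / scale
              for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    context = [sum(w * state[i] for w, state in zip(weights, states))
               for i in range(dim)]
    return weights, context

# Hypothetical toy inputs: 3 images, 2-d visual features, 2-d noun embeddings.
visual = [[0.2, 0.1], [0.9, 0.4], [0.3, 0.8]]
nouns = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

fused = concat_features(visual, nouns)       # one 4-d vector per image
weights, context = attend(fused[1], fused)   # attend from the second image
```

In a full model, `fused` would first pass through the sequence encoder, and the resulting context vector would condition the language model at each decoding step.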