A survey of natural language generation
This article offers a comprehensive review of the research on Natural Language Generation
(NLG) over the past two decades, especially in relation to data-to-text generation and text-to …
Automatic image and video caption generation with deep learning: A concise review and algorithmic overlap
Methodologies that utilize Deep Learning offer great potential for applications that
automatically attempt to generate captions or descriptions about images and video frames …
Vid2seq: Large-scale pretraining of a visual language model for dense video captioning
In this work, we introduce Vid2Seq, a multi-modal single-stage dense event captioning
model pretrained on narrated videos which are readily-available at scale. The Vid2Seq …
Generating diverse and natural 3d human motions from text
Automated generation of 3D human motions from text is a challenging problem. The
generated motions are expected to be sufficiently diverse to explore the text-grounded …
Tm2t: Stochastic and tokenized modeling for the reciprocal generation of 3d human motions and texts
Inspired by the strong ties between vision and language, the two intimate human sensing
and communication modalities, our paper aims to explore the generation of 3D human full …
End-to-end dense video captioning with parallel decoding
Dense video captioning aims to generate multiple associated captions with their temporal
locations from the video. Previous methods follow a sophisticated "localize-then-describe" …
Evaluation of text generation: A survey
A Celikyilmaz, E Clark, J Gao - arXiv preprint arXiv:2006.14799, 2020 - arxiv.org
The paper surveys evaluation methods of natural language generation (NLG) systems that
have been developed in the last few years. We group NLG evaluation methods into three …
AAP-MIT: Attentive Atrous Pyramid Network and Memory Incorporated Transformer for Multisentence Video Description
J Prudviraj, MI Reddy, C Vishnu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Generating multi-sentence descriptions for video is considered to be the most complex task
in computer vision and natural language understanding due to the intricate nature of video …
AutoAD: Movie description in context
The objective of this paper is an automatic Audio Description (AD) model that ingests movies
and outputs AD in text form. Generating high-quality movie AD is challenging due to the …
Mart: Memory-augmented recurrent transformer for coherent video paragraph captioning
Generating multi-sentence descriptions for videos is one of the most challenging captioning
tasks due to its high requirements for not only visual relevance but also discourse-based …