From show to tell: A survey on deep learning-based image captioning
Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e., describing images …
Intelligent flood forecasting and warning: A survey
Accurately predicting the magnitude and timing of floods is an extremely challenging
problem for watershed management, as it aims to provide early warning and save lives …
Knowing what to learn: a metric-oriented focal mechanism for image captioning
Despite considerable progress, image captioning still suffers from the huge difference in
quality between easy and hard examples, which is left unexploited in existing methods. To …
Visual clues: Bridging vision and language foundations for image paragraph captioning
People say, "A picture is worth a thousand words". Then how can we get the rich information
out of the image? We argue that by using visual clues to bridge large pretrained vision …
Towards diverse paragraph captioning for untrimmed videos
Video paragraph captioning aims to describe multiple events in untrimmed videos with
descriptive paragraphs. Existing approaches mainly solve the problem in two steps: event …
Proactive privacy-preserving learning for cross-modal retrieval
Deep cross-modal retrieval techniques have recently achieved remarkable performance,
which also poses severe threats to data privacy potentially. Nowadays, enormous user …
Adversarial bipartite graph learning for video domain adaptation
Domain adaptation techniques, which focus on adapting models between distributionally
different domains, are rarely explored in the video recognition area due to the significant …
Strong: Spatio-temporal reinforcement learning for cross-modal video moment localization
In this article, we tackle the cross-modal video moment localization issue, namely, localizing
the most relevant video moment in an untrimmed video given a sentence as the query. The …
Mitigating generation shifts for generalized zero-shot learning
Generalized Zero-Shot Learning (GZSL) is the task of leveraging semantic information to
recognize seen and unseen samples, where unseen classes are not observable during …
Effective multimodal encoding for image paragraph captioning
TS Nguyen, B Fernando - IEEE Transactions on Image …, 2022 - ieeexplore.ieee.org
In this paper, we present a regularization-based image paragraph generation method. We
propose a novel multimodal encoding generator (MEG) to generate effective multimodal …