From show to tell: A survey on deep learning-based image captioning

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e., describing images …

Intelligent flood forecasting and warning: A survey

Y Zhang, D Pan, J Van Griensven, SX Yang… - Intelligence & …, 2023 - oaepublish.com
Accurately predicting the magnitude and timing of floods is an extremely challenging
problem for watershed management, as it aims to provide early warning and save lives …

Knowing what to learn: a metric-oriented focal mechanism for image captioning

J Ji, Y Ma, X Sun, Y Zhou, Y Wu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Despite considerable progress, image captioning still suffers from the huge difference in
quality between easy and hard examples, which is left unexploited in existing methods. To …

Visual clues: Bridging vision and language foundations for image paragraph captioning

Y Xie, L Zhou, X Dai, L Yuan, N Bach… - Advances in Neural …, 2022 - proceedings.neurips.cc
People say, "A picture is worth a thousand words". Then how can we get the rich information
out of the image? We argue that by using visual clues to bridge large pretrained vision …

Towards diverse paragraph captioning for untrimmed videos

Y Song, S Chen, Q Jin - … of the IEEE/CVF Conference on …, 2021 - openaccess.thecvf.com
Video paragraph captioning aims to describe multiple events in untrimmed videos with
descriptive paragraphs. Existing approaches mainly solve the problem in two steps: event …

Proactive privacy-preserving learning for cross-modal retrieval

PF Zhang, G Bai, H Yin, Z Huang - ACM Transactions on Information …, 2023 - dl.acm.org
Deep cross-modal retrieval techniques have recently achieved remarkable performance,
which also poses severe threats to data privacy potentially. Nowadays, enormous user …

Adversarial bipartite graph learning for video domain adaptation

Y Luo, Z Huang, Z Wang, Z Zhang… - Proceedings of the 28th …, 2020 - dl.acm.org
Domain adaptation techniques, which focus on adapting models between distributionally
different domains, are rarely explored in the video recognition area due to the significant …

STRONG: Spatio-temporal reinforcement learning for cross-modal video moment localization

D Cao, Y Zeng, M Liu, X He, M Wang… - Proceedings of the 28th …, 2020 - dl.acm.org
In this article, we tackle the cross-modal video moment localization issue, namely, localizing
the most relevant video moment in an untrimmed video given a sentence as the query. The …

Mitigating generation shifts for generalized zero-shot learning

Z Chen, Y Luo, S Wang, R Qiu, J Li… - Proceedings of the 29th …, 2021 - dl.acm.org
Generalized Zero-Shot Learning (GZSL) is the task of leveraging semantic information to
recognize seen and unseen samples, where unseen classes are not observable during …

Effective multimodal encoding for image paragraph captioning

TS Nguyen, B Fernando - IEEE Transactions on Image …, 2022 - ieeexplore.ieee.org
In this paper, we present a regularization-based image paragraph generation method. We
propose a novel multimodal encoding generator (MEG) to generate effective multimodal …