Curiosity-driven reinforcement learning for diverse visual paragraph generation

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e., describing images …

Cited by 384 - Related articles - All 11 versions

[HTML] Intelligent flood forecasting and warning: A survey

Y Zhang, D Pan, J Van Griensven, SX Yang… - Intelligence & …, 2023 - oaepublish.com

Accurately predicting the magnitude and timing of floods is an extremely challenging
problem for watershed management, as it aims to provide early warning and save lives …

Cited by 6 - Related articles

Knowing what to learn: a metric-oriented focal mechanism for image captioning

J Ji, Y Ma, X Sun, Y Zhou, Y Wu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Despite considerable progress, image captioning still suffers from the huge difference in
quality between easy and hard examples, which is left unexploited in existing methods. To …

Cited by 41 - Related articles - All 5 versions

Visual clues: Bridging vision and language foundations for image paragraph captioning

Y Xie, L Zhou, X Dai, L Yuan, N Bach… - Advances in Neural …, 2022 - proceedings.neurips.cc

People say, "A picture is worth a thousand words". Then how can we get the rich information
out of the image? We argue that by using visual clues to bridge large pretrained vision …

Cited by 27 - Related articles - All 7 versions

Towards diverse paragraph captioning for untrimmed videos

Y Song, S Chen, Q Jin - … of the IEEE/CVF Conference on …, 2021 - openaccess.thecvf.com

Video paragraph captioning aims to describe multiple events in untrimmed videos with
descriptive paragraphs. Existing approaches mainly solve the problem in two steps: event …

Cited by 45 - Related articles - All 6 versions

Proactive privacy-preserving learning for cross-modal retrieval

PF Zhang, G Bai, H Yin, Z Huang - ACM Transactions on Information …, 2023 - dl.acm.org

Deep cross-modal retrieval techniques have recently achieved remarkable performance,
which also poses severe threats to data privacy potentially. Nowadays, enormous user …

Cited by 25 - Related articles - All 2 versions

Adversarial bipartite graph learning for video domain adaptation

Y Luo, Z Huang, Z Wang, Z Zhang… - Proceedings of the 28th …, 2020 - dl.acm.org

Domain adaptation techniques, which focus on adapting models between distributionally
different domains, are rarely explored in the video recognition area due to the significant …

Cited by 53 - Related articles - All 4 versions

Strong: Spatio-temporal reinforcement learning for cross-modal video moment localization

D Cao, Y Zeng, M Liu, X He, M Wang… - Proceedings of the 28th …, 2020 - dl.acm.org

In this article, we tackle the cross-modal video moment localization issue, namely, localizing
the most relevant video moment in an untrimmed video given a sentence as the query. The …

Cited by 44 - Related articles - All 3 versions

Mitigating generation shifts for generalized zero-shot learning

Z Chen, Y Luo, S Wang, R Qiu, J Li… - Proceedings of the 29th …, 2021 - dl.acm.org

Generalized Zero-Shot Learning (GZSL) is the task of leveraging semantic information to
recognize seen and unseen samples, where unseen classes are not observable during …

Cited by 29 - Related articles - All 3 versions

Effective multimodal encoding for image paragraph captioning

TS Nguyen, B Fernando - IEEE Transactions on Image …, 2022 - ieeexplore.ieee.org

In this paper, we present a regularization-based image paragraph generation method. We
propose a novel multimodal encoding generator (MEG) to generate effective multimodal …

Cited by 11 - Related articles - All 5 versions