GRIT: Faster and better image captioning transformer using dual visual features

VQ Nguyen, M Suganuma, T Okatani - European Conference on Computer …, 2022 - Springer

Current state-of-the-art methods for image captioning employ region-based features, as they
provide object-level information that is essential to describe the content of images; they are …

Cited by 120 · Related articles · All 8 versions

S2 Transformer for Image Captioning

P Zeng, H Zhang, J Song, L Gao - IJCAI, 2022 - ijcai.org

Transformer-based architectures with grid features represent the state-of-the-art in visual
and language reasoning tasks, such as visual question answering and image-text matching …

Cited by 55 · Related articles

A thorough review of models, evaluation metrics, and datasets on image captioning

G Luo, L Cheng, C Jing, C Zhao… - IET Image Processing, 2022 - Wiley Online Library

Image captioning means generating descriptive sentences from a query image automatically.
It has recently received widespread attention from the computer vision and natural language …

Cited by 23 · Related articles · All 4 versions

Memory-based augmentation network for video captioning

S Jing, H Zhang, P Zeng, L Gao… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Video captioning focuses on generating natural language descriptions according to the
video content. Existing works mainly explore this multimodal learning with the paired source …

Cited by 25 · Related articles · All 2 versions

ER-SAN: Enhanced-Adaptive Relation Self-Attention Network for Image Captioning

J Li, Z Mao, S Fang, H Li - IJCAI, 2022 - scholar.archive.org

Image captioning (IC), bringing vision to language, has drawn extensive attention. Precisely
describing visual relations between image objects is a key challenge in IC. We argue that …

Cited by 16 · Related articles

Aligned visual semantic scene graph for image captioning

S Zhao, L Li, H Peng - Displays, 2022 - Elsevier

Image captioning is a multi-modal task to describe an image into natural language. Many
state-of-the-art methods generally take the encoder–decoder architecture, encode an image …

Cited by 14 · Related articles

Image Captioning With Controllable and Adaptive Length Levels

N Ding, C Deng, M Tan, Q Du, Z Ge… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Image captioning is a core challenge in computer vision, attracting significant attention.
Traditional methods prioritize caption quality, often overlooking style control. Our research …

Cited by 7 · Related articles · All 6 versions

Position-guided transformer for image captioning

J Hu, Y Yang, L Yao, Y An, L Pan - Image and Vision Computing, 2022 - Elsevier

Transformer-based frameworks have shown superiorities in image captioning. However,
such frameworks struggle to consider geometric interrelations among visual contents …

Cited by 6 · Related articles · All 2 versions

SPT: Spatial pyramid transformer for image captioning

H Zhang, P Zeng, L Gao, X Lyu, J Song… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

The existing approaches to image captioning tend to adopt Transformer-based architectures
with grid features, which represent the state-of-the-art. However, the strategies are prone to …

Cited by 10 · Related articles

Exploring Visual Relationships via Transformer-based Graphs for Enhanced Image Captioning

J Li, Z Mao, H Li, W Chen, Y Zhang - ACM Transactions on Multimedia …, 2024 - dl.acm.org

Image captioning (IC), bringing vision to language, has drawn extensive attention. A crucial
aspect of IC is the accurate depiction of visual relations among image objects. Visual …

Cited by 5 · Related articles