GRIT: Faster and better image captioning transformer using dual visual features

VQ Nguyen, M Suganuma, T Okatani - European Conference on Computer …, 2022 - Springer
Current state-of-the-art methods for image captioning employ region-based features, as they
provide object-level information that is essential to describe the content of images; they are …

S2 Transformer for Image Captioning.

P Zeng, H Zhang, J Song, L Gao - IJCAI, 2022 - ijcai.org
Transformer-based architectures with grid features represent the state-of-the-art in visual
and language reasoning tasks, such as visual question answering and image-text matching …

A thorough review of models, evaluation metrics, and datasets on image captioning

G Luo, L Cheng, C Jing, C Zhao… - IET Image Processing, 2022 - Wiley Online Library
Image captioning means generating descriptive sentences from a query image automatically.
It has recently received widespread attention from the computer vision and natural language …

Memory-based augmentation network for video captioning

S Jing, H Zhang, P Zeng, L Gao… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Video captioning focuses on generating natural language descriptions according to the
video content. Existing works mainly explore this multimodal learning with the paired source …

ER-SAN: Enhanced-Adaptive Relation Self-Attention Network for Image Captioning.

J Li, Z Mao, S Fang, H Li - IJCAI, 2022 - scholar.archive.org
Image captioning (IC), bringing vision to language, has drawn extensive attention. Precisely
describing visual relations between image objects is a key challenge in IC. We argue that …

Aligned visual semantic scene graph for image captioning

S Zhao, L Li, H Peng - Displays, 2022 - Elsevier
Image captioning is a multi-modal task to describe an image into natural language. Many
state-of-the-art methods generally take the encoder–decoder architecture, encode an image …

Image Captioning With Controllable and Adaptive Length Levels

N Ding, C Deng, M Tan, Q Du, Z Ge… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Image captioning is a core challenge in computer vision, attracting significant attention.
Traditional methods prioritize caption quality, often overlooking style control. Our research …

Position-guided transformer for image captioning

J Hu, Y Yang, L Yao, Y An, L Pan - Image and Vision Computing, 2022 - Elsevier
Transformer-based frameworks have shown superiorities in image captioning. However,
such frameworks are strenuous to consider geometric interrelations among visual contents …

SPT: Spatial pyramid transformer for image captioning

H Zhang, P Zeng, L Gao, X Lyu, J Song… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
The existing approaches to image captioning tend to adopt Transformer-based architectures
with grid features, which represent the state-of-the-art. However, the strategies are prone to …

Exploring Visual Relationships via Transformer-based Graphs for Enhanced Image Captioning

J Li, Z Mao, H Li, W Chen, Y Zhang - ACM Transactions on Multimedia …, 2024 - dl.acm.org
Image captioning (IC), bringing vision to language, has drawn extensive attention. A crucial
aspect of IC is the accurate depiction of visual relations among image objects. Visual …