GRIT: Faster and better image captioning transformer using dual visual features
Current state-of-the-art methods for image captioning employ region-based features, as they
provide object-level information that is essential to describe the content of images; they are …
A thorough review of models, evaluation metrics, and datasets on image captioning
G Luo, L Cheng, C Jing, C Zhao… - IET Image Processing, 2022 - Wiley Online Library
Image captioning means automatically generating descriptive sentences from a query image.
It has recently received widespread attention from the computer vision and natural language …
Memory-based augmentation network for video captioning
Video captioning focuses on generating natural language descriptions according to the
video content. Existing works mainly explore this multimodal learning with the paired source …
ER-SAN: Enhanced-Adaptive Relation Self-Attention Network for Image Captioning.
Image captioning (IC), bringing vision to language, has drawn extensive attention. Precisely
describing visual relations between image objects is a key challenge in IC. We argue that …
Aligned visual semantic scene graph for image captioning
S Zhao, L Li, H Peng - Displays, 2022 - Elsevier
Image captioning is a multi-modal task to describe an image into natural language. Many
state-of-the-art methods generally take the encoder–decoder architecture, encode an image …
Image Captioning With Controllable and Adaptive Length Levels
Image captioning is a core challenge in computer vision, attracting significant attention.
Traditional methods prioritize caption quality, often overlooking style control. Our research …
Position-guided transformer for image captioning
J Hu, Y Yang, L Yao, Y An, L Pan - Image and Vision Computing, 2022 - Elsevier
Transformer-based frameworks have shown superiorities in image captioning. However,
such frameworks are strenuous to consider geometric interrelations among visual contents …
SPT: Spatial pyramid transformer for image captioning
The existing approaches to image captioning tend to adopt Transformer-based architectures
with grid features, which represent the state-of-the-art. However, the strategies are prone to …
Exploring Visual Relationships via Transformer-based Graphs for Enhanced Image Captioning
Image captioning (IC), bringing vision to language, has drawn extensive attention. A crucial
aspect of IC is the accurate depiction of visual relations among image objects. Visual …