Bridging by word: Image grounded vocabulary construction for visual captioning

S Uppal, S Bhagat, D Hazarika, N Majumder, S Poria… - Information …, 2022 - Elsevier

Deep Learning and its applications have cascaded impactful research and development
with a diverse range of modalities present in the real-world data. More recently, this has …

Cited by 107 - Related articles - All 5 versions

Features to text: a comprehensive survey of deep learning on semantic segmentation and image captioning

A Oluwasammi, MU Aftab, Z Qin, ST Ngo, TV Doan… - …, 2021 - Wiley Online Library

With the emergence of deep learning, computer vision has witnessed extensive
advancement and has seen immense applications in multiple domains. Specifically, image …

Cited by 28 - Related articles - All 10 versions

Grounding 'grounding' in NLP

KR Chandu, Y Bisk, AW Black - arXiv preprint arXiv:2106.02192, 2021 - arxiv.org

The NLP community has seen substantial recent interest in grounding to facilitate interaction
between language technologies and the world. However, as a community, we use the term …

Cited by 66 - Related articles - All 6 versions

Storytelling from an image stream using scene graphs

R Wang, Z Wei, P Li, Q Zhang, X Huang - Proceedings of the AAAI …, 2020 - aaai.org

Visual storytelling aims at generating a story from an image stream. Most existing methods
tend to represent images directly with the extracted high-level features, which is not intuitive …

Cited by 71 - Related articles - All 7 versions

Image captioning based on scene graphs: A survey

J Jia, X Ding, S Pang, X Gao, X Xin, R Hu… - Expert Systems with …, 2023 - Elsevier

Although recent developments in deep learning have brought several tasks closer to human
performance, there is still a significant gap between human and machine performance in …

Cited by 14 - Related articles - All 2 versions

TCIC: Theme concepts learning cross language and vision for image captioning

Z Fan, Z Wei, S Wang, R Wang, Z Li, H Shan… - arXiv preprint arXiv …, 2021 - arxiv.org

Existing research for image captioning usually represents an image using a scene graph
with low-level facts (objects and relations) and fails to capture the high-level semantics. In …

Cited by 28 - Related articles - All 7 versions

Semantic completion and filtration for image–text retrieval

S Yang, Q Li, W Li, XY Li, R Jin, B Lv, R Wang… - ACM Transactions on …, 2023 - dl.acm.org

Image–text retrieval is a vital task in computer vision and has received growing attention,
since it connects cross-modality data. It comes with the critical challenges of learning unified …

Cited by 11 - Related articles

Object-centric diagnosis of visual reasoning

J Yang, J Mao, J Wu, D Parikh, DD Cox… - arXiv preprint arXiv …, 2020 - arxiv.org

When answering questions about an image, it not only needs knowing what--understanding
the fine-grained contents (e.g., objects, relationships) in the image, but also telling why …

Cited by 18 - Related articles - All 2 versions

Structural semantic adversarial active learning for image captioning

B Zhang, L Li, L Su, S Wang, J Deng, ZJ Zha… - Proceedings of the 28th …, 2020 - dl.acm.org

Most image captioning models achieve superior performances with the help of large-scale
supervised training data, but it is prohibitively costly to label the image captions. To solve this …

Cited by 18 - Related articles

Review of recent deep learning based methods for image-text retrieval

J Chen, L Zhang, C Bai… - 2020 IEEE Conference on …, 2020 - ieeexplore.ieee.org

Cross-modal retrieval has drawn much attention in recent years due to the diversity and the
quantity of information data that exploded with the popularity of mobile devices and social …

Cited by 18 - Related articles - All 6 versions