Multimodal research in vision and language: A review of current and emerging trends
Deep Learning and its applications have cascaded impactful research and development
with a diverse range of modalities present in the real-world data. More recently, this has …
Features to text: a comprehensive survey of deep learning on semantic segmentation and image captioning
A Oluwasammi, MU Aftab, Z Qin, ST Ngo, TV Doan… - …, 2021 - Wiley Online Library
With the emergence of deep learning, computer vision has witnessed extensive
advancement and has seen immense applications in multiple domains. Specifically, image …
Grounding 'grounding' in NLP
The NLP community has seen substantial recent interest in grounding to facilitate interaction
between language technologies and the world. However, as a community, we use the term …
Storytelling from an image stream using scene graphs
Visual storytelling aims at generating a story from an image stream. Most existing methods
tend to represent images directly with the extracted high-level features, which is not intuitive …
Image captioning based on scene graphs: A survey
J Jia, X Ding, S Pang, X Gao, X Xin, R Hu… - Expert Systems with …, 2023 - Elsevier
Although recent developments in deep learning have brought several tasks closer to human
performance, there is still a significant gap between human and machine performance in …
TCIC: Theme concepts learning cross language and vision for image captioning
Existing research for image captioning usually represents an image using a scene graph
with low-level facts (objects and relations) and fails to capture the high-level semantics. In …
Semantic completion and filtration for image–text retrieval
S Yang, Q Li, W Li, XY Li, R Jin, B Lv, R Wang… - ACM Transactions on …, 2023 - dl.acm.org
Image–text retrieval is a vital task in computer vision and has received growing attention,
since it connects cross-modality data. It comes with the critical challenges of learning unified …
Object-centric diagnosis of visual reasoning
When answering questions about an image, it not only needs knowing what--understanding
the fine-grained contents (e.g., objects, relationships) in the image, but also telling why …
Structural semantic adversarial active learning for image captioning
Most image captioning models achieve superior performances with the help of large-scale
supervised training data, but it is prohibitively costly to label the image captions. To solve this …
Review of recent deep learning based methods for image-text retrieval
Cross-modal retrieval has drawn much attention in recent years due to the diversity and the
quantity of information data that exploded with the popularity of mobile devices and social …