Deep image captioning: A review of methods, trends and future challenges
Image captioning, also called report generation in the medical field, aims to describe the visual
content of images in human language, which requires modeling the semantic relationship …
IC3: Image captioning by committee consensus
If you ask a human to describe an image, they might do so in a thousand different ways.
Traditionally, image captioning models are trained to generate a single "best" (most like a …
Distribution aware metrics for conditional natural language generation
Traditional automated metrics for evaluating conditional natural language generation use
pairwise comparisons between a single generated text and the best-matching gold-standard …
Analyzing The Language of Visual Tokens
With the introduction of transformer-based models for vision and language tasks, such as
LLaVA and Chameleon, there has been renewed interest in the discrete tokenized …
Beyond Coarse-Grained Matching in Video-Text Retrieval
Text-to-video retrieval has seen significant advancements, yet the ability of models to
discern subtle differences in captions still requires verification. In this paper, we introduce a …
Towards Modeling the Implicit Ontologies of Natural Language for Vision-Language Benchmarks
J Feinglass - 2024 - search.proquest.com
Abstract: Language is the predominant means by which humans communicate and
accumulate knowledge acquired through our senses, with vision being the most valued of …
Understanding, Building, and Evaluating Models for Context Aware Conditional Natural Language Generation
DM Chan - 2024 - search.proquest.com
If you ask a human to describe an image, they might do so in a thousand different ways.
Each of these descriptions depends not only on the image but also on a rich tapestry of …