Deep image captioning: A review of methods, trends and future challenges

L Xu, Q Tang, J Lv, B Zheng, X Zeng, W Li - Neurocomputing, 2023 - Elsevier
Image captioning, also called report generation in the medical field, aims to describe the visual
content of images in human language, which requires modeling the semantic relationship …

Ic3: Image captioning by committee consensus

DM Chan, A Myers, S Vijayanarasimhan… - arXiv preprint arXiv …, 2023 - arxiv.org
If you ask a human to describe an image, they might do so in a thousand different ways.
Traditionally, image captioning models are trained to generate a single "best" (most like a …

Distribution aware metrics for conditional natural language generation

DM Chan, Y Ni, DA Ross, S Vijayanarasimhan… - arXiv preprint arXiv …, 2022 - arxiv.org
Traditional automated metrics for evaluating conditional natural language generation use
pairwise comparisons between a single generated text and the best-matching gold-standard …

Analyzing The Language of Visual Tokens

DM Chan, R Corona, J Park, CJ Cho, Y Bai… - arXiv preprint arXiv …, 2024 - arxiv.org
With the introduction of transformer-based models for vision and language tasks, such as
LLaVA and Chameleon, there has been renewed interest in the discrete tokenized …

Beyond Coarse-Grained Matching in Video-Text Retrieval

A Chen, H Doughty, X Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
Text-to-video retrieval has seen significant advancements, yet the ability of models to
discern subtle differences in captions still requires verification. In this paper, we introduce a …

Towards Modeling the Implicit Ontologies of Natural Language for Vision-Language Benchmarks

J Feinglass - 2024 - search.proquest.com
Language is the predominant means by which humans communicate and
accumulate knowledge acquired through our senses, with vision being the most valued of …

Understanding, Building, and Evaluating Models for Context Aware Conditional Natural Language Generation

DM Chan - 2024 - search.proquest.com
If you ask a human to describe an image, they might do so in a thousand different ways.
Each of these descriptions depends not only on the image but also on a rich tapestry of …

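[CITATION][C] Analysis and Prospects of Deep Learning Image Captioning Methods (深度学习图像描述方法分析与展望)

Y Zhao, Z Jin, F Zhang, H Zhao, Z Tao, C Dou, X Xu… - Journal of Image and Graphics (中国图象图形学报), 2023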