
Gaze-assisted automatic captioning of fetal ultrasound videos using three-way multi-modal deep neural networks

Authors

Mohammad Alsharid, Yifan Cai, Harshita Sharma, Lior Drukker, Aris T Papageorghiou, J Alison Noble

Publication date

2022/11/1

Journal

Medical Image Analysis

Volume

Pages

102630

Publisher

Elsevier

Description

In this work, we present a novel gaze-assisted natural language processing (NLP)-based video captioning model to describe routine second-trimester fetal ultrasound scan videos in a vocabulary of spoken sonography. The primary novelty of our multi-modal approach is that the learned video captioning model is built using a combination of ultrasound video, tracked gaze and textual transcriptions from speech recordings. The textual captions that describe the spatio-temporal scan video content are learnt from sonographer speech recordings. The generation of captions is assisted by sonographer gaze-tracking information reflecting their visual attention while performing live-imaging and interpreting a frozen image. To evaluate the effect of adding, or withholding, different forms of gaze on the video model, we compare spatio-temporal deep networks trained using three multi-modal configurations, namely: (1) a gaze …

Total citations

Cited by 11

2023: 5, 2024: 6
