Multimodal research in vision and language: A review of current and emerging trends
Deep Learning and its applications have cascaded impactful research and development
with a diverse range of modalities present in the real-world data. More recently, this has …
From image to language: A critical analysis of visual question answering (VQA) approaches, challenges, and opportunities
The multimodal task of Visual Question Answering (VQA), encompassing elements of
Computer Vision (CV) and Natural Language Processing (NLP), aims to generate answers …
The dawn of quantum natural language processing
In this paper, we discuss the initial attempts at boosting the understanding of human language
based on deep-learning models with quantum computing. We successfully train a quantum …
Optimizing numerical estimation and operational efficiency in the legal domain through large language models
The legal landscape encompasses a wide array of lawsuit types, presenting lawyers with
challenges in delivering timely and accurate information to clients, particularly concerning …
Expert-defined keywords improve interpretability of retinal image captioning
Automatic machine learning-based (ML-based) medical report generation systems for retinal
images suffer from a relative lack of interpretability. Hence, such ML-based systems are still …
A novel evaluation framework for image2text generation
Evaluating the quality of automatically generated image descriptions is challenging,
requiring metrics that capture various aspects such as grammaticality, coverage …
Query-controllable video summarization
When video collections become huge, exploring both within and across videos
efficiently is challenging. Video summarization is one of the ways to tackle this issue …
DeepOpht: Medical report generation for retinal images via deep models and visual explanation
In this work, we propose an AI-based method that intends to improve the conventional retinal
disease treatment procedure and help ophthalmologists increase diagnosis efficiency and …
GPT2MVS: Generative pre-trained transformer-2 for multi-modal video summarization
Traditional video summarization methods generate fixed video representations regardless of
user interest. Therefore, such methods limit users' expectations in content search and …
Causalainer: Causal explainer for automatic video summarization
The goal of video summarization is to automatically shorten videos so that they convey the
overall story without losing relevant information. In many application scenarios, improper …