From image to language: A critical analysis of visual question answering (VQA) approaches, challenges, and opportunities
The multimodal task of Visual Question Answering (VQA), encompassing elements of
Computer Vision (CV) and Natural Language Processing (NLP), aims to generate answers …
AI robustness: a human-centered perspective on technological challenges and opportunities
Despite the impressive performance of Artificial Intelligence (AI) systems, their robustness
remains elusive and constitutes a key issue that impedes large-scale adoption. Besides …
Did the model understand the question?
We analyze state-of-the-art deep learning models for three tasks: question answering on (1)
images, (2) tables, and (3) passages of text. Using the notion of attribution (word …
The dawn of quantum natural language processing
In this paper, we discuss the initial attempts at boosting the understanding of human language
based on deep-learning models with quantum computing. We successfully train a quantum …
Towards fine-grained citation evaluation in generated text: A comparative analysis of faithfulness metrics
Large language models (LLMs) often produce unsupported or unverifiable content, known
as "hallucinations." To mitigate this, retrieval-augmented LLMs incorporate citations …
Optimizing numerical estimation and operational efficiency in the legal domain through large language models
The legal landscape encompasses a wide array of lawsuit types, presenting lawyers with
challenges in delivering timely and accurate information to clients, particularly concerning …
Expert-defined keywords improve interpretability of retinal image captioning
Automatic machine learning-based (ML-based) medical report generation systems for retinal
images suffer from a relative lack of interpretability. Hence, such ML-based systems are still …
A novel evaluation framework for image2text generation
Evaluating the quality of automatically generated image descriptions is challenging,
requiring metrics that capture various aspects such as grammaticality, coverage …
Query-controllable video summarization
When video collections become huge, how to explore both within and across videos
efficiently is challenging. Video summarization is one of the ways to tackle this issue …
DeepOpht: medical report generation for retinal images via deep models and visual explanation
In this work, we propose an AI-based method that intends to improve the conventional retinal
disease treatment procedure and help ophthalmologists increase diagnosis efficiency and …