A novel framework for robustness analysis of visual QA models

From image to language: A critical analysis of visual question answering (VQA) approaches, challenges, and opportunities

MF Ishmam, MSH Shovon, MF Mridha, N Dey - Information Fusion, 2024 - Elsevier

The multimodal task of Visual Question Answering (VQA), encompassing elements of
Computer Vision (CV) and Natural Language Processing (NLP), aims to generate answers …

Cited by 22 Related articles All 2 versions

AI robustness: a human-centered perspective on technological challenges and opportunities

A Tocchetti, L Corti, A Balayn, M Yurrita… - ACM Computing …, 2022 - dl.acm.org

Despite the impressive performance of Artificial Intelligence (AI) systems, their robustness
remains elusive and constitutes a key issue that impedes large-scale adoption. Besides …

Cited by 15 Related articles All 4 versions

Did the model understand the question?

PK Mudrakarta, A Taly, M Sundararajan… - arXiv preprint arXiv …, 2018 - arxiv.org

We analyze state-of-the-art deep learning models for three tasks: question answering on (1)
images, (2) tables, and (3) passages of text. Using the notion of attribution (word …

Cited by 221 Related articles All 6 versions

The dawn of quantum natural language processing

R Di Sipio, JH Huang, SYC Chen… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

In this paper, we discuss the initial attempts at boosting understanding human language
based on deep-learning models with quantum computing. We successfully train a quantum …

Cited by 100 Related articles All 6 versions

Towards fine-grained citation evaluation in generated text: A comparative analysis of faithfulness metrics

W Zhang, M Aliannejadi, Y Yuan, J Pei… - arXiv preprint arXiv …, 2024 - arxiv.org

Large language models (LLMs) often produce unsupported or unverifiable content, known
as "hallucinations." To mitigate this, retrieval-augmented LLMs incorporate citations …

Cited by 13 Related articles All 6 versions

Optimizing numerical estimation and operational efficiency in the legal domain through large language models

JH Huang, CC Yang, Y Shen, AM Pacces… - Proceedings of the 33rd …, 2024 - dl.acm.org

The legal landscape encompasses a wide array of lawsuit types, presenting lawyers with
challenges in delivering timely and accurate information to clients, particularly concerning …

Cited by 9 Related articles All 3 versions

Expert-defined keywords improve interpretability of retinal image captioning

TW Wu, JH Huang, J Lin… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Automatic machine learning-based (ML-based) medical report generation systems for retinal
images suffer from a relative lack of interpretability. Hence, such ML-based systems are still …

Cited by 29 Related articles All 5 versions

A novel evaluation framework for image2text generation

JH Huang, H Zhu, Y Shen, S Rudinac… - arXiv preprint arXiv …, 2024 - arxiv.org

Evaluating the quality of automatically generated image descriptions is challenging,
requiring metrics that capture various aspects such as grammaticality, coverage …

Cited by 9 Related articles All 3 versions

Query-controllable video summarization

JH Huang, M Worring - … of the 2020 International Conference on …, 2020 - dl.acm.org

When video collections become huge, how to explore both within and across videos
efficiently is challenging. Video summarization is one of the ways to tackle this issue …

Cited by 65 Related articles All 6 versions

DeepOpht: medical report generation for retinal images via deep models and visual explanation

JH Huang, CHH Yang, F Liu, M Tian… - Proceedings of the …, 2021 - openaccess.thecvf.com

In this work, we propose an AI-based method that intends to improve the conventional retinal
disease treatment procedure and help ophthalmologists increase diagnosis efficiency and …

Cited by 61 Related articles All 11 versions