Robustness of Structured Data Extraction from In-Plane Rotated Documents Using Multi-Modal Large Language Models (LLM)
A Biswas, W Talukdar - Journal of Artificial Intelligence Research, 2024 - arxiv.org
Multi-modal large language models (LLMs) have shown remarkable performance in various
natural language processing tasks, including data extraction from documents. However, the …
Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey
Detecting out-of-distribution (OOD) samples is crucial for ensuring the safety of machine
learning systems and has shaped the field of OOD detection. Meanwhile, several other …
Show-o: One single transformer to unify multimodal understanding and generation
We present a unified transformer, i.e., Show-o, that unifies multimodal understanding and
generation. Unlike fully autoregressive models, Show-o unifies autoregressive and …
Deep learning for the harmonization of structural MRI scans: a survey
Medical imaging datasets for research are frequently collected from multiple imaging centers
using different scanners, protocols, and settings. These variations affect data consistency …
MMRel: A Relation Understanding Dataset and Benchmark in the MLLM Era
Despite the recent advancements in Multi-modal Large Language Models (MLLMs),
understanding inter-object relations, i.e., interactions or associations between distinct objects …
Reefknot: A comprehensive benchmark for relation hallucination evaluation, analysis and mitigation in multimodal large language models
Hallucination issues have persistently plagued current multimodal large language models
(MLLMs). While existing research primarily focuses on object-level or attribute-level …
Detecting and Evaluating Medical Hallucinations in Large Vision Language Models
Large Vision Language Models (LVLMs) are increasingly integral to healthcare applications,
including medical visual question answering and imaging report generation. While these …
Mitigating Object Hallucination via Data Augmented Contrastive Tuning
Despite their remarkable progress, Multimodal Large Language Models (MLLMs) tend to
hallucinate factually inaccurate information. In this work, we address object hallucinations in …
CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models
Instruction tuning in multimodal large language models (MLLMs) aims to smoothly integrate
a backbone LLM with a pre-trained feature encoder for downstream tasks. The major …
Evaluating the Quality of Hallucination Benchmarks for Large Vision-Language Models
Despite the rapid progress and outstanding performance of Large Vision-Language Models
(LVLMs) in recent years, LVLMs have been plagued by the issue of hallucination, i.e., LVLMs …