Towards robust tampered text detection in document image: New dataset and new solution

C Qu, C Liu, Y Liu, X Chen, D Peng… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recently, tampered text detection in document images has attracted increasing attention
due to its essential role in information security. However, detecting visually consistent …

Visual information extraction in the wild: practical dataset and end-to-end solution

J Kuang, W Hua, D Liang, M Yang, D Jiang… - … on Document Analysis …, 2023 - Springer
Visual information extraction (VIE), which aims to simultaneously perform OCR and
information extraction in a unified framework, has drawn increasing attention due to its …

DocParser: End-to-end OCR-free information extraction from visually rich documents

M Dhouib, G Bettaieb, A Shabou - International Conference on Document …, 2023 - Springer
Information Extraction from visually rich documents is a challenging task that has
gained a lot of attention in recent years due to its importance in several document-control …

InstructDoc: A dataset for zero-shot generalization of visual document understanding with instructions

R Tanaka, T Iki, K Nishida, K Saito… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
We study the problem of completing various visual document understanding (VDU) tasks,
e.g., question answering and information extraction, on real-world documents through human …

TRIE++: towards end-to-end information extraction from visually rich documents

Z Cheng, P Zhang, C Li, Q Liang, Y Xu, P Li… - arXiv preprint arXiv …, 2022 - arxiv.org
Recently, automatically extracting information from visually rich documents (e.g., tickets and
resumes) has become a hot and vital research topic due to its widespread commercial …

BlueLM-V-3B: Algorithm and system co-design for multimodal large language models on mobile devices

X Lu, Y Chen, C Chen, H Tan, B Chen, Y Xie… - arXiv preprint arXiv …, 2024 - arxiv.org
The emergence and growing popularity of multimodal large language models (MLLMs) have
significant potential to enhance various aspects of daily life, from improving communication …

Business document information extraction: Towards practical benchmarks

M Skalický, Š Šimsa, M Uřičář, M Šulc - International Conference of the …, 2022 - Springer
Information extraction from semi-structured documents is crucial for frictionless
business-to-business (B2B) communication. While machine learning problems related to …

Deep learning approaches for information extraction from visually rich documents: datasets, challenges and methods

H Gbada, K Kalti, MA Mahjoub - International Journal on Document …, 2024 - Springer
This paper focuses on Information Extraction from Visually Rich Documents, exploring how
deep learning methods are applied in this field. For the purpose of comparing the …

UNER: A Unified Prediction Head for Named Entity Recognition in Visually-rich Documents

Y Tu, C Zhang, Y Guo, H Chen, J Tang, H Zhu… - Proceedings of the …, 2024 - dl.acm.org
The recognition of named entities in visually-rich documents (VrD-NER) plays a critical role
in various real-world scenarios and applications. However, the research in VrD-NER faces …

End-to-End Compound Table Understanding with Multi-Modal Modeling

Z Li, Y Li, Q Liang, P Li, Z Cheng, Y Niu, S Pu… - Proceedings of the 30th …, 2022 - dl.acm.org
Tables are a widely used data form in webpages, spreadsheets, or PDFs to organize and
present structural data. Although studies on table structure recognition have been …