Spatial dual-modality graph reasoning for key information extraction

C Qu, C Liu, Y Liu, X Chen, D Peng… - Proceedings of the …, 2023 - openaccess.thecvf.com

Recently, tampered text detection in document images has attracted increasing attention
due to its essential role in information security. However, detecting visually consistent …

Cited by 22 · Related articles · All 5 versions

Visual information extraction in the wild: practical dataset and end-to-end solution

J Kuang, W Hua, D Liang, M Yang, D Jiang… - … on Document Analysis …, 2023 - Springer

Visual information extraction (VIE), which aims to simultaneously perform OCR and
information extraction in a unified framework, has drawn increasing attention due to its …

Cited by 36 · Related articles · All 4 versions

DocParser: End-to-end OCR-free information extraction from visually rich documents

M Dhouib, G Bettaieb, A Shabou - International Conference on Document …, 2023 - Springer

Information Extraction from visually rich documents is a challenging task that has
gained a lot of attention in recent years due to its importance in several document-control …

Cited by 22 · Related articles · All 5 versions

InstructDoc: A dataset for zero-shot generalization of visual document understanding with instructions

R Tanaka, T Iki, K Nishida, K Saito… - Proceedings of the AAAI …, 2024 - ojs.aaai.org

We study the problem of completing various visual document understanding (VDU) tasks,
e.g., question answering and information extraction, on real-world documents through human …

Cited by 12 · Related articles · All 4 versions

TRIE++: towards end-to-end information extraction from visually rich documents

Z Cheng, P Zhang, C Li, Q Liang, Y Xu, P Li… - arXiv preprint arXiv …, 2022 - arxiv.org

Recently, automatically extracting information from visually rich documents (e.g., tickets and
resumes) has become a hot and vital research topic due to its widespread commercial …

Cited by 17 · Related articles · All 2 versions

BlueLM-V-3B: Algorithm and system co-design for multimodal large language models on mobile devices

X Lu, Y Chen, C Chen, H Tan, B Chen, Y Xie… - arXiv preprint arXiv …, 2024 - arxiv.org

The emergence and growing popularity of multimodal large language models (MLLMs) have
significant potential to enhance various aspects of daily life, from improving communication …

Cited by 1 · Related articles · All 2 versions

Business document information extraction: Towards practical benchmarks

M Skalický, Š Šimsa, M Uřičář, M Šulc - International Conference of the …, 2022 - Springer

Information extraction from semi-structured documents is crucial for frictionless
business-to-business (B2B) communication. While machine learning problems related to …

Cited by 13 · Related articles · All 8 versions

Deep learning approaches for information extraction from visually rich documents: datasets, challenges and methods

H Gbada, K Kalti, MA Mahjoub - International Journal on Document …, 2024 - Springer

This paper focuses on Information Extraction from Visually Rich Documents, exploring how
deep learning methods are applied in this field. For the purpose of comparing the …

Cited by 1 · Related articles

UNER: A Unified Prediction Head for Named Entity Recognition in Visually-rich Documents

Y Tu, C Zhang, Y Guo, H Chen, J Tang, H Zhu… - Proceedings of the …, 2024 - dl.acm.org

The recognition of named entities in visually-rich documents (VrD-NER) plays a critical role
in various real-world scenarios and applications. However, the research in VrD-NER faces …

Cited by 1 · Related articles · All 6 versions

End-to-End Compound Table Understanding with Multi-Modal Modeling

Z Li, Y Li, Q Liang, P Li, Z Cheng, Y Niu, S Pu… - Proceedings of the 30th …, 2022 - dl.acm.org

Tables are a widely used data form in webpages, spreadsheets, or PDFs to organize and
present structural data. Although studies on table structure recognition have been …

Cited by 5 · Related articles · All 3 versions