Towards robust tampered text detection in document image: New dataset and new solution
C Qu, C Liu, Y Liu, X Chen, D Peng… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recently, tampered text detection in document images has attracted increasing attention
due to its essential role in information security. However, detecting visually consistent …
Visual information extraction in the wild: practical dataset and end-to-end solution
Visual information extraction (VIE), which aims to simultaneously perform OCR and
information extraction in a unified framework, has drawn increasing attention due to its …
DocParser: End-to-end OCR-free information extraction from visually rich documents
Information Extraction from visually rich documents is a challenging task that has
gained a lot of attention in recent years due to its importance in several document-control …
InstructDoc: A dataset for zero-shot generalization of visual document understanding with instructions
We study the problem of completing various visual document understanding (VDU) tasks,
e.g., question answering and information extraction, on real-world documents through human …
TRIE++: towards end-to-end information extraction from visually rich documents
Recently, automatically extracting information from visually rich documents (e.g., tickets and
resumes) has become a hot and vital research topic due to its widespread commercial …
BlueLM-V-3B: Algorithm and system co-design for multimodal large language models on mobile devices
X Lu, Y Chen, C Chen, H Tan, B Chen, Y Xie… - arXiv preprint arXiv …, 2024 - arxiv.org
The emergence and growing popularity of multimodal large language models (MLLMs) have
significant potential to enhance various aspects of daily life, from improving communication …
Business document information extraction: Towards practical benchmarks
Information extraction from semi-structured documents is crucial for frictionless
business-to-business (B2B) communication. While machine learning problems related to …
Deep learning approaches for information extraction from visually rich documents: datasets, challenges and methods
This paper focuses on Information Extraction from Visually Rich Documents, exploring how
deep learning methods are applied in this field. For the purpose of comparing the …
UNER: A Unified Prediction Head for Named Entity Recognition in Visually-rich Documents
The recognition of named entities in visually-rich documents (VrD-NER) plays a critical role
in various real-world scenarios and applications. However, the research in VrD-NER faces …
End-to-End Compound Table Understanding with Multi-Modal Modeling
Tables are a widely used data form in webpages, spreadsheets, or PDFs to organize and
present structural data. Although studies on table structure recognition have been …