M6doc: A large-scale multi-format, multi-type, multi-layout, multi-language, multi-annotation category dataset for modern document layout analysis
Document layout analysis is a crucial prerequisite for document understanding, including
document retrieval and conversion. Most public datasets currently contain only PDF …
Foreground and text-lines aware document image rectification
This paper aims at the distorted document image rectification problem, the objective to
eliminate the geometric distortion in the document images and realize document …
Deep unrestricted document image rectification
In recent years, tremendous efforts have been made on document image rectification, but
existing advanced algorithms are limited to processing restricted document images, i.e., the …
Layout-aware single-image document flattening
Single image rectification of document deformation is a challenging task. Although some
recent deep learning-based methods have attempted to solve this problem, they cannot …
DocScanner: Robust document image rectification with progressive learning
Compared with flatbed scanners, portable smartphones provide more convenience for
physical document digitization. However, such digitized documents are often distorted due …
DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks
Document image restoration is a crucial aspect of Document AI systems as the quality of
document images significantly influences the overall performance. Prevailing methods …
Template-guided illumination correction for document images with imperfect geometric reconstruction
F Hertlein, A Naumann - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
To facilitate the transition into the digital era, it is necessary to digitize printed documents
such as forms and invoices. Due to the presence of diverse lighting conditions and …
Matadoc: margin and text aware document dewarping for arbitrary boundary
Document dewarping from a distorted camera-captured image is of great value for OCR and
document understanding. The document boundary plays an important role which is more …
Appearance enhancement for camera-captured document images in the wild
Camera-captured document images usually suffer from various appearance degradations,
which hamper the clarity of content and preclude subsequent analysis and recognition …
Inv3D: a high-resolution 3D invoice dataset for template-guided single-image document unwarping
F Hertlein, A Naumann, P Philipp - International Journal on Document …, 2023 - Springer
Numerous business workflows involve printed forms, such as invoices or receipts, which are
often manually digitalized to persistently search or store the data. As hardware scanners are …