Conversion of PDF documents into HTML: a case study of document image analysis

M Aggarwal, H Gupta, M Sarkar… - arXiv preprint arXiv …, 2021 - arxiv.org

Document structure extraction has been a widely researched area for decades with recent
works performing it as a semantic segmentation task over document images using fully …

Cited by 24 · Related articles · All 4 versions

Mathematical formula identification in PDF documents

X Lin, L Gao, Z Tang, X Lin, X Hu - … international conference on …, 2011 - ieeexplore.ieee.org

Recognizing mathematical expressions in PDF documents is a new and important field in
document analysis. It is quite different from extracting mathematical expressions in image …

Cited by 63 · Related articles · All 5 versions

Mathematical formula identification and performance evaluation in PDF documents

X Lin, L Gao, Z Tang, J Baker, V Sorge - International Journal on …, 2014 - Springer

An important initial step of mathematical formula recognition is to correctly identify the
location of formulae within documents. Previous work in this area has traditionally focused …

Cited by 43 · Related articles · All 6 versions

Document structure extraction using prior based high resolution hierarchical semantic segmentation

M Sarkar, M Aggarwal, A Jain, H Gupta… - … on Computer Vision, 2020 - Springer

Structure extraction from document images has been a long-standing research
topic due to its high impact on a wide range of practical applications. In this paper, we share …

Cited by 20 · Related articles · All 7 versions

Multi-modal association based grouping for form structure extraction

M Aggarwal, M Sarkar, H Gupta… - Proceedings of the …, 2020 - openaccess.thecvf.com

Document structure extraction has been a widely researched area for decades. Recent work
in this direction has been deep learning-based, mostly focusing on extracting structure using …

Cited by 16 · Related articles · All 5 versions

A supervised learning approach for heading detection

SS Budhiraja, V Mago - Expert systems, 2020 - Wiley Online Library

As the popularity of the portable document format (PDF) file format increases, research that
facilitates PDF text analysis or extraction is necessary. Heading detection is a crucial …

Cited by 20 · Related articles · All 5 versions

Extraction of math expressions from PDF documents based on unsupervised modeling of fonts

Z Wang, D Beyette, J Lin, JC Liu - … International Conference on …, 2019 - ieeexplore.ieee.org

This paper proposes a multi-stage architecture to extract math expressions (ME) from PDF
documents based on font analysis. The unsupervised algorithm starts from symbol level …

Cited by 13 · Related articles · All 2 versions

XCDF: a canonical and structured document format

JL Bloechle, M Rigamonti, K Hadjar, D Lalanne… - … Analysis Systems VII …, 2006 - Springer

Accessing the structured content of PDF documents is a difficult task, requiring pre-
processing and reverse engineering techniques. In this paper, we first present different …

Cited by 45 · Related articles · All 15 versions

Transformation of PDF textbooks into intelligent educational resources

I Alpizar-Chacon, M van der Hart, ZS Wiersma… - iTextbooks 2020, 2020 - ceur-ws.org

The paper presents Intextbooks, the system for automated conversion of PDF-based
textbooks into intelligent educational Web resources. The paper focuses on the new …

Cited by 9 · Related articles · All 4 versions

Bigram label regularization to reduce over-segmentation on inline math expression detection

X Wang, Z Wang, JC Liu - 2019 International Conference on …, 2019 - ieeexplore.ieee.org

Inline Mathematical Expression refers to Math Expression (ME) that is blended into plaintext
sentences in scientific papers. Detecting inline MEs is a non-trivial problem due to the …

Cited by 11 · Related articles · All 2 versions