Form2Seq: A framework for higher-order form structure extraction

M Aggarwal, H Gupta, M Sarkar… - arXiv preprint arXiv …, 2021 - arxiv.org
Document structure extraction has been a widely researched area for decades with recent
works performing it as a semantic segmentation task over document images using fully …

Mathematical formula identification in PDF documents

X Lin, L Gao, Z Tang, X Lin, X Hu - … international conference on …, 2011 - ieeexplore.ieee.org
Recognizing mathematical expressions in PDF documents is a new and important field in
document analysis. It is quite different from extracting mathematical expressions in image …

Mathematical formula identification and performance evaluation in PDF documents

X Lin, L Gao, Z Tang, J Baker, V Sorge - International Journal on …, 2014 - Springer
An important initial step of mathematical formula recognition is to correctly identify the
location of formulae within documents. Previous work in this area has traditionally focused …

Document structure extraction using prior based high resolution hierarchical semantic segmentation

M Sarkar, M Aggarwal, A Jain, H Gupta… - … on Computer Vision, 2020 - Springer
Structure extraction from document images has been a long-standing research
topic due to its high impact on a wide range of practical applications. In this paper, we share …

Multi-modal association based grouping for form structure extraction

M Aggarwal, M Sarkar, H Gupta… - Proceedings of the …, 2020 - openaccess.thecvf.com
Document structure extraction has been a widely researched area for decades. Recent work
in this direction has been deep learning-based, mostly focusing on extracting structure using …

A supervised learning approach for heading detection

SS Budhiraja, V Mago - Expert systems, 2020 - Wiley Online Library
As the popularity of the portable document format (PDF) increases, research that
facilitates PDF text analysis or extraction is necessary. Heading detection is a crucial …

Extraction of math expressions from PDF documents based on unsupervised modeling of fonts

Z Wang, D Beyette, J Lin, JC Liu - … International Conference on …, 2019 - ieeexplore.ieee.org
This paper proposes a multi-stage architecture to extract math expressions (ME) from PDF
documents based on font analysis. The unsupervised algorithm starts from symbol level …

XCDF: a canonical and structured document format

JL Bloechle, M Rigamonti, K Hadjar, D Lalanne… - … Analysis Systems VII …, 2006 - Springer
Accessing the structured content of PDF documents is a difficult task, requiring
pre-processing and reverse engineering techniques. In this paper, we first present different …

Transformation of PDF textbooks into intelligent educational resources

I Alpizar-Chacon, M van der Hart, ZS Wiersma… - iTextbooks 2020, 2020 - ceur-ws.org
The paper presents Intextbooks, the system for automated conversion of PDF-based
textbooks into intelligent educational Web resources. The paper focuses on the new …

Bigram label regularization to reduce over-segmentation on inline math expression detection

X Wang, Z Wang, JC Liu - 2019 International Conference on …, 2019 - ieeexplore.ieee.org
Inline Mathematical Expression refers to Math Expression (ME) that is blended into plaintext
sentences in scientific papers. Detecting inline MEs is a non-trivial problem due to the …