Form2Seq: A framework for higher-order form structure extraction
Document structure extraction has been a widely researched area for decades with recent
works performing it as a semantic segmentation task over document images using fully …
Mathematical formula identification in PDF documents
Recognizing mathematical expressions in PDF documents is a new and important field in
document analysis. It is quite different from extracting mathematical expressions in image …
Mathematical formula identification and performance evaluation in PDF documents
An important initial step of mathematical formula recognition is to correctly identify the
location of formulae within documents. Previous work in this area has traditionally focused …
Document structure extraction using prior based high resolution hierarchical semantic segmentation
Structure extraction from document images has been a long-standing research
topic due to its high impact on a wide range of practical applications. In this paper, we share …
Multi-modal association based grouping for form structure extraction
Document structure extraction has been a widely researched area for decades. Recent work
in this direction has been deep learning-based, mostly focusing on extracting structure using …
A supervised learning approach for heading detection
SS Budhiraja, V Mago - Expert systems, 2020 - Wiley Online Library
As the popularity of the portable document format (PDF) file format increases, research that
facilitates PDF text analysis or extraction is necessary. Heading detection is a crucial …
Extraction of math expressions from PDF documents based on unsupervised modeling of fonts
This paper proposes a multi-stage architecture to extract math expressions (ME) from PDF
documents based on font analysis. The unsupervised algorithm starts from symbol level …
XCDF: a canonical and structured document format
Accessing the structured content of a PDF document is a difficult task, requiring pre-
processing and reverse engineering techniques. In this paper, we first present different …
Transformation of PDF textbooks into intelligent educational resources
I Alpizar-Chacon, M van der Hart, ZS Wiersma… - iTextbooks 2020, 2020 - ceur-ws.org
The paper presents Intextbooks, the system for automated conversion of PDF-based
textbooks into intelligent educational Web resources. The paper focuses on the new …
Bigram label regularization to reduce over-segmentation on inline math expression detection
Inline Mathematical Expression refers to Math Expression (ME) that is blended into plaintext
sentences in scientific papers. Detecting inline MEs is a non-trivial problem due to the …