Document parsing unveiled: Techniques, challenges, and prospects for structured information extraction

Q Zhang, VSJ Huang, B Wang, J Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Document parsing is essential for converting unstructured and semi-structured documents,
such as contracts, academic papers, and invoices, into structured, machine-readable data …

From Detection to Application: Recent Advances in Understanding Scientific Tables and Figures

J Huang, H Chen, F Yu, W Lu - ACM Computing Surveys, 2024 - dl.acm.org
Tables and figures are usually used to present information in a structured and visual way in
scientific documents. Understanding the tables and figures in scientific documents is …

Lineformer: Line chart data extraction using instance segmentation

J Lal, A Mitkari, M Bhosale, D Doermann - International Conference on …, 2023 - Springer
Data extraction from line-chart images is an essential component of the automated
document understanding process, as line charts are a ubiquitous data visualization format …

Swin-chart: An efficient approach for chart classification

A Dhote, M Javed, DS Doermann - Pattern Recognition Letters, 2024 - Elsevier
Charts are a visualization tool used in scientific documents to facilitate easy comprehension
of complex relationships underlying data and experiments. Researchers use various chart …

SciOL and MuLMS-Img: Introducing A Large-Scale Multimodal Scientific Dataset and Models for Image-Text Tasks in the Scientific Domain

T Tarsi, H Adel, JH Metzen, D Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
In scientific publications, a substantial part of the information is expressed via figures
containing images and diagrams. Hence, the retrieval of relevant figures given a natural …

Hierarchical Recognizing Vector Graphics and A New Chart-based Vector Graphics Dataset

S Dou, X Jiang, L Liu, L Ying, C Shan… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
The conventional approach to image recognition has been based on raster graphics, which
can suffer from aliasing and information loss when scaled up or down. In this paper, we …

C3E: A framework for chart classification and content extraction

MS Kanroo, HS Kawoosa, K Rana, P Goyal - Computers and Electrical …, 2025 - Elsevier
Incorporating charts into technical documents enhances richness by simplifying complex
data representation and improving comprehension. However, automated chart content …

A survey and approach to chart classification

A Dhote, M Javed, DS Doermann - International Conference on Document …, 2023 - Springer
Charts represent an essential source of visual information in documents and facilitate a
deep understanding and interpretation of information typically conveyed numerically. In the …

Text Role Classification in Scientific Charts Using Multimodal Transformers

HJ Kim, N Lell, A Scherp - … on Applications of Natural Language to …, 2024 - Springer
Text role classification involves classifying the semantic role of textual elements within
scientific charts. We propose to finetune the multimodal document layout analysis models …

SpaDen: Sparse and Dense Keypoint Estimation for Real-World Chart Understanding

S Ahmed, P Yan, D Doermann, S Setlur… - … on Document Analysis …, 2023 - Springer
We introduce a novel bottom-up approach for the extraction of chart data. Our model utilizes
images of charts as inputs and learns to detect keypoints (KP), which are used to reconstruct …