Information extraction from scientific articles: a survey

Z Nasar, SW Jaffry, MK Malik - Scientometrics, 2018 - Springer
In last few decades, with the advent of World Wide Web (WWW), world is being overloaded
with huge data. This huge data carries potential information that once extracted, can be used …

A benchmark and evaluation for text extraction from PDF

H Bast, C Korzen - 2017 ACM/IEEE joint conference on digital …, 2017 - ieeexplore.ieee.org
Extracting the body text from a PDF document is an important but surprisingly difficult task.
The reason is that PDF is a layout-based format which specifies the fonts and positions of …
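A PDF records where each piece of text is drawn, not the order in which it should be read. An extractor therefore has to rebuild lines and reading order from positions. The sketch below shows a minimal version of that step. It assumes fragments have already been pulled out as hypothetical `(x, y, text)` records, with `y` growing downward and a single-column page. It is not the method or benchmark of Bast and Korzen.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    x: float    # left edge of the text fragment on the page
    y: float    # baseline position; assumed to grow downward
    text: str


def reconstruct_lines(fragments, y_tolerance=2.0):
    """Group positioned fragments into text lines (single-column layout assumed).

    Fragments whose baselines lie within y_tolerance of a line's first
    fragment are treated as the same line; each line is then read left to right.
    """
    lines = []  # list of (reference_y, [fragments in this line])
    for frag in sorted(fragments, key=lambda f: (f.y, f.x)):
        if lines and abs(frag.y - lines[-1][0]) <= y_tolerance:
            lines[-1][1].append(frag)
        else:
            lines.append((frag.y, [frag]))
    return [" ".join(f.text for f in sorted(group, key=lambda f: f.x))
            for _, group in lines]
```

Real documents add multi-column layouts, hyphenation, headers and footers, and math. Those are what make body-text extraction hard, and the fixed tolerance here does not handle any of them.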

Neural ParsCit: a deep learning-based reference string parser

A Prasad, M Kaur, MY Kan - International journal on digital libraries, 2018 - Springer
We present a deep learning approach for the core digital libraries task of parsing
bibliographic reference strings. We deploy the state-of-the-art long short-term memory …

Assessment of Information Extraction Techniques, Models and Systems.

A Rahman, D Musleh, M Nabil, H Alubaidan… - Mathematical …, 2022 - academia.edu
The present article aims to review and evaluate the practiced and classical techniques,
tools, models, and systems concerning automatic information extraction (IE) from published …

New methods for metadata extraction from scientific literature

D Tkaczyk - arXiv preprint arXiv:1710.10201, 2017 - arxiv.org
Within the past few decades we have witnessed digital revolution, which moved scholarly
communication to electronic media and also resulted in a substantial increase in its volume …

Comparing free reference extraction pipelines

T Backes, A Iurshina, MA Shahid, P Mayr - International Journal on Digital …, 2024 - Springer
In this paper, we compare the performance of several popular pre-trained reference
extraction and segmentation toolkits combined in different pipeline configurations on three …

ParsRec: A novel meta-learning approach to recommending bibliographic reference parsers

D Tkaczyk, R Gupta, R Cinti, J Beel - arXiv preprint arXiv:1811.10369, 2018 - arxiv.org
Bibliographic reference parsers extract machine-readable metadata such as author names,
title, journal, and year from bibliographic reference strings. To extract the metadata, the …
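The task described here maps one reference string to structured fields. The sketch below makes that input/output concrete. It assumes a single hypothetical APA-like style, "Authors (Year). Title. Journal.", and uses one regular expression. This is not ParsRec or any of the parsers it recommends. Learned parsers such as CRF- or LSTM-based ones exist precisely because real references vary far more than one pattern can cover.

```python
import re

# Hypothetical pattern for ONE APA-like style: "Authors (Year). Title. Journal."
APA_LIKE = re.compile(
    r"^(?P<authors>.+?)\s+\((?P<year>\d{4})\)\.\s+"
    r"(?P<title>.+?)\.\s+(?P<journal>[^.]+)\.?$"
)


def parse_reference(ref):
    """Return a dict of metadata fields, or None if the string does not fit the style."""
    match = APA_LIKE.match(ref.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Split "Prasad, A., Kaur, M., Kan, M.-Y." into one entry per "Surname, Initials".
    fields["authors"] = [a.strip() for a in
                         re.findall(r"[^,]+,\s*[A-Z][^,]*", fields["authors"])]
    return fields
```

For example, the Neural ParsCit entry in this list, written as "Prasad, A., Kaur, M., Kan, M.-Y. (2018). Neural ParsCit: a deep learning-based reference string parser. International Journal on Digital Libraries.", parses into three authors, the year 2018, the title, and the journal name.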

Examination of effective features for CRF-based bibliography extraction from reference strings

D Matsuoka, M Ohta, A Takasu… - … conference on digital …, 2016 - ieeexplore.ieee.org
Metadata such as bibliographic information about documents are indispensable in the
effective use of digital libraries. In particular, the reference fields of academic papers contain …
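CRF-based extraction labels each token of a reference string (author, title, year, and so on) using hand-designed features of the token and its neighbours. The sketch below turns a reference into the kind of per-token feature dictionaries such a model consumes. The feature set is a hypothetical example of common choices: capitalization, year-like digits, initials, and neighbouring words. It is not the set examined by Matsuoka et al.

```python
import re


def token_features(tokens, i):
    """Hypothetical per-token features of the kind fed to a linear-chain CRF."""
    tok = tokens[i]
    feats = {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        # Year-like token once surrounding brackets and punctuation are stripped.
        "is_year": bool(re.fullmatch(r"(19|20)\d{2}", tok.strip("().,"))),
        "ends_with_period": tok.endswith("."),
        "is_initial": bool(re.fullmatch(r"[A-Z]\.", tok)),
    }
    # Neighbouring tokens give the CRF local context; sentinels mark the boundaries.
    feats["prev_lower"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    feats["next_lower"] = tokens[i + 1].lower() if i + 1 < len(tokens) else "<EOS>"
    return feats


def sequence_features(reference):
    """Whitespace-tokenize a reference string and featurize every token."""
    tokens = reference.split()
    return [token_features(tokens, i) for i in range(len(tokens))]
```

A CRF toolkit would then be trained on these feature sequences paired with one field label per token. Comparing which features improve accuracy is the kind of examination the title refers to.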

Discovering Reliable Information Extraction Patterns with Pre-Trained Model for Text with Writing Style

C Bu, J Liu, J Liu, S Ji, H Yang - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
Large-scale pre-trained models such as GPT and BERT have demonstrated remarkable
performance in information extraction tasks. However, their black-box nature poses …

Citation data-set for machine learning citation styles and entity extraction from citation strings

NM Ryan - arXiv preprint arXiv:1805.04798, 2018 - arxiv.org
Citation parsing is fundamental for search engines within academia and the protection of
intellectual property. Meticulous extraction is further needed when evaluating the similarity of …