Scholarly document information extraction using extensible features for efficient higher...

Z Nasar, SW Jaffry, MK Malik - Scientometrics, 2018 - Springer

In last few decades, with the advent of World Wide Web (WWW), world is being overloaded
with huge data. This huge data carries potential information that once extracted, can be used …

Cited by 136 Related articles All 6 versions

A benchmark and evaluation for text extraction from PDF

H Bast, C Korzen - 2017 ACM/IEEE joint conference on digital …, 2017 - ieeexplore.ieee.org

Extracting the body text from a PDF document is an important but surprisingly difficult task.
The reason is that PDF is a layout-based format which specifies the fonts and positions of …

Cited by 90 Related articles All 7 versions

Neural ParsCit: a deep learning-based reference string parser

A Prasad, M Kaur, MY Kan - International journal on digital libraries, 2018 - Springer

We present a deep learning approach for the core digital libraries task of parsing
bibliographic reference strings. We deploy the state-of-the-art long short-term memory …

Cited by 76 Related articles All 10 versions

Assessment of Information Extraction Techniques, Models and Systems.

A Rahman, D Musleh, M Nabil, H Alubaidan… - Mathematical …, 2022 - academia.edu

The present article aims to review and evaluate the practiced and classical techniques,
tools, models, and systems concerning automatic information extraction (IE) from published …

Cited by 11 Related articles All 2 versions

New methods for metadata extraction from scientific literature

D Tkaczyk - arXiv preprint arXiv:1710.10201, 2017 - arxiv.org

Within the past few decades we have witnessed digital revolution, which moved scholarly
communication to electronic media and also resulted in a substantial increase in its volume …

Cited by 20 Related articles All 3 versions

Comparing free reference extraction pipelines

T Backes, A Iurshina, MA Shahid, P Mayr - International Journal on Digital …, 2024 - Springer

In this paper, we compare the performance of several popular pre-trained reference
extraction and segmentation toolkits combined in different pipeline configurations on three …

Cited by 1 Related articles

Parsrec: A novel meta-learning approach to recommending bibliographic reference parsers

D Tkaczyk, R Gupta, R Cinti, J Beel - arXiv preprint arXiv:1811.10369, 2018 - arxiv.org

Bibliographic reference parsers extract machine-readable metadata such as author names,
title, journal, and year from bibliographic reference strings. To extract the metadata, the …

Cited by 10 Related articles All 6 versions

Examination of effective features for CRF-based bibliography extraction from reference strings

D Matsuoka, M Ohta, A Takasu… - … conference on digital …, 2016 - ieeexplore.ieee.org

Metadata such as bibliographic information about documents are indispensable in the
effective use of digital libraries. In particular, the reference fields of academic papers contain …

Cited by 10 Related articles All 4 versions

Discovering Reliable Information Extraction Patterns with Pre-Trained Model for Text with Writing Style

C Bu, J Liu, J Liu, S Ji, H Yang - 2023 IEEE International …, 2023 - ieeexplore.ieee.org

Large-scale pre-trained models such as GPT and BERT have demonstrated remarkable
performance in information extraction tasks. However, their black-box nature poses …

Citation data-set for machine learning citation styles and entity extraction from citation strings

NM Ryan - arXiv preprint arXiv:1805.04798, 2018 - arxiv.org

Citation parsing is fundamental for search engines within academia and the protection of
intellectual property. Meticulous extraction is further needed when evaluating the similarity of …

Cited by 6 Related articles All 2 versions