PDFX: fully-automated PDF-to-XML conversion of scientific literature

O Kononova, T He, H Huo, A Trewartha, EA Olivetti… - iScience, 2021 - cell.com

Research publications are the major repository of scientific knowledge. However, their
unstructured and highly heterogeneous format creates a significant obstacle to large-scale …

Cited by 109 · Related articles · All 13 versions

How drugs get into cells: tested and testable predictions to help discriminate between transporter-mediated uptake and lipoidal bilayer diffusion

DB Kell, SG Oliver - Frontiers in pharmacology, 2014 - frontiersin.org

One approach to experimental science involves creating hypotheses, then testing them by
varying one or more independent variables, and assessing the effects of this variation on the …

Cited by 169 · Related articles · All 11 versions

CERMINE: automatic extraction of structured metadata from scientific literature

D Tkaczyk, P Szostek, M Fedoryszak… - International Journal on …, 2015 - Springer

CERMINE is a comprehensive open-source system for extracting structured metadata from
scientific articles in a born-digital form. The system is based on a modular workflow, whose …

Cited by 275 · Related articles · All 8 versions

A comprehensive and quantitative comparison of text-mining in 15 million full-text articles versus their corresponding abstracts

D Westergaard, HH Stærfeldt, C Tønsberg… - PLoS computational …, 2018 - journals.plos.org

Across academia and industry, text mining has become a popular strategy for keeping up
with the rapid growth of the scientific literature. Text mining of the scientific literature has …

Cited by 187 · Related articles · All 16 versions

Citation recommendation: approaches and datasets

M Färber, A Jatowt - International Journal on Digital Libraries, 2020 - Springer

Citation recommendation describes the task of recommending citations for a given text. Due
to the overload of published scientific works in recent years on the one hand, and the need …

Cited by 110 · Related articles · All 14 versions

Information extraction from scientific articles: a survey

Z Nasar, SW Jaffry, MK Malik - Scientometrics, 2018 - Springer

In the last few decades, with the advent of the World Wide Web (WWW), the world is being overloaded
with huge data. This huge data carries potential information that, once extracted, can be used …

Cited by 126 · Related articles · All 6 versions

Sections-based bibliographic coupling for research paper recommendation

R Habib, MT Afzal - Scientometrics, 2019 - Springer

Digital libraries suffer from the problem of information overload due to immense proliferation
of research papers in journals and conference papers. This makes it challenging for …

Cited by 84 · Related articles · All 7 versions

PDFDataExtractor: A tool for reading scientific text and interpreting metadata from the typeset literature in the portable document format

M Zhu, JM Cole - Journal of Chemical Information and Modeling, 2022 - ACS Publications

The layout of portable document format (PDF) files is constant to any screen, and the
metadata therein are latent, compared to mark-up languages such as HTML and XML. No …

Cited by 26 · Related articles · All 6 versions

The document components ontology (DoCO)

A Constantin, S Peroni, S Pettifer, D Shotton… - Semantic …, 2016 - content.iospress.com

The availability in machine-readable form of descriptions of the structure of documents, as
well as of the document discourse (e.g., the scientific discourse within scholarly articles), is …

Cited by 110 · Related articles · All 21 versions

A benchmark and evaluation for text extraction from PDF

H Bast, C Korzen - 2017 ACM/IEEE joint conference on digital …, 2017 - ieeexplore.ieee.org

Extracting the body text from a PDF document is an important but surprisingly difficult task.
The reason is that PDF is a layout-based format which specifies the fonts and positions of …

Cited by 82 · Related articles · All 7 versions