Opportunities and challenges of text mining in materials research

O Kononova, T He, H Huo, A Trewartha, EA Olivetti… - Iscience, 2021 - cell.com
Research publications are the major repository of scientific knowledge. However, their
unstructured and highly heterogeneous format creates a significant obstacle to large-scale …

How drugs get into cells: tested and testable predictions to help discriminate between transporter-mediated uptake and lipoidal bilayer diffusion

DB Kell, SG Oliver - Frontiers in pharmacology, 2014 - frontiersin.org
One approach to experimental science involves creating hypotheses, then testing them by
varying one or more independent variables, and assessing the effects of this variation on the …

CERMINE: automatic extraction of structured metadata from scientific literature

D Tkaczyk, P Szostek, M Fedoryszak… - International Journal on …, 2015 - Springer
CERMINE is a comprehensive open-source system for extracting structured metadata from
scientific articles in a born-digital form. The system is based on a modular workflow, whose …

A comprehensive and quantitative comparison of text-mining in 15 million full-text articles versus their corresponding abstracts

D Westergaard, HH Stærfeldt, C Tønsberg… - PLoS computational …, 2018 - journals.plos.org
Across academia and industry, text mining has become a popular strategy for keeping up
with the rapid growth of the scientific literature. Text mining of the scientific literature has …

Citation recommendation: approaches and datasets

M Färber, A Jatowt - International Journal on Digital Libraries, 2020 - Springer
Citation recommendation describes the task of recommending citations for a given text. Due
to the overload of published scientific works in recent years on the one hand, and the need …

Information extraction from scientific articles: a survey

Z Nasar, SW Jaffry, MK Malik - Scientometrics, 2018 - Springer
In the last few decades, with the advent of the World Wide Web (WWW), the world is being overloaded
with huge amounts of data. This data carries potential information that, once extracted, can be used …

Sections-based bibliographic coupling for research paper recommendation

R Habib, MT Afzal - Scientometrics, 2019 - Springer
Digital libraries suffer from the problem of information overload due to immense proliferation
of research papers in journals and conference papers. This makes it challenging for …

PDFDataExtractor: A tool for reading scientific text and interpreting metadata from the typeset literature in the portable document format

M Zhu, JM Cole - Journal of Chemical Information and Modeling, 2022 - ACS Publications
The layout of portable document format (PDF) files is constant to any screen, and the
metadata therein are latent, compared to mark-up languages such as HTML and XML. No …

The document components ontology (DoCO)

A Constantin, S Peroni, S Pettifer, D Shotton… - Semantic …, 2016 - content.iospress.com
The availability in machine-readable form of descriptions of the structure of documents, as
well as of the document discourse (e.g., the scientific discourse within scholarly articles), is …

A benchmark and evaluation for text extraction from PDF

H Bast, C Korzen - 2017 ACM/IEEE joint conference on digital …, 2017 - ieeexplore.ieee.org
Extracting the body text from a PDF document is an important but surprisingly difficult task.
The reason is that PDF is a layout-based format which specifies the fonts and positions of …