Web scraping technologies in an API world

Authors

Daniel Glez-Peña, Anália Lourenço, Hugo López-Fernández, Miguel Reboiro-Jato, Florentino Fdez-Riverola

Publication date

2014/9/1

Source

Briefings in bioinformatics

Volume

Issue

Pages

788-797

Publisher

Oxford University Press

Description

Web services are the de facto standard in biomedical data integration. However, there are data integration scenarios that cannot be fully covered by Web services. A number of Web databases and tools do not support Web services, and existing Web services do not cover all possible user data demands. As a consequence, Web data scraping, one of the oldest techniques for extracting Web contents, is still in a position to offer a valid and valuable service to a wide range of bioinformatics applications, ranging from simple extraction robots to online meta-servers. This article reviews existing scraping frameworks and tools, identifying their strengths and limitations in terms of extraction capabilities. The main focus is set on showing how straightforward it is today to set up a data scraping pipeline, with minimal programming effort, and answer a number of practical needs. For exemplification …

Total citations

Cited by 285

Citations per year: 2014: 1; 2015: 7; 2016: 7; 2017: 15; 2018: 21; 2019: 28; 2020: 36; 2021: 47; 2022: 35; 2023: 57; 2024: 24
