

Using the structure of web sites for automatic segmentation of tables

Authors

Kristina Lerman, Lise Getoor, Steven Minton, Craig Knoblock

Publication date

2004/6/13

Book

Proceedings of the 2004 ACM SIGMOD international conference on Management of data

Pages

119-130

Description

Many Web sites, especially those that dynamically generate HTML pages to display the results of a user's query, present information in the form of lists or tables. Current tools that allow applications to programmatically extract this information rely heavily on user input, often in the form of labeled extracted records. The sheer size and rate of growth of the Web make any solution that relies primarily on user input infeasible in the long term. Fortunately, many Web sites contain much explicit and implicit structure, both in layout and content, that we can exploit for the purpose of information extraction. This paper describes an approach to automatic extraction and segmentation of records from Web tables. Automatic methods do not require any user input, but rely solely on the layout and content of the Web source. Our approach relies on the common structure of many Web sites, which present information as a list or a table …
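The abstract's central observation, that list and table pages repeat the same layout for every record, can be illustrated with a deliberately simple sketch. This is not the authors' algorithm, which this record does not describe. It assumes that every text item can be labeled with the path of HTML tags enclosing it, and that a new record begins whenever the tag path of the page's first text item reappears. The names `TagPathCollector`, `collect_tokens`, and `segment_records` are hypothetical; only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser

# Elements with no closing tag; pushing them would corrupt the path stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "wbr"}


class TagPathCollector(HTMLParser):
    """Records every non-empty text node with the path of tags enclosing it."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.tokens = []  # (tag_path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray close tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.tokens.append(("/".join(self.stack), text))


def collect_tokens(html):
    """Hypothetical helper: parse HTML into (tag_path, text) tokens."""
    parser = TagPathCollector()
    parser.feed(html)
    parser.close()
    return parser.tokens


def segment_records(tokens):
    """Assumed heuristic: a record starts whenever the first token's path recurs."""
    if not tokens:
        return []
    start_path = tokens[0][0]
    records, current = [], []
    for path, text in tokens:
        if path == start_path and current:
            records.append(current)
            current = []
        current.append(text)
    records.append(current)
    return records


# Records here have no per-row wrapper element; only the layout repeats.
page = ("<div><b>Joe's Pizza</b><br>12 Main St<br>"
        "<b>Thai Palace</b><br>40 Oak Ave<br></div>")
print(segment_records(collect_tokens(page)))
# [["Joe's Pizza", '12 Main St'], ['Thai Palace', '40 Oak Ave']]
```

Because this heuristic looks only at layout, it breaks when a record's first field is optional or shares its tag path with another field. The abstract notes that content as well as layout can be exploited, which suggests one way such ambiguities might be resolved.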

Total citations

Cited by 216

Year  Citations
2004  4
2005  17
2006  26
2007  22
2008  18
2009  24
2010  18
2011  21
2012  15
2013  9
2014  8
2015  4
2016  8
2017  3
2018  3
2019  2
2020  5
2021  4
2022  3
2023  2
