Deeper inside PageRank

AN Langville, CD Meyer - Internet Mathematics, 2004 - Taylor & Francis
This paper serves as a companion or extension to the "Inside PageRank" paper by Bianchini
et al. [Bianchini et al. 03]. It is a comprehensive survey of all issues associated with …

Eliminating noisy information in web pages for data mining

L Yi, B Liu, X Li - Proceedings of the ninth ACM SIGKDD international …, 2003 - dl.acm.org
A commercial Web page typically contains many information blocks. Apart from the main
content blocks, it usually has such blocks as navigation panels, copyright and privacy …

The volume and evolution of web page templates

D Gibson, K Punera, A Tomkins - Special interest tracks and posters of …, 2005 - dl.acm.org
Web pages contain a combination of unique content and template material, which is present
across multiple pages and used primarily for formatting, navigation, and branding. We study …

Block-level link analysis

D Cai, X He, JR Wen, WY Ma - … of the 27th annual international ACM …, 2004 - dl.acm.org
Link Analysis has shown great potential in improving the performance of web search.
PageRank and HITS are two of the most popular algorithms. Most of the existing link …

Document clustering of scientific texts using citation contexts

B Aljaber, N Stokes, J Bailey, J Pei - Information Retrieval, 2010 - Springer
Document clustering has many important applications in the area of data mining and
information retrieval. Many existing document clustering techniques use the “bag-of-words” …

Page-level template detection via isotonic smoothing

D Chakrabarti, R Kumar, K Punera - Proceedings of the 16th …, 2007 - dl.acm.org
We develop a novel framework for the page-level template detection problem. Our
framework is built on two main ideas. The first is the automatic generation of training data for …

Web page cleaning for web mining through feature weighting

L Yi, B Liu - IJCAI, 2003 - Citeseer
Unlike conventional data or text, Web pages typically contain a large amount of information
that is not part of the main contents of the pages, e.g., banner ads, navigation bars, and …

Mining web informative structures and contents based on entropy analysis

HY Kao, SH Lin, JM Ho… - IEEE Transactions on …, 2004 - ieeexplore.ieee.org
We study the problem of mining the informative structure of a news Web site that consists of
thousands of hyperlinked documents. We define the informative structure of a news Web site …

WISDOM: Web intrapage informative structure mining based on document object model

HY Kao, JM Ho, MS Chen - IEEE Transactions on Knowledge …, 2005 - ieeexplore.ieee.org
To increase the commercial value and accessibility of pages, most content sites tend to
publish their pages with intrasite redundant information, such as navigation panels …

Unsupervised named entity recognition and disambiguation: An application to old French journals

Y Mosallam, A Abi-Haidar, JG Ganascia - Industrial Conference on Data …, 2014 - Springer
In this paper we introduce our method of Unsupervised Named Entity Recognition and
Disambiguation (UNERD) that we test on a recently digitized unlabeled corpus of French …