Web page classification: Features and algorithms
X Qi, BD Davison - ACM computing surveys (CSUR), 2009 - dl.acm.org
Classification of Web page content is essential to many tasks in Web information retrieval
such as maintaining Web directories and focused crawling. The uncontrolled nature of Web …
[BOOK][B] Web data mining: exploring hyperlinks, contents, and usage data
B Liu - 2011 - Springer
Liu has written a comprehensive text on Web mining, which consists of two parts. The first
part covers the data mining and machine learning foundations, where all the essential …
Opinion spam and analysis
N Jindal, B Liu - Proceedings of the 2008 international conference on …, 2008 - dl.acm.org
Evaluative texts on the Web have become a valuable source of opinions on products,
services, events, individuals, etc. Recently, many researchers have studied such opinion …
Data-Centric Systems and Applications
The rapid growth of the Web in the past two decades has made it the largest publicly
accessible data source in the world. Web mining aims to discover useful information or …
Detecting spam web pages through content analysis
In this paper, we continue our investigations of "web spam": the injection of
artificially-created pages into the web in order to influence the results from search engines, to drive …
Survey on web spam detection: principles and algorithms
Search engines became a de facto place to start information acquisition on the Web. Though
due to web spam phenomenon, search results are not always as good as desired …
Information retrieval system evaluation: effort, sensitivity, and reliability
M Sanderson, J Zobel - Proceedings of the 28th annual international …, 2005 - dl.acm.org
The effectiveness of information retrieval systems is measured by comparing performance
on a common set of queries and documents. Significance tests are often used to evaluate …
Spam, damn spam, and statistics: Using statistical analysis to locate spam web pages
The increasing importance of search engines to commercial web sites has given rise to a
phenomenon we call "web spam", that is, web pages that exist only to mislead search …
Visual complexity and aesthetic perception of web pages
The visual appearance of a Web page influences the way a user will interact with the page.
Web page structural elements (such as text, tables, links, and images) and their …
[PDF][PDF] Blocking Blog Spam with Language Model Disagreement.
We present an approach for detecting link spam common in blog comments by comparing
the language models used in the blog post, the comment, and pages linked by the …