Web page classification: Features and algorithms
X Qi, BD Davison - ACM computing surveys (CSUR), 2009 - dl.acm.org
Classification of Web page content is essential to many tasks in Web information retrieval
such as maintaining Web directories and focused crawling. The uncontrolled nature of Web …
[BOOK] The text mining handbook: advanced approaches in analyzing unstructured data
Text mining is a new and exciting area of computer science research that tries to solve the
crisis of information overload by combining techniques from data mining, machine learning …
Link prediction in relational data
Many real-world domains are relational in nature, consisting of a set of objects related to
each other in complex ways. This paper focuses on predicting the existence and the type of …
A study of thresholding strategies for text categorization
Y Yang - Proceedings of the 24th annual international ACM …, 2001 - dl.acm.org
Thresholding strategies in automated text categorization are an underexplored area of
research. This paper presents an examination of the effect of thresholding strategies on the …
[PDF] Distributional word clusters vs. words for text categorization
We study an approach to text categorization that combines distributional clustering of words
and a Support Vector Machine (SVM) classifier. This word-cluster representation is …
A study of approaches to hypertext categorization
Hypertext poses new research challenges for text classification. Hyperlinks, HTML tags,
category labels distributed over linked documents, and meta data extracted from related …
[PDF] Learning probabilistic models of link structure
Most real-world data is heterogeneous and richly interconnected. Examples include the
Web, hypertext, bibliometric data and social networks. In contrast, most statistical learning …
Assam: A tool for semi-automatically annotating semantic web services
A Heß, E Johnston, N Kushmerick - The Semantic Web–ISWC 2004: Third …, 2004 - Springer
The semantic Web Services vision requires that each service be annotated with
semantic metadata. Manually creating such metadata is tedious and error-prone, and many …
Discovering missing links in Wikipedia
SF Adafre, M de Rijke - Proceedings of the 3rd international workshop …, 2005 - dl.acm.org
In this paper we address the problem of discovering missing hypertext links in Wikipedia.
The method we propose consists of two steps: first, we compute a cluster of highly similar …
Automated subject classification of textual web documents
K Golub - Journal of documentation, 2006 - emerald.com
Purpose–To provide an integrated perspective to similarities and differences between
approaches to automated classification in different research communities (machine learning …