Hypertext categorization using hyperlink patterns and meta data

X Qi, BD Davison - ACM computing surveys (CSUR), 2009 - dl.acm.org

Classification of Web page content is essential to many tasks in Web information retrieval
such as maintaining Web directories and focused crawling. The uncontrolled nature of Web …

Cited by 760 - Related articles - All 8 versions

[BOOK] The text mining handbook: advanced approaches in analyzing unstructured data

R Feldman, J Sanger - 2007 - books.google.com

Text mining is a new and exciting area of computer science research that tries to solve the
crisis of information overload by combining techniques from data mining, machine learning …

Cited by 4561 - Related articles - All 5 versions

[PDF] neurips.cc

Link prediction in relational data

B Taskar, MF Wong, P Abbeel… - Advances in neural …, 2003 - proceedings.neurips.cc

Many real-world domains are relational in nature, consisting of a set of objects related to
each other in complex ways. This paper focuses on predicting the existence and the type of …

Cited by 667 - Related articles - All 26 versions

[PDF] psu.edu

A study of thresholding strategies for text categorization

Y Yang - Proceedings of the 24th annual international ACM …, 2001 - dl.acm.org

Thresholding strategies in automated text categorization are an underexplored area of
research. This paper presents an examination of the effect of thresholding strategies on the …

Cited by 501 - Related articles - All 13 versions

[PDF] jmlr.org

[PDF] Distributional word clusters vs. words for text categorization

R Bekkerman, R El-Yaniv, N Tishby, Y Winter - Journal of Machine …, 2003 - jmlr.org

We study an approach to text categorization that combines distributional clustering of words
and a Support Vector Machine (SVM) classifier. This word-cluster representation is …

Cited by 467 - Related articles - All 31 versions

[PDF] academia.edu

A study of approaches to hypertext categorization

Y Yang, S Slattery, R Ghani - Journal of Intelligent Information Systems, 2002 - Springer

Hypertext poses new research challenges for text classification. Hyperlinks, HTML tags,
category labels distributed over linked documents, and meta data extracted from related …

Cited by 470 - Related articles - All 10 versions

[PDF] jmlr.org

[PDF] Learning probabilistic models of link structure

L Getoor, N Friedman, D Koller, B Taskar - Journal of Machine Learning …, 2002 - jmlr.org

Most real-world data is heterogeneous and richly interconnected. Examples include the
Web, hypertext, bibliometric data and social networks. In contrast, most statistical learning …

Cited by 380 - Related articles - All 32 versions

[PDF] semanticscholar.org

Assam: A tool for semi-automatically annotating semantic web services

A Heß, E Johnston, N Kushmerick - The Semantic Web–ISWC 2004: Third …, 2004 - Springer

The semantic Web Services vision requires that each service be annotated with
semantic metadata. Manually creating such metadata is tedious and error-prone, and many …

Cited by 279 - Related articles - All 21 versions

[PDF] uva.nl

Discovering missing links in Wikipedia

SF Adafre, M de Rijke - Proceedings of the 3rd international workshop …, 2005 - dl.acm.org

In this paper we address the problem of discovering missing hypertext links in Wikipedia.
The method we propose consists of two steps: first, we compute a cluster of highly similar …

Cited by 247 - Related articles - All 13 versions

[PDF] diva-portal.org

Automated subject classification of textual web documents

K Golub - Journal of documentation, 2006 - emerald.com

Purpose: To provide an integrated perspective to similarities and differences between
approaches to automated classification in different research communities (machine learning …

Cited by 74 - Related articles - All 12 versions