Shallow Text Analysis and Machine Learning for Authorship Attribution

K Luyckx, W Daelemans - CLIN, 2004 - cnts.ua.ac.be

Abstract

Current advances in shallow parsing and machine learning allow us to use results from these fields in a methodology for Authorship Attribution. We report on experiments with a corpus of newspaper articles about national current affairs, written by different journalists from the Belgian newspaper De Standaard. Because the documents are similar in genre, register, and range of topics, token-based features (e.g., sentence length) and lexical features (e.g., vocabulary richness) can be kept roughly constant across authors. This allows us to focus on syntax-based features as possible predictors of an author’s style, as well as on those token-based features that are more predictive of author style than of topic or register. These style characteristics are not under the author’s conscious control and are therefore good clues for Authorship Attribution. Machine Learning methods (TiMBL and the WEKA software package) are used to select informative combinations of syntactic, token-based, and lexical features and to predict the authorship of unseen documents. The combination of these features can be considered an implicit profile that characterizes the style of an author.
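As a rough illustration of the kind of pipeline the abstract describes, the sketch below computes one token-based feature (mean sentence length) and one lexical feature (type-token ratio as a simple proxy for vocabulary richness), then predicts an author with a 1-nearest-neighbour rule, in the spirit of a memory-based learner such as TiMBL. It is not the authors' method: the function names `style_features` and `nearest_author`, the regex-based tokenization, and the choice of two features are assumptions made for illustration, and the syntax-based features from shallow parsing that the paper focuses on are not modelled here.

```python
import math
import re


def style_features(text):
    """Return (mean sentence length in tokens, type-token ratio) for one document.

    Hypothetical helper: sentence and word splitting are naive regex
    approximations, not the shallow-parsing output used in the paper.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"\w+", text.lower())
    if not sentences or not tokens:
        return (0.0, 0.0)
    mean_sentence_length = len(tokens) / len(sentences)
    type_token_ratio = len(set(tokens)) / len(tokens)
    return (mean_sentence_length, type_token_ratio)


def nearest_author(labelled_docs, unseen_text):
    """Predict the author of unseen_text by 1-nearest-neighbour search.

    labelled_docs is a list of (author, text) pairs. Distance is plain
    Euclidean distance over the unscaled feature vectors (an assumption;
    a real setup would normalise or weight the features).
    """
    target = style_features(unseen_text)
    best_author, _ = min(
        ((author, math.dist(style_features(text), target))
         for author, text in labelled_docs),
        key=lambda pair: pair[1],
    )
    return best_author
```

Because the features are left unscaled, sentence length dominates the distance in this sketch; feature weighting or selection, which the abstract attributes to TiMBL and WEKA, would address that in a real experiment.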

Cited by 77 · Related articles · All 18 versions