Intrinsic plagiarism detection using character trigram distance scores
In this paper, we describe a novel approach to intrinsic plagiarism detection. Each
suspicious document is divided into a series of consecutive, potentially overlapping
'windows' of equal size. These are represented by vectors containing the relative
frequencies of a predetermined set of high-frequency character trigrams. Subsequently, a
distance matrix is set up in which each of the document's windows is compared to each
other window. The distance measure used is a symmetric adaptation of the normalized …
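The pipeline described above (equal-size windows, trigram frequency vectors, pairwise distance matrix) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the window size, the step, and the vocabulary size `k` are all hypothetical. Because the snippet cuts off before naming the distance measure, `placeholder_distance` is a stand-in (a plain sum of absolute differences), not the measure used in the paper.

```python
from collections import Counter


def trigrams(text):
    """Return all overlapping character trigrams of a string."""
    return [text[i:i + 3] for i in range(len(text) - 2)]


def split_windows(text, size, step):
    """Cut the text into equal-size windows; step < size makes them overlap."""
    return [text[i:i + size] for i in range(0, len(text) - size + 1, step)]


def top_trigrams(text, k):
    """Pick the k most frequent trigrams of the document as the fixed feature set.

    Assumption: the source only says the set is 'predetermined'; deriving it
    from the document itself is a simplification made for this sketch.
    """
    return [t for t, _ in Counter(trigrams(text)).most_common(k)]


def profile(window, vocab):
    """Relative frequency of each vocabulary trigram inside one window."""
    counts = Counter(trigrams(window))
    total = max(len(window) - 2, 1)  # number of trigrams in the window
    return [counts[t] / total for t in vocab]


def placeholder_distance(p, q):
    """Stand-in symmetric distance (sum of absolute differences).

    NOT the paper's measure, which is truncated in the source text.
    """
    return sum(abs(a - b) for a, b in zip(p, q))


def distance_matrix(text, size=20, step=10, k=5):
    """Compare every window's profile with every other window's profile."""
    vocab = top_trigrams(text, k)
    profiles = [profile(w, vocab) for w in split_windows(text, size, step)]
    return [[placeholder_distance(p, q) for q in profiles] for p in profiles]
```

In a matrix like this, a window whose row shows large distances to most other windows would be a candidate for a stylistically deviating passage; how such rows are thresholded is not covered by the excerpt.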
[Citation] Intrinsic Plagiarism Detection Using Character Trigram Distance Scores - Notebook for PAN at CLEF 2011.