Graph transformer for cross-lingual plagiarism detection

O Hourrane - IAES International Journal of Artificial Intelligence, 2022 - search.proquest.com
Abstract
The vast amount of multilingual text available on the internet has given rise to cross-lingual plagiarism, which has become a serious issue in fields such as education, science, and literature. Current cross-lingual plagiarism detection approaches usually rely on syntactic and lexical properties, external machine translation systems, or similarity search within a multilingual set of text documents. However, most of these methods are designed for literal plagiarism such as copy and paste, and their performance degrades on complex cases of plagiarism, including paraphrasing. In this paper, we propose a new graph-based approach that represents text passages in different languages using knowledge graphs. We put forward a new graph structure modeling method based on the Transformer architecture that employs precise relation encoding and delivers a more efficient global graph representation. The mappings between the graphs are learned under both semi-supervised and unsupervised training. The results of our experiments on Arabic–English, French–English, and Spanish–English plagiarism detection show that our graph transformer method surpasses state-of-the-art cross-lingual plagiarism detection approaches both with and without paraphrasing cases, and provides further insights into the use of knowledge graphs in a language-independent model.
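To make the idea of relation encoding in a graph Transformer more concrete, the following is a minimal sketch and not the paper's implementation. It assumes one common formulation in which an embedding of the relation between two nodes is added to the key before the attention score is computed, and it assumes mean pooling and cosine similarity for comparing two passage graphs. The function names, the relation labels, and the omission of learned projection matrices are all simplifications made for illustration; the abstract does not specify these details.

```python
import math

def relation_aware_attention(nodes, relations, rel_emb):
    """One attention pass over graph nodes (illustrative assumption only).

    nodes:     list of d-dimensional node vectors
    relations: relations[i][j] is the relation label from node i to node j
    rel_emb:   dict mapping each relation label to a d-dimensional vector
    Learned query/key/value projections are omitted to keep the sketch short.
    """
    d = len(nodes[0])
    updated = []
    for i, q in enumerate(nodes):
        # Score each neighbour j using its vector plus the relation embedding.
        scores = []
        for j, k in enumerate(nodes):
            r = rel_emb[relations[i][j]]
            dot = sum(qa * (ka + ra) for qa, ka, ra in zip(q, k, r))
            scores.append(dot / math.sqrt(d))
        # Numerically stable softmax over the scores.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # New node vector is the attention-weighted average of all nodes.
        updated.append(
            [sum(w * v[t] for w, v in zip(weights, nodes)) for t in range(d)]
        )
    return updated

def graph_vector(node_vectors):
    """Mean-pool node vectors into one graph-level vector (assumed pooling)."""
    n = len(node_vectors)
    d = len(node_vectors[0])
    return [sum(v[t] for v in node_vectors) / n for t in range(d)]

def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

# Hypothetical toy graph: two entity nodes and two relation labels.
rel_emb = {"self": [0.0, 0.0], "related_to": [0.5, 0.5]}
relations = [["self", "related_to"], ["related_to", "self"]]
source_nodes = [[1.0, 0.0], [0.0, 1.0]]
suspect_nodes = [[0.9, 0.1], [0.1, 0.9]]

src = graph_vector(relation_aware_attention(source_nodes, relations, rel_emb))
sus = graph_vector(relation_aware_attention(suspect_nodes, relations, rel_emb))
similarity = cosine(src, sus)
```

In this sketch a high `similarity` between the two graph-level vectors would flag a candidate pair for closer inspection. How the mappings between graphs in different languages are actually learned (semi-supervised or unsupervised) is outside what the sketch covers.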