Detecting near-duplicate text documents with a hybrid approach

An external plagiarism detection system based on part-of-speech (POS) tag n-grams and word embedding

K Yalcin, I Cicekli, G Ercan - Expert Systems with Applications, 2022 - Elsevier

The aim of this paper is to present an automatic plagiarism detection system to identify
plagiarized passages of documents. Our plagiarism detection system uses both syntactic …

Cited by 36 · Related articles · All 4 versions

Pairwise document similarity measure based on present term set

M Oghbaie, M Mohammadi Zanjireh - Journal of Big Data, 2018 - Springer

Measuring pairwise document similarity is an essential operation in various text mining
tasks. Most of the similarity measures judge the similarity between two documents based on …

Cited by 56 · Related articles · All 9 versions

Locality sensitive blocking (LSB): A robust blocking technique for data deduplication

A Sohail, W Qounain - Journal of Information Science, 2024 - journals.sagepub.com

Data deduplication is process of discovering multiple representations of same entity in an
information system. Blocking has been a benchmark technique for avoiding the pair-wise …

Cited by 3 · Related articles

Effective and Fast Near Duplicate Detection via Signature‐Based Compression Metrics

X Zhang, Y Yao, Y Ji, B Fang - Mathematical Problems in …, 2016 - Wiley Online Library

Detecting near duplicates on the web is challenging due to its volume and variety. Most of
the previous studies require the setting of input parameters, making it difficult for them to …

Cited by 12 · Related articles · All 8 versions

A fast text similarity measure for large document collections using multireference cosine and genetic algorithm

H Mohammadi, SH Khasteh - Turkish Journal of Electrical …, 2020 - journals.tubitak.gov.tr

One of the critical factors that make a search engine fast and accurate is a concise and
duplicate free index. In order to remove duplicate and near-duplicate (DND) documents from …

Cited by 8 · Related articles · All 5 versions

[PDF] Similarity search based on text embedding model for detection of near duplicates

AR Mishra, VK Panchal, P Kumar - International Journal of Grid …, 2020 - researchgate.net

Large amount of information in the form of text data is available to us which is acquired from
various sources and stored properly for future use. There is an urgent need of finding a way …

Cited by 10 · Related articles

[PDF] Near duplicate web page detection for efficient web crawling: a survey

SS Bhamare - International Journal of Advanced Scientific Research …, 2019 - ijasrm.com

The immense quantity of information in the World Wide Web, content mining gives lists to the
search engines for the sake of the relevance to the keywords. Web content mining is used to …

Cited by 3 · Related articles

Reverse Engineering of Intel Microcode Update Structure

Z Yang, Q Li, P Zhang, Z Chen - IEEE Access, 2020 - ieeexplore.ieee.org

Microcode update mechanism have been widely used in modern processors. Due to the
implementation details are not public, researchers are prevented from gaining any sort of …

Cited by 2 · Related articles · All 3 versions

BCDP: a blockchain-based credible data publishing system

FQ Liao, JF Wang, J Shen - Journal of Internet Technology, 2019 - jit.ndhu.edu.tw

With the advent of the era of big data, how to publish electronic data in a manner of credible
become a challenge. Traditional data publishing schemes either require a trusted third party …

Cited by 4 · Related articles

[PDF] Semantic Coupling of Scientific Literature using sBERT: An Enhanced Model for Systematic Literature Review

S Ghosh - researchgate.net

Semantic coupling refers to betweenness among documents having the same textual
context. Various AI tools are available for identifying semantically correlated texts for literary …