An external plagiarism detection system based on part-of-speech (POS) tag n-grams and word embedding

K Yalcin, I Cicekli, G Ercan - Expert Systems with Applications, 2022 - Elsevier
The aim of this paper is to present an automatic plagiarism detection system to identify
plagiarized passages of documents. Our plagiarism detection system uses both syntactic …

Pairwise document similarity measure based on present term set

M Oghbaie, M Mohammadi Zanjireh - Journal of Big Data, 2018 - Springer
Measuring pairwise document similarity is an essential operation in various text mining
tasks. Most of the similarity measures judge the similarity between two documents based on …

Locality sensitive blocking (LSB): A robust blocking technique for data deduplication

A Sohail, W Qounain - Journal of Information Science, 2024 - journals.sagepub.com
Data deduplication is the process of discovering multiple representations of the same entity in an
information system. Blocking has been a benchmark technique for avoiding the pair-wise …

Effective and Fast Near Duplicate Detection via Signature‐Based Compression Metrics

X Zhang, Y Yao, Y Ji, B Fang - Mathematical Problems in …, 2016 - Wiley Online Library
Detecting near duplicates on the web is challenging due to its volume and variety. Most of
the previous studies require the setting of input parameters, making it difficult for them to …

A fast text similarity measure for large document collections using multireference cosine and genetic algorithm

H Mohammadi, SH Khasteh - Turkish Journal of Electrical …, 2020 - journals.tubitak.gov.tr
One of the critical factors that make a search engine fast and accurate is a concise and
duplicate free index. In order to remove duplicate and near-duplicate (DND) documents from …

Similarity search based on text embedding model for detection of near duplicates

AR Mishra, VK Panchal, P Kumar - International Journal of Grid …, 2020 - researchgate.net
A large amount of information in the form of text data is available to us, acquired from
various sources and stored properly for future use. There is an urgent need to find a way …

Near duplicate web page detection for efficient web crawling: a survey

SS Bhamare - International Journal of Advanced Scientific Research …, 2019 - ijasrm.com
Given the immense quantity of information on the World Wide Web, content mining gives lists to the
search engines for the sake of the relevance to the keywords. Web content mining is used to …

Reverse Engineering of Intel Microcode Update Structure

Z Yang, Q Li, P Zhang, Z Chen - IEEE Access, 2020 - ieeexplore.ieee.org
Microcode update mechanisms have been widely used in modern processors. Because the
implementation details are not public, researchers are prevented from gaining any sort of …

BCDP: a blockchain-based credible data publishing system

FQ Liao, JF Wang, J Shen - Journal of Internet Technology, 2019 - jit.ndhu.edu.tw
With the advent of the era of big data, how to publish electronic data in a credible manner
has become a challenge. Traditional data publishing schemes either require a trusted third party …

Semantic Coupling of Scientific Literature using sBERT: An Enhanced Model for Systematic Literature Review

S Ghosh - researchgate.net
Semantic coupling refers to betweenness among documents having the same textual
context. Various AI tools are available for identifying semantically correlated texts for literary …