The string B-tree: A new data structure for string search in external memory and its applications

P Ferragina, R Grossi - Journal of the ACM (JACM), 1999 - dl.acm.org
We introduce a new text-indexing data structure, the String B-Tree, that can be seen as a link
between some traditional external-memory and string-matching data structures. In a short …

An approach to source-code plagiarism detection and investigation using latent semantic analysis

G Cosma, M Joy - IEEE Transactions on Computers, 2011 - ieeexplore.ieee.org
Plagiarism is a growing problem in academia. Academics often use plagiarism detection
tools to detect similar source-code files. Once similar files are detected, the academic …

Parameterized pattern matching: Algorithms and applications

BS Baker - Journal of Computer and System Sciences, 1996 - Elsevier
The problem of finding sections of code that either are identical or are related by the
systematic renaming of variables or constants can be modeled in terms of parameterized …

Order-preserving matching

J Kim, P Eades, R Fleischer, SH Hong… - Theoretical Computer …, 2014 - Elsevier
We introduce a new string matching problem called order-preserving matching on numeric
strings, where a pattern matches a text if the text contains a substring of values whose …

On hardness of jumbled indexing

A Amir, TM Chan, M Lewenstein… - … Colloquium on Automata …, 2014 - Springer
Jumbled indexing is the problem of indexing a text T for queries that ask whether there is a
substring of T matching a pattern represented as a Parikh vector, i.e., the vector of frequency …

Finding clones with dup: Analysis of an experiment

BS Baker - IEEE Transactions on Software Engineering, 2007 - ieeexplore.ieee.org
An experiment was carried out by a group of scientists to compare different tools and
techniques for detecting duplicated or near-duplicated source code. The overall comparative …

The k-mismatch problem revisited

R Clifford, A Fontaine, E Porat, B Sach… - Proceedings of the twenty …, 2016 - SIAM
We revisit the complexity of one of the most basic problems in pattern matching. In the
k-mismatch problem we must compute the Hamming distance between a pattern of length m …

Faster algorithms for the construction of parameterized suffix trees

SR Kosaraju - Proceedings of IEEE 36th Annual Foundations of …, 1995 - ieeexplore.ieee.org
Parameterized strings were introduced by Baker to solve the problem of identifying blocks of
code that get duplicated in a large software system. Parameter symbols capture the notion of …

Overlap matching

A Amir, R Cole, R Hariharan, M Lewenstein… - Information and …, 2003 - Elsevier
We propose a new paradigm for string matching, namely structural matching. In structural
matching, the text and pattern contents are not important. Rather, some areas in the text and …

[Book] 125 Problems in Text Algorithms: With Solutions

M Crochemore, T Lecroq, W Rytter - 2021 - books.google.com
String matching is one of the oldest algorithmic techniques, yet still one of the most
pervasive in computer science. The past 20 years have seen technological leaps in …