The string B-tree: A new data structure for string search in external memory and its applications
P Ferragina, R Grossi - Journal of the ACM (JACM), 1999 - dl.acm.org
We introduce a new text-indexing data structure, the String B-Tree, that can be seen as a link
between some traditional external-memory and string-matching data structures. In a short …
An approach to source-code plagiarism detection and investigation using latent semantic analysis
Plagiarism is a growing problem in academia. Academics often use plagiarism detection
tools to detect similar source-code files. Once similar files are detected, the academic …
Parameterized pattern matching: Algorithms and applications
BS Baker - Journal of computer and system sciences, 1996 - Elsevier
The problem of finding sections of code that either are identical or are related by the
systematic renaming of variables or constants can be modeled in terms of parameterized …
Order-preserving matching
We introduce a new string matching problem called order-preserving matching on numeric
strings, where a pattern matches a text if the text contains a substring of values whose …
On hardness of jumbled indexing
Jumbled indexing is the problem of indexing a text T for queries that ask whether there is a
substring of T matching a pattern represented as a Parikh vector, i.e., the vector of frequency …
Finding clones with dup: Analysis of an experiment
BS Baker - IEEE Transactions on Software Engineering, 2007 - ieeexplore.ieee.org
An experiment was carried out by a group of scientists to compare different tools and
techniques for detecting duplicated or near-duplicated source code. The overall comparative …
The k-mismatch problem revisited
We revisit the complexity of one of the most basic problems in pattern matching. In the k-
mismatch problem we must compute the Hamming distance between a pattern of length m …
Faster algorithms for the construction of parameterized suffix trees
SR Kosaraju - Proceedings of IEEE 36th Annual Foundations of …, 1995 - ieeexplore.ieee.org
Parameterized strings were introduced by Baker to solve the problem of identifying blocks of
code that get duplicated in a large software system. Parameter symbols capture the notion of …
Overlap matching
We propose a new paradigm for string matching, namely structural matching. In structural
matching, the text and pattern contents are not important. Rather, some areas in the text and …
[Book] 125 Problems in Text Algorithms: With Solutions
String matching is one of the oldest algorithmic techniques, yet still one of the most
pervasive in computer science. The past 20 years have seen technological leaps in …