Inducing enhanced suffix arrays for string collections
Constructing the suffix array for a string collection is an important task that may be performed
by sorting the concatenation of all strings. In this article we present algorithms gSAIS and g …
Universal indexes for highly repetitive document collections
Indexing highly repetitive collections has become a relevant problem with the emergence of
large repositories of versioned documents, among other applications. These collections may …
Space-Efficient Frameworks for Top-k String Retrieval
The inverted index is the backbone of modern web search engines. For each word in a
collection of web documents, the index records the list of documents where this word occurs …
Document listing on repetitive collections with guaranteed performance
G Navarro - Theoretical Computer Science, 2019 - Elsevier
We consider document listing on string collections, that is, finding in which strings a given
pattern appears. In particular, we focus on repetitive collections: a collection of size N over …
Document retrieval on repetitive string collections
Most of the fastest-growing string collections today are repetitive, that is, most of the
constituent documents are similar to many others. As these collections keep growing, a key …
A linear-space data structure for Range-LCP queries in poly-logarithmic time
Let T[1, n] be a text of length n and T[i, n] be the suffix starting at position i. Also, for
any two strings X and Y, let LCP(X, Y) denote their longest common prefix. The range-LCP …
Indexes for document retrieval with relevance
Document retrieval is a special type of pattern matching that is closely related to information
retrieval and web searching. In this problem, the data consist of a collection of text …
MRCSI: compressing and searching string collections with multiple references
Efficiently storing and searching collections of similar strings, such as large populations of
genomes or long change histories of documents from Wikis, is a timely and challenging …
Document listing on versioned documents
Representing versioned documents, such as Wikipedia history, web archives, genome
databases, backups, is challenging when we want to support searching for an exact …
Hybrid indexing for versioned document search with cluster-based retrieval
The previous two-phase method for searching versioned documents seeks a cost tradeoff by
using non-positional information to rank document versions first. The second phase then re …