Inducing enhanced suffix arrays for string collections

FA Louza, S Gog, GP Telles - Theoretical Computer Science, 2017 - Elsevier
Constructing the suffix array for a string collection is an important task that may be performed
by sorting the concatenation of all strings. In this article we present algorithms gSAIS and g …

Universal indexes for highly repetitive document collections

F Claude, A Fariña, MA Martínez-Prieto, G Navarro - Information Systems, 2016 - Elsevier
Indexing highly repetitive collections has become a relevant problem with the emergence of
large repositories of versioned documents, among other applications. These collections may …

Space-Efficient Frameworks for Top-k String Retrieval

WK Hon, R Shah, SV Thankachan… - Journal of the ACM (JACM), 2014 - dl.acm.org
The inverted index is the backbone of modern web search engines. For each word in a
collection of web documents, the index records the list of documents where this word occurs …

Document listing on repetitive collections with guaranteed performance

G Navarro - Theoretical Computer Science, 2019 - Elsevier
We consider document listing on string collections, that is, finding in which strings a given
pattern appears. In particular, we focus on repetitive collections: a collection of size N over …

Document retrieval on repetitive string collections

T Gagie, A Hartikainen, K Karhu, J Kärkkäinen… - Information Retrieval …, 2017 - Springer
Most of the fastest-growing string collections today are repetitive, that is, most of the
constituent documents are similar to many others. As these collections keep growing, a key …

A linear-space data structure for range-LCP queries in poly-logarithmic time

P Abedin, A Ganguly, WK Hon, K Matsuda… - Theoretical Computer …, 2020 - Elsevier
Let T[1, n] be a text of length n and T[i, n] be the suffix starting at position i. Also, for
any two strings X and Y, let LCP(X, Y) denote their longest common prefix. The range-LCP …

Indexes for document retrieval with relevance

WK Hon, M Patil, R Shah, SV Thankachan… - … : Papers in Honor of J. Ian …, 2013 - Springer
Document retrieval is a special type of pattern matching that is closely related to information
retrieval and web searching. In this problem, the data consist of a collection of text …

MRCSI: compressing and searching string collections with multiple references

S Wandelt, U Leser - Proceedings of the VLDB Endowment, 2015 - dl.acm.org
Efficiently storing and searching collections of similar strings, such as large populations of
genomes or long change histories of documents from Wikis, is a timely and challenging …

Document listing on versioned documents

F Claude, JI Munro - String Processing and Information Retrieval: 20th …, 2013 - Springer
Representing versioned documents, such as Wikipedia history, web archives, genome
databases, backups, is challenging when we want to support searching for an exact …

Hybrid indexing for versioned document search with cluster-based retrieval

X Jin, D Agun, T Yang, Q Wu, Y Shen… - Proceedings of the 25th …, 2016 - dl.acm.org
The previous two-phase method for searching versioned documents seeks a cost tradeoff by
using non-positional information to rank document versions first. The second phase then re …