Spaces, trees, and colors: The algorithmic landscape of document retrieval on sequences

G Navarro - ACM Computing Surveys (CSUR), 2014 - dl.acm.org
Document retrieval is one of the best-established information retrieval activities since
the '60s, pervading all search engines. Its aim is to obtain, from a collection of text …

Space-efficient construction of compressed indexes in deterministic linear time

JI Munro, G Navarro, Y Nekrich - Proceedings of the Twenty-Eighth Annual …, 2017 - SIAM
We show that the compressed suffix array and the compressed suffix tree of a string T can be
built in O(n) deterministic time using O(n log σ) bits of space, where n is the string length …

Top-k Document Retrieval in Optimal Time and Linear Space

G Navarro, Y Nekrich - Proceedings of the twenty-third annual ACM-SIAM …, 2012 - SIAM
We describe a data structure that uses O(n)-word space and reports k most relevant
documents that contain a query pattern P in optimal O(|P| + k) time. Our construction …

Linear time construction of compressed text indices in compact space

D Belazzougui - Proceedings of the forty-sixth Annual ACM Symposium …, 2014 - dl.acm.org
We show that the compressed suffix array and the compressed suffix tree for a string of
length n over an integer alphabet of size σ ≤ n can both be built in O(n) (randomized) time …

Colored range queries and document retrieval

T Gagie, J Kärkkäinen, G Navarro, SJ Puglisi - Theoretical Computer …, 2013 - Elsevier
Colored range queries are a well-studied topic in computational geometry and database
research that, in the past decade, have found exciting applications in information retrieval. In …

Linear-time string indexing and analysis in small space

D Belazzougui, F Cunial, J Kärkkäinen… - ACM Transactions on …, 2020 - dl.acm.org
The field of succinct data structures has flourished over the past 16 years. Starting from the
compressed suffix array by Grossi and Vitter (STOC 2000) and the FM-index by Ferragina …

Space-Efficient Frameworks for Top-k String Retrieval

WK Hon, R Shah, SV Thankachan… - Journal of the ACM (JACM), 2014 - dl.acm.org
The inverted index is the backbone of modern web search engines. For each word in a
collection of web documents, the index records the list of documents where this word occurs …

A scalable approach for index compression using wavelet tree and LZW

S Gupta, AK Yadav, D Yadav, B Shukla - International Journal of …, 2022 - Springer
Nowadays, people are using multimedia tools that store multiple types of data in a large
amount. The information retrieval from such a vast amount is not an easy task. So, document …

Time-Optimal Top-k Document Retrieval

G Navarro, Y Nekrich - SIAM Journal on Computing, 2017 - SIAM
Let 𝒟 be a collection of D documents, which are strings over an alphabet of size σ,
of total length n. We describe a data structure that uses linear space and reports k most …

Learned monotone minimal perfect hashing

P Ferragina, HP Lehmann, P Sanders… - arXiv preprint arXiv …, 2023 - arxiv.org
A Monotone Minimal Perfect Hash Function (MMPHF) constructed on a set S of keys is a
function that maps each key in S to its rank. On keys not in S, the function returns an arbitrary …