Spaces, trees, and colors: The algorithmic landscape of document retrieval on sequences
G Navarro - ACM Computing Surveys (CSUR), 2014 - dl.acm.org
Document retrieval is one of the best-established information retrieval activities since
the '60s, pervading all search engines. Its aim is to obtain, from a collection of text …
Space-efficient construction of compressed indexes in deterministic linear time
We show that the compressed suffix array and the compressed suffix tree of a string T can be
built in O(n) deterministic time using O(n log σ) bits of space, where n is the string length …
Top-k Document Retrieval in Optimal Time and Linear Space
We describe a data structure that uses O(n)-word space and reports k most relevant
documents that contain a query pattern P in optimal O(|P| + k) time. Our construction …
Linear time construction of compressed text indices in compact space
D Belazzougui - Proceedings of the forty-sixth Annual ACM Symposium …, 2014 - dl.acm.org
We show that the compressed suffix array and the compressed suffix tree for a string of
length n over an integer alphabet of size σ ≤ n can both be built in O(n) (randomized) time …
Colored range queries and document retrieval
Colored range queries are a well-studied topic in computational geometry and database
research that, in the past decade, have found exciting applications in information retrieval. In …
Linear-time string indexing and analysis in small space
The field of succinct data structures has flourished over the past 16 years. Starting from the
compressed suffix array by Grossi and Vitter (STOC 2000) and the FM-index by Ferragina …
Space-Efficient Frameworks for Top-k String Retrieval
The inverted index is the backbone of modern web search engines. For each word in a
collection of web documents, the index records the list of documents where this word occurs …
A scalable approach for index compression using wavelet tree and LZW
Nowadays, people are using multimedia tools that store multiple types of data in large
amounts. Information retrieval from such a vast amount is not an easy task. So, document …
Time-Optimal Top-k Document Retrieval
Let $\mathcal{D}$ be a collection of D documents, which are strings over an alphabet of size σ,
of total length n. We describe a data structure that uses linear space and reports k most …
Learned monotone minimal perfect hashing
A Monotone Minimal Perfect Hash Function (MMPHF) constructed on a set S of keys is a
function that maps each key in S to its rank. On keys not in S, the function returns an arbitrary …