Document listing on repetitive collections
Many document collections consist largely of repeated material, and several indexes have
been designed to take advantage of this. There has been only preliminary work, however,
on document retrieval for repetitive collections. In this paper we show how one of those
indexes, the run-length compressed suffix array (RLCSA), can be extended to support
document listing. In our experiments, our additional structures on top of the RLCSA can
reduce the query time for document listing by an order of magnitude while still using total …
Document listing on repetitive collections with guaranteed performance
G Navarro - Theoretical Computer Science, 2019 - Elsevier
We consider document listing on string collections, that is, finding in which strings a given
pattern appears. In particular, we focus on repetitive collections: a collection of size N over
alphabet [1, σ] is composed of D copies of a string of size n, and s edits are applied on
ranges of copies. We introduce the first document listing index with size Õ(n + s), precisely
O((n lg σ + s lg² N) lg D) bits, and with useful worst-case time guarantees: Given a
pattern of length m, the index reports the ndoc > 0 strings where it appears in time O(m lg^{1+ …
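
To make the problem statement concrete, here is a minimal sketch of document listing: given a pattern, report the strings of the collection in which it appears. This is a naive, uncompressed baseline built on a sorted list of all suffixes. It is not the index described in the abstract, and it does not exploit repetitiveness. The names `build_index` and `list_documents` are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left


def build_index(docs):
    """Collect every suffix of every document, sorted lexicographically.

    Returns two parallel lists: the sorted suffixes and, for each suffix,
    the identifier of the document it came from. This takes quadratic
    space and is meant only to illustrate the query, not to be efficient.
    """
    pairs = sorted(
        (text[start:], doc_id)
        for doc_id, text in enumerate(docs)
        for start in range(len(text))
    )
    suffixes = [suffix for suffix, _ in pairs]
    doc_ids = [doc_id for _, doc_id in pairs]
    return suffixes, doc_ids


def list_documents(index, pattern):
    """Return the sorted identifiers of the documents containing `pattern`."""
    suffixes, doc_ids = index
    # Suffixes that start with the pattern form one contiguous range in
    # sorted order; binary search finds where that range begins.
    i = bisect_left(suffixes, pattern)
    found = set()
    while i < len(suffixes) and suffixes[i].startswith(pattern):
        found.add(doc_ids[i])  # a document is reported once, however often it matches
        i += 1
    return sorted(found)


if __name__ == "__main__":
    index = build_index(["banana", "bandana", "cabana"])
    print(list_documents(index, "band"))
    print(list_documents(index, "ana"))
```

The scan over the matching range visits every occurrence of the pattern, so its cost grows with the number of occurrences rather than with the number of distinct documents reported. Avoiding that gap in compressed space is what dedicated document listing indexes aim for.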