Prospects and limitations of full-text index structures in genome analysis
M Vyverman, B De Baets, V Fack… - Nucleic acids …, 2012 - academic.oup.com
The combination of incessant advances in sequencing technology producing large amounts
of data and innovative bioinformatics approaches, designed to cope with this data flood, has …
Fully compressed suffix trees
Suffix trees are by far the most important data structure in stringology, with a myriad of
applications in fields like bioinformatics and information retrieval. Classical representations …
Cst++
Let A be an array of n elements taken from a totally ordered set. We present a data structure
of size 3n + o(n) bits that allows us to answer the following queries on A in constant time …
Computing matching statistics and maximal exact matches on compressed full-text indexes
E Ohlebusch, S Gog, A Kügel - … , SPIRE 2010, Los Cabos, Mexico, October …, 2010 - Springer
Exact string matching is a problem that computer programmers face on a regular basis, and
full-text indexes like the suffix tree or the suffix array provide fast string search over large …
Inverted indexes for phrases and strings
Inverted indexes are the most fundamental and widely used data structures in information
retrieval. For each unique word occurring in a document collection, the inverted index stores …
Bidirectional search in a string with wavelet trees and bidirectional matching statistics
T Schnattinger, E Ohlebusch, S Gog - Information and Computation, 2012 - Elsevier
Searching for genes encoding microRNAs (miRNAs) is an important task in genome
analysis. Because the secondary structure of miRNA (but not the sequence) is highly …
Combined data structure for previous- and next-smaller-values
J Fischer - Theoretical Computer Science, 2011 - Elsevier
Let A be a static array storing n elements from a totally ordered set. We present a data
structure of optimal size at most n log2(3 + 2√2) + o(n) bits that allows us to answer the …
Compressed suffix trees: design, construction, and applications
S Gog - 2011 - oparu.uni-ulm.de
In sequence analysis it is often advantageous to build an index data structure for large texts,
as many tasks--for instance repeated pattern matching--can then be solved in optimal time …
Compressed suffix trees: Efficient computation and storage of LCP-values
S Gog, E Ohlebusch - Journal of Experimental Algorithmics (JEA), 2013 - dl.acm.org
The suffix tree is a very important data structure in string processing, but typical
implementations suffer from huge space consumption. In large-scale applications …
Bitpacking techniques for indexing genomes: II. Enhanced suffix arrays
TD Wu - Algorithms for Molecular Biology, 2016 - Springer
Background Suffix arrays and their variants are used widely for representing genomes in
search applications. Enhanced suffix arrays (ESAs) provide fast search speed, but require …