Computational graph pangenomics: a tutorial on data structures and their applications

JA Baaijens, P Bonizzoni, C Boucher… - Natural Computing, 2022 - Springer
Computational pangenomics is an emerging research field that is changing the way
computer scientists are facing challenges in biological sequence analysis. In past decades …

Fully functional suffix trees and optimal text searching in BWT-runs bounded space

T Gagie, G Navarro, N Prezza - Journal of the ACM (JACM), 2020 - dl.acm.org
Indexing highly repetitive texts—such as genomic databases, software repositories and
versioned text collections—has become an important problem since the turn of the …

Refining the r-index

H Bannai, T Gagie, I Tomohiro - Theoretical Computer Science, 2020 - Elsevier
Gagie, Navarro and Prezza's r-index (SODA, 2018) promises to speed up DNA
alignment and variation calling by allowing us to index entire genomic databases, provided …

A comparison of index-based Lempel-Ziv LZ77 factorization algorithms

A Al-Hafeedh, M Crochemore, L Ilie… - ACM Computing …, 2012 - dl.acm.org
Since 1977, when Lempel and Ziv described a kind of string factorization useful for text
compression, there has been a succession of algorithms proposed for computing “LZ …

Optimal-time queries on BWT-runs compressed indexes

T Nishimoto, Y Tabei - arXiv preprint arXiv:2006.05104, 2020 - arxiv.org
Indexing highly repetitive strings (i.e., strings with many repetitions) for fast queries has
become a central research topic in string processing, because it has a wide variety of …

Linear time Lempel-Ziv factorization: Simple, fast, small

J Kärkkäinen, D Kempa, SJ Puglisi - … Bad Herrenalb, Germany, June 17-19 …, 2013 - Springer
Computing the LZ factorization (or LZ77 parsing) of a string is a computational bottleneck in
many diverse applications, including data compression, text indexing, and pattern discovery …

Inducing enhanced suffix arrays for string collections

FA Louza, S Gog, GP Telles - Theoretical Computer Science, 2017 - Elsevier
Constructing the suffix array for a string collection is an important task that may be performed
by sorting the concatenation of all strings. In this article we present algorithms gSAIS and g …

Inducing suffix and LCP arrays in external memory

T Bingmann, J Fischer, V Osipov - Journal of Experimental Algorithmics …, 2016 - dl.acm.org
We consider full text index construction in external memory (EM). Our first contribution is an
inducing algorithm for suffix arrays in external memory, which runs in sorting complexity …

Inducing the LCP-array

J Fischer - Workshop on Algorithms and Data Structures, 2011 - Springer
We show how to modify the linear-time construction algorithm for suffix arrays based on
induced sorting (Nong et al., DCC'09) such that it computes the array of longest common …

Weighted ancestors in suffix trees revisited

D Belazzougui, D Kosolobov, SJ Puglisi… - arXiv preprint arXiv …, 2021 - arxiv.org
The weighted ancestor problem is a well-known generalization of the predecessor problem
to trees. It is known to require $\Omega(\log\log n)$ time for queries provided $O(n\mathop …