Efficient construction of a complete index for pan-genomics read alignment
Short-read aligners predominantly use the FM-index, which is easily able to index one or a
few human genomes. However, it does not scale well to indexing collections of thousands of …
Towards pan-genome read alignment to improve variation calling
Background: A typical human genome differs from the reference genome at 4-5 million sites.
This diversity is increasingly catalogued in repositories such as ExAC/gnomAD, consisting …
Sublinear time Lempel-Ziv (LZ77) factorization
J Ellert - International Symposium on String Processing and …, 2023 - Springer
Abstract The Lempel-Ziv (LZ77) factorization of a string is a widely-used algorithmic tool that
plays a central role in data compression and indexing. For a length-n string over integer …
r-indexing the eBWT
Abstract The extended Burrows-Wheeler Transform (eBWT) was introduced by Mantaci et
al. [TCS 2007] to extend the definition of the BWT to a collection of strings. As opposed to …
Founder reconstruction enables scalable and seamless pangenomic analysis
Motivation Variant calling workflows that utilize a single reference sequence are the de facto
standard elementary genomic analysis routine for resequencing projects. Various ways to …
Hybrid indexing revisited
H Ferrada, D Kempa, SJ Puglisi - … Proceedings of the Twentieth Workshop on …, 2018 - SIAM
Hybrid indexing is a recent approach to text indexing that allows the space-usage of
conventional text indexes (e.g., suffix trees, suffix arrays, FM-indexes) to scale well with the …
Efficient construction of a complete index for pan-genomics read alignment
While short read aligners, which predominantly use the FM-index, are able to easily index
one or a few human genomes, they do not scale well to indexing databases containing …
Lempel–Ziv-like parsing in small space
Abstract Lempel–Ziv (LZ77 or, briefly, LZ) is one of the most effective and widely-used
compressors for repetitive texts. However, the existing efficient methods computing the exact …
Distributed hybrid-indexing of compressed pan-genomes for scalable and fast sequence alignment
Computational pan-genomics utilizes information from multiple individual genomes in large-
scale comparative analysis. Genetic variation between case-controls, ethnic groups, or …
CHIC: a short read aligner for pan-genomic references
D Valenzuela, V Mäkinen - biorxiv, 2017 - biorxiv.org
Recently the topic of computational pan-genomics has gained increasing attention, and
particularly the problem of moving from a single-reference paradigm to a pan-genomic one …