Efficient construction of a complete index for pan-genomics read alignment

A Kuhnle, T Mun, C Boucher, T Gagie… - Journal of …, 2020 - liebertpub.com
Short-read aligners predominantly use the FM-index, which is easily able to index one or a
few human genomes. However, it does not scale well to indexing collections of thousands of …

Towards pan-genome read alignment to improve variation calling

D Valenzuela, T Norri, N Välimäki, E Pitkänen… - BMC Genomics, 2018 - Springer
Background: Typical human genome differs from the reference genome at 4-5 million sites.
This diversity is increasingly catalogued in repositories such as ExAC/gnomAD, consisting …

Sublinear time Lempel-Ziv (LZ77) factorization

J Ellert - International Symposium on String Processing and …, 2023 - Springer
Abstract: The Lempel-Ziv (LZ77) factorization of a string is a widely-used algorithmic tool that
plays a central role in data compression and indexing. For a length-n string over integer …

r-indexing the eBWT

C Boucher, D Cenzato, Z Lipták, M Rossi… - Information and …, 2024 - Elsevier
Abstract: The extended Burrows-Wheeler Transform (eBWT) was introduced by Mantaci et
al. [TCS 2007] to extend the definition of the BWT to a collection of strings. As opposed to …

Founder reconstruction enables scalable and seamless pangenomic analysis

T Norri, B Cazaux, S Dönges, D Valenzuela… - …, 2021 - academic.oup.com
Motivation: Variant calling workflows that utilize a single reference sequence are the de facto
standard elementary genomic analysis routine for resequencing projects. Various ways to …

Hybrid indexing revisited

H Ferrada, D Kempa, SJ Puglisi - … Proceedings of the Twentieth Workshop on …, 2018 - SIAM
Hybrid indexing is a recent approach to text indexing that allows the space-usage of
conventional text indexes (e.g., suffix trees, suffix arrays, FM-indexes) to scale well with the …

Efficient construction of a complete index for pan-genomics read alignment

A Kuhnle, T Mun, C Boucher, T Gagie… - … on Research in …, 2019 - Springer
While short read aligners, which predominantly use the FM-index, are able to easily index
one or a few human genomes, they do not scale well to indexing databases containing …

Lempel–Ziv-like parsing in small space

D Kosolobov, D Valenzuela, G Navarro, SJ Puglisi - Algorithmica, 2020 - Springer
Abstract: Lempel–Ziv (LZ77 or, briefly, LZ) is one of the most effective and widely-used
compressors for repetitive texts. However, the existing efficient methods computing the exact …

Distributed hybrid-indexing of compressed pan-genomes for scalable and fast sequence alignment

AI Maarala, O Arasalo, D Valenzuela, V Mäkinen… - PLoS ONE, 2021 - journals.plos.org
Computational pan-genomics utilizes information from multiple individual genomes in large-
scale comparative analysis. Genetic variation between case-controls, ethnic groups, or …

CHIC: a short read aligner for pan-genomic references

D Valenzuela, V Mäkinen - bioRxiv, 2017 - biorxiv.org
Recently the topic of computational pan-genomics has gained increasing attention, and
particularly the problem of moving from a single-reference paradigm to a pan-genomic one …