Adaptive seeds tame genomic sequence comparison
The main way of analyzing biological sequences is by comparing and aligning them to each
other. It remains difficult, however, to compare modern multi-billion-base DNA data sets. The …
Evolution of biosequence search algorithms: a brief survey
G Kucherov - Bioinformatics, 2019 - academic.oup.com
Motivation: Although modern high-throughput biomolecular technologies produce various
types of data, biosequence data remain at the core of bioinformatic analyses. However …
Fast alignment-free sequence comparison using spaced-word frequencies
Motivation: Alignment-free methods for sequence comparison are increasingly used for
genome analysis and phylogeny reconstruction; they circumvent various difficulties of …
Spaced seeds improve k-mer-based metagenomic classification
Motivation: Metagenomics is a powerful approach to study genetic content of environmental
samples, which has been strongly promoted by next-generation sequencing technologies …
Minimally overlapping words for sequence similarity search
Motivation: Analysis of genetic sequences is usually based on finding similar parts of
sequences, e.g. DNA reads and/or genomes. For big data, this is typically done via 'seeds' …
A mostly traditional approach improves alignment of bisulfite-converted DNA
Cytosines in genomic DNA are sometimes methylated. This affects many biological
processes and diseases. The standard way of measuring methylation is to use bisulfite …
Entropy predicts sensitivity of pseudorandom seeds
Seed design is important for sequence similarity search applications such as read mapping
and average nucleotide identity (ANI) estimation. Although k-mers and spaced k-mers are …
A bioinformatician's guide to the forefront of suffix array construction algorithms
The suffix array and its variants are text-indexing data structures that have become
indispensable in the field of bioinformatics. With the uninitiated in mind, we provide an …
SpEED: fast computation of sensitive spaced seeds
L Ilie, S Ilie, A Mansouri Bigvand - Bioinformatics, 2011 - academic.oup.com
Multiple spaced seeds represent the current state-of-the-art for similarity search in
bioinformatics, with applications in various areas such as sequence alignment, read …
rasbhari: Optimizing Spaced Seeds for Database Searching, Read Mapping and Alignment-Free Sequence Comparison
Many algorithms for sequence analysis rely on word matching or word statistics. Often, these
approaches can be improved if binary patterns representing match and don't-care positions …