Adaptive seeds tame genomic sequence comparison

SM Kiełbasa, R Wan, K Sato, P Horton… - Genome …, 2011 - genome.cshlp.org
The main way of analyzing biological sequences is by comparing and aligning them to each
other. It remains difficult, however, to compare modern multi-billion-base DNA data sets. The …

Evolution of biosequence search algorithms: a brief survey

G Kucherov - Bioinformatics, 2019 - academic.oup.com
Motivation: Although modern high-throughput biomolecular technologies produce various
types of data, biosequence data remain at the core of bioinformatic analyses. However …

Fast alignment-free sequence comparison using spaced-word frequencies

CA Leimeister, M Boden, S Horwege, S Lindner… - …, 2014 - academic.oup.com
Motivation: Alignment-free methods for sequence comparison are increasingly used for
genome analysis and phylogeny reconstruction; they circumvent various difficulties of …

Spaced seeds improve k-mer-based metagenomic classification

K Břinda, M Sykulski, G Kucherov - Bioinformatics, 2015 - academic.oup.com
Motivation: Metagenomics is a powerful approach to study genetic content of environmental
samples, which has been strongly promoted by next-generation sequencing technologies …

Minimally overlapping words for sequence similarity search

MC Frith, L Noé, G Kucherov - Bioinformatics, 2020 - academic.oup.com
Motivation: Analysis of genetic sequences is usually based on finding similar parts of
sequences, e.g. DNA reads and/or genomes. For big data, this is typically done via 'seeds' …

A mostly traditional approach improves alignment of bisulfite-converted DNA

MC Frith, R Mori, K Asai - Nucleic Acids Research, 2012 - academic.oup.com
Cytosines in genomic DNA are sometimes methylated. This affects many biological
processes and diseases. The standard way of measuring methylation is to use bisulfite …

Entropy predicts sensitivity of pseudorandom seeds

BD Maier, K Sahlin - Genome Research, 2023 - genome.cshlp.org
Seed design is important for sequence similarity search applications such as read mapping
and average nucleotide identity (ANI) estimation. Although k-mers and spaced k-mers are …

A bioinformatician's guide to the forefront of suffix array construction algorithms

AMS Shrestha, MC Frith, P Horton - Briefings in Bioinformatics, 2014 - academic.oup.com
The suffix array and its variants are text-indexing data structures that have become
indispensable in the field of bioinformatics. With the uninitiated in mind, we provide an …

SpEED: fast computation of sensitive spaced seeds

L Ilie, S Ilie, A Mansouri Bigvand - Bioinformatics, 2011 - academic.oup.com
Multiple spaced seeds represent the current state-of-the-art for similarity search in
bioinformatics, with applications in various areas such as sequence alignment, read …

rasbhari: Optimizing Spaced Seeds for Database Searching, Read Mapping and Alignment-Free Sequence Comparison

L Hahn, CA Leimeister, R Ounit, S Lonardi… - PLoS computational …, 2016 - journals.plos.org
Many algorithms for sequence analysis rely on word matching or word statistics. Often, these
approaches can be improved if binary patterns representing match and don't-care positions …