A review of methods and databases for metagenomic classification and assembly
Microbiome research has grown rapidly over the past decade, with a proliferation of new
methods that seek to make sense of large, complex data sets. Here, we survey two of the …
Spaced seeds improve k-mer-based metagenomic classification
Motivation: Metagenomics is a powerful approach to study genetic content of environmental
samples, which has been strongly promoted by next-generation sequencing technologies …
How to optimally sample a sequence for rapid analysis
Motivation: We face an increasing flood of genetic sequence data, from diverse sources,
requiring rapid computational analysis. Rapid analysis can be achieved by sampling a …
Estimating evolutionary distances between genomic sequences from spaced-word matches
B Morgenstern, B Zhu, S Horwege… - Algorithms for Molecular …, 2015 - Springer
Alignment-free methods are increasingly used to calculate evolutionary distances between
DNA and protein sequences as a basis of phylogeny reconstruction. Most of these methods …
Entropy predicts sensitivity of pseudorandom seeds
Seed design is important for sequence similarity search applications such as read mapping
and average nucleotide identity (ANI) estimation. Although k-mers and spaced k-mers are …
rasbhari: Optimizing Spaced Seeds for Database Searching, Read Mapping and Alignment-Free Sequence Comparison
Many algorithms for sequence analysis rely on word matching or word statistics. Often, these
approaches can be improved if binary patterns representing match and don't-care positions …
Comparison of metagenomics and metatranscriptomics tools: a guide to making the right choice
LC Terrón-Camero, F Gordillo-González… - Genes, 2022 - mdpi.com
The study of microorganisms is a field of great interest due to their environmental (e.g., soil
contamination) and biomedical (e.g., parasitic diseases, autism) importance. The advent of …
Mismatch-tolerant, alignment-free sequence classification using multiple spaced seeds and multiindex Bloom filters
Alignment-free classification tools have enabled high-throughput processing of sequencing
data in many bioinformatics analysis pipelines primarily due to their computational …
PhylOligo: a package to identify contaminant or untargeted organism sequences in genome assemblies
Motivation: Genome sequencing projects sometimes uncover more organisms than
expected, especially for complex and/or non-model organisms. It is therefore useful to …
Sweep: representing large biological sequences datasets in compact vectors
CR De Pierri, R Voyceik, LGC Santos de Mattos… - Scientific reports, 2020 - nature.com
Vectoral and alignment-free approaches to biological sequence representation have been
explored in bioinformatics to efficiently handle big data. Even so, most current methods …