A review of methods and databases for metagenomic classification and assembly

FP Breitwieser, J Lu, SL Salzberg - Briefings in bioinformatics, 2019 - academic.oup.com
Microbiome research has grown rapidly over the past decade, with a proliferation of new
methods that seek to make sense of large, complex data sets. Here, we survey two of the …

Spaced seeds improve k-mer-based metagenomic classification

K Břinda, M Sykulski, G Kucherov - Bioinformatics, 2015 - academic.oup.com
Motivation: Metagenomics is a powerful approach to study genetic content of environmental
samples, which has been strongly promoted by next-generation sequencing technologies …

How to optimally sample a sequence for rapid analysis

MC Frith, J Shaw, JL Spouge - Bioinformatics, 2023 - academic.oup.com
Motivation: We face an increasing flood of genetic sequence data, from diverse sources,
requiring rapid computational analysis. Rapid analysis can be achieved by sampling a …

Estimating evolutionary distances between genomic sequences from spaced-word matches

B Morgenstern, B Zhu, S Horwege… - Algorithms for Molecular …, 2015 - Springer
Alignment-free methods are increasingly used to calculate evolutionary distances between
DNA and protein sequences as a basis of phylogeny reconstruction. Most of these methods …

Entropy predicts sensitivity of pseudorandom seeds

BD Maier, K Sahlin - Genome Research, 2023 - genome.cshlp.org
Seed design is important for sequence similarity search applications such as read mapping
and average nucleotide identity (ANI) estimation. Although k-mers and spaced k-mers are …

rasbhari: Optimizing Spaced Seeds for Database Searching, Read Mapping and Alignment-Free Sequence Comparison

L Hahn, CA Leimeister, R Ounit, S Lonardi… - PLoS computational …, 2016 - journals.plos.org
Many algorithms for sequence analysis rely on word matching or word statistics. Often, these
approaches can be improved if binary patterns representing match and don't-care positions …

Comparison of metagenomics and metatranscriptomics tools: a guide to making the right choice

LC Terrón-Camero, F Gordillo-González… - Genes, 2022 - mdpi.com
The study of microorganisms is a field of great interest due to their environmental (e.g., soil
contamination) and biomedical (e.g., parasitic diseases, autism) importance. The advent of …

Mismatch-tolerant, alignment-free sequence classification using multiple spaced seeds and multiindex Bloom filters

J Chu, H Mohamadi, E Erhan, J Tse… - Proceedings of the …, 2020 - National Acad Sciences
Alignment-free classification tools have enabled high-throughput processing of sequencing
data in many bioinformatics analysis pipelines primarily due to their computational …

PhylOligo: a package to identify contaminant or untargeted organism sequences in genome assemblies

L Mallet, T Bitard-Feildel, F Cerutti, H Chiapello - Bioinformatics, 2017 - academic.oup.com
Motivation: Genome sequencing projects sometimes uncover more organisms than
expected, especially for complex and/or non-model organisms. It is therefore useful to …

Sweep: representing large biological sequences datasets in compact vectors

CR De Pierri, R Voyceik, LGC Santos de Mattos… - Scientific reports, 2020 - nature.com
Vectoral and alignment-free approaches to biological sequence representation have been
explored in bioinformatics to efficiently handle big data. Even so, most current methods …