Data Structures to Represent a Set of k-long DNA Sequences

R Chikhi, J Holub, P Medvedev - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
The analysis of biological sequencing data has been one of the biggest applications of
string algorithms. The approaches used in many such applications are based on the …

Mash Screen: high-throughput sequence containment estimation for genome discovery

BD Ondov, GJ Starrett, A Sappington, A Kostic, S Koren… - Genome biology, 2019 - Springer
The MinHash algorithm has proven effective for rapidly estimating the resemblance of two
genomes or metagenomes. However, this method cannot reliably estimate the containment …

To petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics

RAL Elworth, Q Wang, PK Kota… - Nucleic acids …, 2020 - academic.oup.com
As computational biologists continue to be inundated by ever increasing amounts of
metagenomic data, the need for data analysis approaches that keep up with the pace of …

REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets

C Marchet, Z Iqbal, D Gautheret, M Salson… - …, 2020 - academic.oup.com
Motivation: In this work we present REINDEER, a novel computational method that performs
indexing of sequences and records their abundances across a collection of datasets. To the …

BLight: efficient exact associative structure for k-mers

C Marchet, M Kerbiriou, A Limasset - Bioinformatics, 2021 - academic.oup.com
Motivation: A plethora of methods and applications share the fundamental need to associate
information to words for high-throughput sequence analysis. Doing so for billions of k-mers …

Gut-microbial adaptation and transformation of silver nanoparticles mediated the detoxification of Daphnia magna and their offspring

Y Li, WX Wang, H Liu - Environmental Science: Nano, 2022 - pubs.rsc.org
Despite extensive studies on the toxicity of antibacterial silver (either ionic Ag+ or
nanoparticles–AgNPs) at the cellular or organism level, little is known about the differences …

Mismatch-tolerant, alignment-free sequence classification using multiple spaced seeds and multiindex Bloom filters

J Chu, H Mohamadi, E Erhan, J Tse… - Proceedings of the …, 2020 - National Acad Sciences
Alignment-free classification tools have enabled high-throughput processing of sequencing
data in many bioinformatics analysis pipelines primarily due to their computational …

Nutrient-imbalanced conditions shift the interplay between zooplankton and gut microbiota

Y Li, Z Xu, H Liu - BMC genomics, 2021 - Springer
Background: Nutrient stoichiometry of phytoplankton frequently changes with aquatic
ambient nutrient concentrations, which is mainly influenced by anthropogenic water …

Alignment-free approaches for predicting novel Nuclear Mitochondrial Segments (NUMTs) in the human genome

W Li, J Freudenberg, J Freudenberg - Gene, 2019 - Elsevier
The nuclear human genome harbors sequences of mitochondrial origin, indicating an
ancestral transfer of DNA from the mitogenome. Several Nuclear Mitochondrial Segments …

Indexing De Bruijn graphs with minimizers

C Marchet, M Kerbiriou, A Limasset - bioRxiv, 2019 - pdfs.semanticscholar.org
Background: The need to associate information to words is shared among a plethora of
applications and methods in high throughput sequence analysis, and could be marked as …