Alignment-free sequence analysis and applications

J Ren, X Bai, YY Lu, K Tang, Y Wang… - Annual Review of …, 2018 - annualreviews.org
Genome and metagenome comparisons based on large amounts of next-generation
sequencing (NGS) data pose significant challenges for alignment-based approaches due to …

Genome-powered classification of microbial eukaryotes: focus on coral algal symbionts

KE Dougan, RA González-Pech, TG Stephens… - Trends in …, 2022 - cell.com
Modern microbial taxonomy generally relies on the use of single marker genes or sets of
concatenated genes to generate a framework for the delineation and classification of …

A network-based integrated framework for predicting virus–prokaryote interactions

W Wang, J Ren, K Tang, E Dart… - NAR genomics and …, 2020 - academic.oup.com
Metagenomic sequencing has greatly enhanced the discovery of viral genomic sequences;
however, it remains challenging to identify the host(s) of these new viruses. We developed …

Variable number tandem repeats mediate the expression of proximal genes

M Bakhtiari, J Park, YC Ding, S Shleizer-Burko… - Nature …, 2021 - nature.com
Variable number tandem repeats (VNTRs) account for significant genetic variation in many
organisms. In humans, VNTRs have been implicated in both Mendelian and complex …

Predicting host taxonomic information from viral genomes: A comparison of feature representations

F Young, S Rogers, DL Robertson - PLoS computational biology, 2020 - journals.plos.org
The rise in metagenomics has led to an exponential growth in virus discovery. However, the
majority of these new virus sequences have no assigned host. Current machine learning …

Lepidoptera genomes: current knowledge, gaps and future directions

DA Triant, SD Cinel, AY Kawahara - Current opinion in insect science, 2018 - Elsevier
Highlights: • Despite being an ecologically diverse and speciose insect order, genomes are
available for <10 of the 43 Lepidoptera superfamilies. • Genome-scale data are advancing …

The number of k-mer matches between two DNA sequences as a function of k and applications to estimate phylogenetic distances

S Röhling, A Linne, J Schellhorn, M Hosseini… - PLoS ONE, 2020 - journals.plos.org
We study the number N_k of length-k word matches between pairs of evolutionarily related
DNA sequences, as a function of k. We show that the Jukes-Cantor distance between two …

Synonymous nucleotide changes drive papillomavirus evolution

KM King, EV Rajadhyaksha, IG Tobey… - Tumour Virus …, 2022 - Elsevier
Papillomaviruses have been evolving alongside their hosts for at least 450 million years.
This review will discuss some of the insights gained into the evolution of this diverse family …

Enhancing metagenomic classification with compression-based features

JM Silva, JR Almeida - Artificial Intelligence in Medicine, 2024 - Elsevier
Metagenomics is a rapidly expanding field that uses next-generation sequencing technology
to analyze the genetic makeup of environmental samples. However, accurately identifying …

CMash: fast, multi-resolution estimation of k-mer-based Jaccard and containment indices

S Liu, D Koslicki - Bioinformatics, 2022 - academic.oup.com
Motivation: K-mer-based methods are used ubiquitously in the field of computational biology.
However, determining the optimal value of k for a specific application often remains …