Alignment-free sequence comparison: benefits, applications, and tools

A Zielezinski, S Vinga, J Almeida, WM Karlowski - Genome biology, 2017 - Springer
Alignment-free sequence analyses have been applied to problems ranging from whole-genome
phylogeny to the classification of protein families, identification of horizontally …

Review on the application of machine learning algorithms in the sequence data mining of DNA

A Yang, W Zhang, J Wang, K Yang, Y Han… - … in Bioengineering and …, 2020 - frontiersin.org
Deoxyribonucleic acid (DNA) is a biological macromolecule. Its main function is information
storage. At present, the advancement of sequencing technology had caused DNA sequence …

Semi-supervised and un-supervised clustering: A review and experimental evaluation

K Taha - Information Systems, 2023 - Elsevier
Retrieving, analyzing, and processing large data can be challenging. An effective and
efficient mechanism for overcoming these challenges is to cluster the data into a compact …

Analysis of k-means clustering approach on the breast cancer Wisconsin dataset

AK Dubey, U Gupta, S Jain - … journal of computer assisted radiology and …, 2016 - Springer
Purpose: Breast cancer is one of the most common cancers found worldwide and most
frequently found in women. An early detection of breast cancer provides the possibility of its …

Comparative study of K-means and fuzzy C-means algorithms on the breast cancer data

AK Dubey, U Gupta, S Jain - International Journal on Advanced …, 2018 - researchgate.net
Breast cancer is one of the most common forms of cancer having a worldwide prevalence.
Continuous research is going on for detecting breast cancer in its early stage as the …

Information theory applications for biological sequence analysis

S Vinga - Briefings in bioinformatics, 2014 - academic.oup.com
Information theory (IT) addresses the analysis of communication systems and has
been widely applied in molecular biology. In particular, alignment-free sequence analysis …

MeShClust: an intelligent tool for clustering DNA sequences

BT James, BB Luczak, HZ Girgis - Nucleic acids research, 2018 - academic.oup.com
Sequence clustering is a fundamental step in analyzing DNA sequences. Widely-used
software tools for sequence clustering utilize greedy approaches that are not guaranteed to …

Limits to robustness and reproducibility in the demarcation of operational taxonomic units

TSB Schmidt, JF Matias Rodrigues… - Environmental …, 2015 - Wiley Online Library
The demarcation of operational taxonomic units (OTUs) from complex sequence data sets is
a key step in contemporary studies of microbial ecology. However, as biologically motivated …

Accurately clustering biological sequences in linear time by relatedness sorting

E Wright - Nature Communications, 2024 - nature.com
Clustering biological sequences into similar groups is an increasingly important task as the
number of available sequences continues to grow exponentially. Search-based approaches …

Benchmarking machine learning robustness in COVID-19 genome sequence classification

S Ali, B Sahoo, A Zelikovsky, PY Chen, M Patterson - Scientific Reports, 2023 - nature.com
The rapid spread of the COVID-19 pandemic has resulted in an unprecedented amount of
sequence data of the SARS-CoV-2 genome—millions of sequences and counting. This …