Genome annotation: From human genetics to biodiversity genomics

R Guigó - Cell Genomics, 2023 - cell.com
Within the next decade, the genomes of 1.8 million eukaryotic species will be sequenced.
Identifying genes in these sequences is essential to understand the biology of the species …

Reference flow: reducing reference bias using multiple population genomes

NC Chen, B Solomon, T Mun, S Iyer, B Langmead - Genome Biology, 2021 - Springer
Most sequencing data analyses start by aligning sequencing reads to a linear reference
genome, but failure to account for genetic variation leads to reference bias and confounding …

Towards the accurate alignment of over a million protein sequences: Current state of the art

L Santus, E Garriga, S Deorowicz, A Gudyś… - Current Opinion in …, 2023 - Elsevier
Large-scale genomics requires highly scalable and accurate multiple sequence alignment
methods. Results collected over this last decade suggest accuracy loss when scaling up …

MAGUS: multiple sequence alignment using graph clustering

V Smirnov, T Warnow - Bioinformatics, 2021 - academic.oup.com
Motivation: The estimation of large multiple sequence alignments (MSAs) is a basic
bioinformatics challenge. Divide-and-conquer is a useful approach that has been shown to …

Leveraging protein language models for accurate multiple sequence alignments

CD McWhite, I Armour-Garb, M Singh - Genome Research, 2023 - genome.cshlp.org
Multiple sequence alignment (MSA) is a critical step in the study of protein sequence and
function. Typically, MSA algorithms progressively align pairs of sequences and combine …

Phylogeny estimation given sequence length heterogeneity

V Smirnov, T Warnow - Systematic Biology, 2021 - academic.oup.com
Phylogeny estimation is a major step in many biological studies, and has many well known
challenges. With the dropping cost of sequencing technologies, biologists now have …

UPP2: fast and accurate alignment of datasets with fragmentary sequences

M Park, S Ivanovic, G Chu, C Shen, T Warnow - Bioinformatics, 2023 - academic.oup.com
Motivation: Multiple sequence alignment (MSA) is a basic step in many bioinformatics
pipelines. However, achieving highly accurate alignments on large datasets, especially …

learnMSA: learning and aligning large protein families

F Becker, M Stanke - GigaScience, 2022 - academic.oup.com
Background: The alignment of large numbers of protein sequences is a challenging task and
its importance grows rapidly along with the size of biological datasets. State-of-the-art …

Recursive MAGUS: scalable and accurate multiple sequence alignment

V Smirnov - PLoS Computational Biology, 2021 - journals.plos.org
Multiple sequence alignment tools struggle to keep pace with rapidly growing sequence
data, as few methods can handle large datasets while maintaining alignment accuracy. We …

learnMSA2: deep protein multiple alignments with large language and hidden Markov models

F Becker, M Stanke - Bioinformatics, 2024 - academic.oup.com
Motivation: For the alignment of large numbers of protein sequences, tools are predominant
that decide to align two residues using only simple prior knowledge, e.g., amino acid …