Genome annotation: From human genetics to biodiversity genomics
R Guigó - Cell Genomics, 2023 - cell.com
Within the next decade, the genomes of 1.8 million eukaryotic species will be sequenced.
Identifying genes in these sequences is essential to understand the biology of the species …
Reference flow: reducing reference bias using multiple population genomes
NC Chen, B Solomon, T Mun, S Iyer, B Langmead - Genome Biology, 2021 - Springer
Most sequencing data analyses start by aligning sequencing reads to a linear reference
genome, but failure to account for genetic variation leads to reference bias and confounding …
Towards the accurate alignment of over a million protein sequences: Current state of the art
Large-scale genomics requires highly scalable and accurate multiple sequence alignment
methods. Results collected over this last decade suggest accuracy loss when scaling up …
MAGUS: multiple sequence alignment using graph clustering
V Smirnov, T Warnow - Bioinformatics, 2021 - academic.oup.com
Motivation: The estimation of large multiple sequence alignments (MSAs) is a basic
bioinformatics challenge. Divide-and-conquer is a useful approach that has been shown to …
Leveraging protein language models for accurate multiple sequence alignments
CD McWhite, I Armour-Garb, M Singh - Genome Research, 2023 - genome.cshlp.org
Multiple sequence alignment (MSA) is a critical step in the study of protein sequence and
function. Typically, MSA algorithms progressively align pairs of sequences and combine …
Phylogeny estimation given sequence length heterogeneity
V Smirnov, T Warnow - Systematic Biology, 2021 - academic.oup.com
Phylogeny estimation is a major step in many biological studies, and has many well known
challenges. With the dropping cost of sequencing technologies, biologists now have …
UPP2: fast and accurate alignment of datasets with fragmentary sequences
Motivation: Multiple sequence alignment (MSA) is a basic step in many bioinformatics
pipelines. However, achieving highly accurate alignments on large datasets, especially …
learnMSA: learning and aligning large protein families
Background: The alignment of large numbers of protein sequences is a challenging task and
its importance grows rapidly along with the size of biological datasets. State-of-the-art …
Recursive MAGUS: scalable and accurate multiple sequence alignment
V Smirnov - PLoS Computational Biology, 2021 - journals.plos.org
Multiple sequence alignment tools struggle to keep pace with rapidly growing sequence
data, as few methods can handle large datasets while maintaining alignment accuracy. We …
learnMSA2: deep protein multiple alignments with large language and hidden Markov models
Motivation: For the alignment of large numbers of protein sequences, tools are predominant
that decide to align two residues using only simple prior knowledge, e.g. amino acid …