Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases

OK Tørresen, B Star, P Mier… - Nucleic acids …, 2019 - academic.oup.com
The widespread occurrence of repetitive stretches of DNA in genomes of organisms across
the tree of life imposes fundamental challenges for sequencing, genome assembly, and …

The road to metagenomics: from microbiology to DNA sequencing technologies and bioinformatics

A Escobar-Zepeda, A Vera-Ponce de León… - Frontiers in …, 2015 - frontiersin.org
The study of microorganisms that pervade each and every part of this planet has
encountered many challenges through time such as the discovery of unknown organisms …

RepeatModeler2 for automated genomic discovery of transposable element families

JM Flynn, R Hubley, C Goubert… - Proceedings of the …, 2020 - National Acad Sciences
The accelerating pace of genome sequencing throughout the tree of life is driving the need
for improved unsupervised annotation of genome components such as transposable …

MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization

K Katoh, J Rozewicki, KD Yamada - Briefings in bioinformatics, 2019 - academic.oup.com
This article describes several features in the MAFFT online service for multiple sequence
alignment (MSA). As a result of recent advances in sequencing technologies, huge numbers …

The genome of cultivated peanut provides insight into legume karyotypes, polyploid evolution and crop domestication

W Zhuang, H Chen, M Yang, J Wang, MK Pandey… - Nature …, 2019 - nature.com
High oil and protein content make tetraploid peanut a leading oil and food legume. Here we
report a high-quality peanut genome sequence, comprising 2.54 Gb with 20 …

The Ensembl gene annotation system

BL Aken, S Ayling, D Barrell, L Clarke, V Curwen… - Database, 2016 - academic.oup.com
The Ensembl gene annotation system has been used to annotate over 70 different
vertebrate species across a wide range of genome projects. Furthermore, it generates the …

A collinearity-incorporating homology inference strategy for connecting emerging assemblies in the Triticeae tribe as a pilot practice in the plant pangenomic era

Y Chen, W Song, X Xie, Z Wang, P Guan, H Peng… - Molecular Plant, 2020 - cell.com
Plant genome sequencing has dramatically increased, and some species even have
multiple high-quality reference versions. Demands for clade-specific homology inference …

The genome of Chenopodium quinoa

DE Jarvis, YS Ho, DJ Lightfoot, SM Schmöckel, B Li… - Nature, 2017 - nature.com
Chenopodium quinoa (quinoa) is a highly nutritious grain identified as an important crop to
improve world food security. Unfortunately, few resources are available to facilitate its …

A simple method to control over-alignment in the MAFFT multiple sequence alignment program

K Katoh, DM Standley - Bioinformatics, 2016 - academic.oup.com
Motivation: We present a new feature of the MAFFT multiple alignment program for
suppressing over-alignment (aligning unrelated segments). Conventional MAFFT is highly …

Using intron position conservation for homology-based gene prediction

J Keilwagen, M Wenk, JL Erickson… - Nucleic acids …, 2016 - academic.oup.com
Annotation of protein-coding genes is very important in bioinformatics and biology and has a
decisive influence on many downstream analyses. Homology-based gene prediction …