De novo gene birth

SB Van Oss, AR Carvunis - PLoS genetics, 2019 - journals.plos.org
De novo gene birth is the process by which new genes evolve from DNA sequences that
were ancestrally non-genic. De novo genes represent a subset of novel genes, and may be …

Improved global protein homolog detection with major gains in function identification

M Kilinc, K Jia, RL Jernigan - Proceedings of the National …, 2023 - National Acad Sciences
There are several hundred million protein sequences, but the relationships among them are
not fully available from existing homolog detection methods. There is an essential need for …

From de novo to “de nono”: the majority of novel protein-coding genes identified with phylostratigraphy are old genes or recent duplicates

C Casola - Genome biology and evolution, 2018 - academic.oup.com
The evolution of novel protein-coding genes from noncoding regions of the genome is one
of the most compelling pieces of evidence for genetic innovations in nature. One popular …

An efficient protein homology detection approach based on seq2seq model and ranking

S Gao, S Yu, S Yao - Biotechnology & Biotechnological Equipment, 2021 - Taylor & Francis
Evolutionary information is essential for protein annotation. The number of homologs of a
protein retrieved is correlated with the annotations related to the protein structure or function …

Deep semantic protein representation for annotation, discovery, and engineering

AS Schwartz, GJ Hannum, ZR Dwiel, ME Smoot… - BioRxiv, 2018 - biorxiv.org
Computational assignment of function to proteins with no known homologs is still an
unsolved problem. We have created a novel, function-based approach to protein annotation …

Protein domain embeddings for fast and accurate similarity search

BG Iovino, H Tang, Y Ye - International Conference on Research in …, 2024 - Springer
Recently developed protein language models have enabled a variety of applications of the
protein contextual embeddings. Per-protein representations (each protein is represented as …

Ultra-fast global homology detection with discrete cosine transform and dynamic time warping

D Raimondi, G Orlando, Y Moreau, WF Vranken - Bioinformatics, 2018 - academic.oup.com
Motivation: Evolutionary information is crucial for the annotation of proteins in bioinformatics.
The amount of retrieved homologs often correlates with the quality of predicted protein …

Searching for an identity: Functional characterization of taxonomically restricted genes in grain amaranth

G Cabrales-Orona, JP Délano-Frier - The amaranth genome, 2021 - Springer
Taxonomically restricted genes, or TRGs, are specific to a particular taxon that can be found
only in the genomes of single species or are represented as orthologs in closely related …

Multiple sequence alignment is not a solved problem

DA Morrison - arXiv preprint arXiv:1808.07717, 2018 - arxiv.org
Multiple sequence alignment is a basic procedure in molecular biology, and it is often
treated as being essentially a solved computational problem. However, this is not so, and …

A new paradigm for biological sequence retrieval inspired by natural language processing and database research

AJ Rousseau, S Lemal, Y Korovin, G Triantopoulos… - bioRxiv, 2023 - biorxiv.org
Nearly-exponential growth and heterogeneity of biological sequence data make the task of
biological sequence retrieval from databases more important and challenging than ever. In …