Data integration challenges for machine learning in precision medicine

M Martínez-García, E Hernández-Lemus - Frontiers in medicine, 2022 - frontiersin.org
A main goal of Precision Medicine is that of incorporating and integrating the vast corpora on
different databases about the molecular and environmental origins of disease, into analytic …

Spatial patterns of CTCF sites define the anatomy of TADs and their boundaries

L Nanni, S Ceri, C Logie - Genome biology, 2020 - Springer
Abstract. Background: Topologically associating domains (TADs) are genomic regions of
self-interaction. Additionally, it is known that TAD boundaries are enriched in CTCF binding sites …

GenoSurf: metadata driven semantic search system for integrated genomic datasets

A Canakoglu, A Bernasconi, A Colombo… - Database, 2019 - academic.oup.com
Many valuable resources developed by world-wide research institutions and consortia
describe genomic datasets that are both open and available for secondary research, but …

Framing Apache Spark in life sciences

A Manconi, M Gnocchi, L Milanesi, O Marullo… - Heliyon, 2023 - cell.com
Advances in high-throughput and digital technologies have required the adoption of big data
for handling complex tasks in life sciences. However, the drift to big data led researchers to …

META-BASE: a novel architecture for large-scale genomic metadata integration

A Bernasconi, A Canakoglu… - … /ACM Transactions on …, 2020 - ieeexplore.ieee.org
The integration of genomic metadata is, at the same time, an important, difficult, and
well-recognized challenge. It is important because a wealth of public data repositories is …

GeCoAgent: a conversational agent for empowering genomic data extraction and analysis

P Crovari, S Pidò, P Pinoli, A Bernasconi… - ACM Transactions on …, 2021 - dl.acm.org
With the availability of reliable and low-cost DNA sequencing, human genomics is relevant
to a growing number of end-users, including biologists and clinicians. Typical interactions …

GeMI: interactive interface for transformer-based Genomic Metadata Integration

G Serna Garcia, M Leone, A Bernasconi… - Database, 2022 - academic.oup.com
Abstract. The Gene Expression Omnibus (GEO) is a public archive containing >4 million
digital samples from functional genomics experiments collected over almost two decades …

OpenGDC: unifying, modeling, integrating cancer genomic data and clinical metadata

E Cappelli, F Cumbo, A Bernasconi, A Canakoglu… - Applied Sciences, 2020 - mdpi.com
Next Generation Sequencing technologies have produced a substantial increase of publicly
available genomic data and related clinical/biospecimen information. New models and …

Genomic data integration and user-defined sample-set extraction for population variant analysis

T Alfonsi, A Bernasconi, A Canakoglu, M Masseroli - BMC bioinformatics, 2022 - Springer
Background: Population variant analysis is of great importance for gathering insights into the
links between human genotype and phenotype. The 1000 Genomes Project established a …

RGMQL: scalable and interoperable computing of heterogeneous omics big data and metadata in R/Bioconductor

S Pallotta, S Cascianelli, M Masseroli - BMC bioinformatics, 2022 - Springer
Background: Heterogeneous omics data, increasingly collected through high-throughput
technologies, can contain hidden answers to very important and still unsolved biomedical …