查看文章

On the Utility of Large Language Model Embeddings for Revolutionizing Semantic Data Harmonization in Alzheimer's and Parkinson’s Disease

作者

Yasamin Salimi, Tim Adams, Mehmet Can Ay, Helena Balabin, Marc Jacobs, Martin Hofmann-Apitius

发表日期

2024/4/1

简介

Data Harmonization is an important yet time-consuming process. With the recent popularity of applications using Large Language Models (LLMs) due to their high capabilities in text understanding, we investigated whether LLMs could facilitate data harmonization for clinical use cases. To evaluate this, we created PASSIONATE, a novel Parkinson's disease (PD) Common Data Model (CDM) as a ground truth source for pairwise cohort harmonization using LLMs. Additionally, we extended our investigation using an existing Alzheimer’s disease (AD) CDM. We computed text embeddings based on two LLMs to perform automated cohort harmonization for both AD and PD. We additionally compared the results to a baseline method using fuzzy string matching to determine the degree to which the semantic understanding of LLMs can improve our harmonization results. We found that mappings based on text embeddings …

学术搜索中的文章

On the Utility of Large Language Model Embeddings for Revolutionizing Semantic Data Harmonization in Alzheimer's and Parkinson’s Disease

Y Salimi, T Adams, MC Ay, H Balabin, M Jacobs… - 2024