Authors
Yanshan Wang, Sijia Liu, Naveed Afzal, Majid Rastegar-Mojarad, Liwei Wang, Feichen Shen, Paul Kingsbury, Hongfang Liu
Publication date
2018/11/1
Journal
Journal of Biomedical Informatics
Volume
87
Pages
12-20
Publisher
Academic Press
Description
Background
Word embeddings have been widely used in biomedical Natural Language Processing (NLP) applications because their vector representations can capture useful semantic properties and linguistic relationships between words. Different textual resources (e.g., Wikipedia and biomedical literature corpora) have been used in biomedical NLP to train word embeddings, and these embeddings are commonly used as feature input to downstream machine learning models. However, there has been little work evaluating word embeddings trained on different textual resources.
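The idea that embeddings "capture semantic relationships" is usually made concrete through vector geometry: related words get vectors pointing in similar directions, compared with cosine similarity. The minimal sketch below uses tiny hand-made 3-dimensional vectors (hypothetical values, not trained embeddings; real ones typically have hundreds of dimensions) to show the computation and a nearest-neighbor lookup.

```python
import math

# Hypothetical toy embeddings for illustration only; real vectors are learned from a corpus.
EMBEDDINGS = {
    "diabetes": [0.9, 0.1, 0.3],
    "insulin": [0.8, 0.2, 0.4],
    "hypertension": [0.7, 0.3, 0.1],
    "football": [0.1, 0.9, 0.2],
}


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(word, k=2):
    """Return the k other words whose vectors are most similar to `word`."""
    query = EMBEDDINGS[word]
    scored = [
        (other, cosine(query, vec))
        for other, vec in EMBEDDINGS.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    for neighbor, score in nearest_neighbors("diabetes"):
        print(f"{neighbor}: {score:.3f}")
```

With these toy vectors, the clinical terms "insulin" and "hypertension" rank as the nearest neighbors of "diabetes", while "football" scores far lower.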
Methods
In this study, we empirically evaluated word embeddings trained on four different corpora, namely clinical notes, biomedical publications, Wikipedia, and news. For the former two resources, we trained word embeddings using unstructured electronic health record (EHR) data …
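One common way to compare embeddings trained on different corpora is intrinsic evaluation: score term pairs by cosine similarity and measure the Spearman rank correlation with human similarity ratings. The sketch below assumes a hypothetical benchmark with made-up ratings and two hypothetical embedding sets (`CLINICAL`, `NEWS`); it illustrates the general technique and is not necessarily the protocol or data used in this study.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def ranks(values):
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def evaluate(embeddings, benchmark):
    """Correlate model similarities with human scores, skipping out-of-vocabulary pairs."""
    model_scores, human_scores = [], []
    for w1, w2, human in benchmark:
        if w1 in embeddings and w2 in embeddings:
            model_scores.append(cosine(embeddings[w1], embeddings[w2]))
            human_scores.append(human)
    return spearman(model_scores, human_scores)


# Hypothetical benchmark: (term, term, human similarity rating).
BENCHMARK = [
    ("diabetes", "insulin", 3.8),
    ("diabetes", "hypertension", 2.9),
    ("insulin", "football", 0.2),
    ("hypertension", "football", 0.3),
]

# Hypothetical embedding sets standing in for vectors trained on two different corpora.
CLINICAL = {
    "diabetes": [0.9, 0.1, 0.3],
    "insulin": [0.8, 0.2, 0.4],
    "hypertension": [0.7, 0.3, 0.1],
    "football": [0.1, 0.9, 0.2],
}
NEWS = {
    "diabetes": [0.5, 0.5, 0.1],
    "insulin": [0.2, 0.1, 0.9],
    "hypertension": [0.5, 0.4, 0.2],
    "football": [0.4, 0.6, 0.1],
}

if __name__ == "__main__":
    for name, emb in [("clinical", CLINICAL), ("news", NEWS)]:
        print(f"{name}: Spearman rho = {evaluate(emb, BENCHMARK):.2f}")
```

Here the toy "clinical" vectors reproduce the human ordering exactly (rho = 1.0), while the toy "news" vectors reach only rho = 0.4. Ranking by Spearman rather than Pearson makes the comparison insensitive to the differing scales of cosine scores and human ratings; with fewer than two in-vocabulary pairs the correlation is undefined, which a real evaluation would need to handle.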