Does BERT pretrained on clinical notes reveal sensitive data?

E Lehman, S Jain, K Pichotta, Y Goldberg… - arXiv preprint arXiv …, 2021 - arxiv.org
Large Transformers pretrained over clinical notes from Electronic Health Records (EHR)
have afforded substantial gains in performance on predictive clinical tasks. The cost of …

A review of automatic end-to-end de-identification: Is high accuracy the only metric?

V Yogarajan, B Pfahringer, M Mayo - Applied Artificial Intelligence, 2020 - Taylor & Francis
De-identification of electronic health records (EHR) is a vital step toward advancing health
informatics research and maximizing the use of available data. It is a two-step process …

Cohort selection for clinical trials: n2c2 2018 shared task track 1

A Stubbs, M Filannino, E Soysal… - Journal of the …, 2019 - academic.oup.com
Objective: Track 1 of the 2018 National NLP Clinical Challenges shared tasks
focused on identifying which patients in a corpus of longitudinal medical records meet and …

Generating synthetic training data for supervised de-identification of electronic health records

CA Libbi, J Trienes, D Trieschnigg, C Seifert - Future Internet, 2021 - mdpi.com
A major hurdle in the development of natural language processing (NLP) methods for
Electronic Health Records (EHRs) is the lack of large, annotated datasets. Privacy concerns …

Comparing rule-based, feature-based and deep neural methods for de-identification of dutch medical records

J Trienes, D Trieschnigg, C Seifert… - arXiv preprint arXiv …, 2020 - arxiv.org
Unstructured information in electronic health records provides an invaluable resource for
medical research. To protect the confidentiality of patients and to conform to privacy …

CodE Alltag 2.0—A pseudonymized German-language email corpus

E Eder, U Krieg-Holz, U Hahn - Proceedings of the Twelfth …, 2020 - aclanthology.org
The vast amount of social communication distributed over various electronic media channels
(tweets, blogs, emails, etc.), so-called user-generated content (UGC), creates entirely new …

De-identification of emails: Pseudonymizing privacy-sensitive data in a german email corpus

E Eder, U Krieg-Holz, U Hahn - Proceedings of the International …, 2019 - aclanthology.org
We deal with the pseudonymization of those stretches of text in emails that might allow one to
identify real individual persons. This task is decomposed into two steps. First, named entities …

A survey of automatic de-identification of longitudinal clinical narratives

V Yogarajan, M Mayo, B Pfahringer - arXiv preprint arXiv:1810.06765, 2018 - arxiv.org
Use of medical data, also known as electronic health records, in research helps develop and
advance medical science. However, protecting patient confidentiality and identity while …

Generation of surrogates for de-identification of electronic health records

A Chen, J Jonnagaddala, C Nekkantti… - MEDINFO 2019: Health …, 2019 - ebooks.iospress.nl
Unstructured electronic health records are valuable resources for research. Before they are
shared with researchers, protected health information needs to be removed from these …

Differentially private de-identifying textual medical document is compliant with challenging NLP analyses: Example of privacy-preserving ICD-10 code …

Y Tchouka, JF Couchot, D Laiymani, P Selles… - Intelligent Systems with …, 2024 - Elsevier
Medical research plays a crucial role within scientific research. Technological
advancements, especially those related to the rise of machine learning, pave the way for the …