Pre-trained language models in biomedical domain: A systematic survey

B Wang, Q Xie, J Pei, Z Chen, P Tiwari, Z Li… - ACM Computing …, 2023 - dl.acm.org
Pre-trained language models (PLMs) have been the de facto paradigm for most natural
language processing tasks. This also benefits the biomedical domain: researchers from …

Clinical text datasets for medical artificial intelligence and large language models—a systematic review

J Wu, X Liu, M Li, W Li, Z Su, S Lin, L Garay, Z Zhang… - NEJM AI, 2024 - ai.nejm.org
Privacy and ethical considerations limit access to large-scale clinical datasets, particularly
clinical text data, which contain extensive and diverse information and serve as the …

HuatuoGPT, towards taming language model to be a doctor

H Zhang, J Chen, F Jiang, F Yu, Z Chen, J Li… - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we present HuatuoGPT, a large language model (LLM) for medical
consultation. The core recipe of HuatuoGPT is to leverage both distilled data from …

A survey of large language models for healthcare: from data, technology, and applications to accountability and ethics

K He, R Mao, Q Lin, Y Ruan, X Lan, M Feng… - arXiv preprint arXiv …, 2023 - arxiv.org
The utilization of large language models (LLMs) in the Healthcare domain has generated
both excitement and concern due to their ability to effectively respond to free-text queries with …

BioBART: Pretraining and evaluation of a biomedical generative language model

H Yuan, Z Yuan, R Gan, J Zhang, Y Xie… - arXiv preprint arXiv …, 2022 - arxiv.org
Pretrained language models have served as important backbones for natural language
processing. Recently, in-domain pretraining has been shown to benefit various domain …

CMB: A comprehensive medical benchmark in Chinese

X Wang, GH Chen, D Song, Z Zhang, Z Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) provide a possibility to make a great breakthrough in
medicine. The establishment of a standardized medical benchmark becomes a fundamental …

BigBIO: A framework for data-centric biomedical natural language processing

J Fries, L Weber, N Seelam, G Altay… - Advances in …, 2022 - proceedings.neurips.cc
Training and evaluating language models increasingly requires the construction of
meta-datasets: diverse collections of curated data with clear provenance. Natural language …

Huatuo-26M, a large-scale Chinese medical QA dataset

J Li, X Wang, X Wu, Z Zhang, X Xu, J Fu… - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we release the largest-ever medical Question Answering (QA) dataset, with 26
million QA pairs. We benchmark many existing approaches in our dataset in terms of both …

DISC-MedLLM: Bridging general large language models and real-world medical consultation

Z Bao, W Chen, S Xiao, K Ren, J Wu, C Zhong… - arXiv preprint arXiv …, 2023 - arxiv.org
We propose DISC-MedLLM, a comprehensive solution that leverages Large Language
Models (LLMs) to provide accurate and truthful medical response in end-to-end …

Taiyi: a bilingual fine-tuned large language model for diverse biomedical tasks

L Luo, J Ning, Y Zhao, Z Wang, Z Ding… - Journal of the …, 2024 - academic.oup.com
Objective: Most existing fine-tuned biomedical large language models (LLMs) focus on
enhancing performance in monolingual biomedical question answering and conversation …