Evaluation of ChatGPT-generated medical responses: a systematic review and meta-analysis

Q Wei, Z Yao, Y Cui, B Wei, Z Jin, X Xu - Journal of Biomedical Informatics, 2024 - Elsevier
Objective: Large language models (LLMs) such as ChatGPT are increasingly
explored in medical domains. However, the absence of standard guidelines for performance …

Large language models in healthcare: from a systematic review on medical examinations to a comparative analysis on fundamentals of robotic surgery online test

A Moglia, K Georgiou, P Cerveri, L Mainardi… - Artificial Intelligence …, 2024 - Springer
Large language models (LLMs) have the intrinsic potential to acquire medical knowledge.
Several studies assessing LLMs on medical examinations have been published. However …

Artificial intelligence in ophthalmology: a comparative analysis of GPT-3.5, GPT-4, and human expertise in answering StatPearls questions

M Moshirfar, AW Altaf, IM Stoakes, JJ Tuttle… - Cureus, 2023 - ncbi.nlm.nih.gov
Purpose: This study evaluates the performance of two ChatGPT models (GPT-3.5 and GPT-4)
and human professionals in answering ophthalmology questions from the StatPearls …

Performance of ChatGPT on nephrology test questions

J Miao, C Thongprayoon, OAG Valencia… - Clinical Journal of the …, 2023 - journals.lww.com
Background: ChatGPT is a novel tool that allows people to engage in conversations with an
advanced machine learning model. ChatGPT's performance in the United States Medical …

Benchmarking open-source large language models, GPT-4 and Claude 2 on multiple-choice questions in nephrology

S Wu, M Koo, L Blum, A Black, L Kao, Z Fei, F Scalzo… - NEJM AI, 2024 - ai.nejm.org
Background: In recent years, significant breakthroughs have been made in the field of natural
language processing, particularly with the development of large language models (LLMs) …

Evaluating the efficacy of ChatGPT in navigating the Spanish medical residency entrance examination (MIR): Promising horizons for AI in clinical medicine

F Guillen-Grima, S Guillen-Aguinaga… - Clinics and …, 2023 - mdpi.com
The rapid progress in artificial intelligence, machine learning, and natural language
processing has led to increasingly sophisticated large language models (LLMs) for use in …

Evaluation of the reliability and readability of ChatGPT-4 responses regarding hypothyroidism during pregnancy

CE Onder, G Koc, P Gokbulut, I Taskaldiran… - Scientific reports, 2024 - nature.com
Hypothyroidism is characterized by thyroid hormone deficiency and has adverse effects on
both pregnancy and fetal health. Chat Generative Pre-trained Transformer (ChatGPT) is a …

Accuracy of ChatGPT in common gastrointestinal diseases: impact for patients and providers

A Kerbage, J Kassab, J El Dahdah… - Clinical …, 2024 - cghjournal.org
Since its release in 2022, Chat Generative Pre-Trained Transformer (ChatGPT) became the
most rapidly expanding consumer software application in history, and its role in medicine …

Performance of ChatGPT-3.5 and GPT-4 in national licensing examinations for medicine, pharmacy, dentistry, and nursing: a systematic review and meta-analysis

HK Jin, HE Lee, EY Kim - BMC Medical Education, 2024 - Springer
Background: ChatGPT, a recently developed artificial intelligence (AI) chatbot, has
demonstrated improved performance in examinations in the medical field. However, thus far …

A comparative study of open-source large language models, GPT-4 and Claude 2: Multiple-choice test taking in nephrology

S Wu, M Koo, L Blum, A Black, L Kao, F Scalzo… - arXiv preprint arXiv …, 2023 - arxiv.org
In recent years, there have been significant breakthroughs in the field of natural language
processing, particularly with the development of large language models (LLMs). These …