Evaluating GPT-4 and ChatGPT on Japanese medical licensing examinations

[HTML] Decoding ChatGPT: a taxonomy of existing research, current challenges, and possible future directions

SS Sohail, F Farhat, Y Himeur, M Nadeem… - Journal of King Saud …, 2023 - Elsevier

Chat Generative Pre-trained Transformer (ChatGPT) has gained significant interest
and attention since its launch in November 2022. It has shown impressive performance in …

Cited by 136 Related articles All 6 versions

[HTML] sciencedirect.com

[HTML] A survey of GPT-3 family large language models including ChatGPT and GPT-4

KS Kalyan - Natural Language Processing Journal, 2023 - Elsevier

Large language models (LLMs) are a special class of pretrained language models (PLMs)
obtained by scaling model size, pretraining corpus and computation. LLMs, because of their …

Cited by 118 Related articles All 5 versions

[PDF] arxiv.org

ChatGPT beyond English: Towards a comprehensive evaluation of large language models in multilingual learning

VD Lai, NT Ngo, APB Veyseh, H Man… - arXiv preprint arXiv …, 2023 - arxiv.org

Over the last few years, large language models (LLMs) have emerged as the most important
breakthroughs in natural language processing (NLP) that fundamentally transform research …

Cited by 183 Related articles All 8 versions

[PDF] neurips.cc

M3Exam: A multilingual, multimodal, multilevel benchmark for examining large language models

W Zhang, M Aljunied, C Gao… - Advances in Neural …, 2023 - proceedings.neurips.cc

Despite the existence of various benchmarks for evaluating natural language processing
models, we argue that human exams are a more suitable means of evaluating general …

Cited by 67 Related articles All 5 versions

[PDF] arxiv.org

LLMs for knowledge graph construction and reasoning: Recent capabilities and future opportunities

Y Zhu, X Wang, J Chen, S Qiao, Y Ou, Y Yao, S Deng… - World Wide Web, 2024 - Springer

This paper presents an exhaustive quantitative and qualitative evaluation of Large
Language Models (LLMs) for Knowledge Graph (KG) construction and reasoning. We …

Cited by 88 Related articles All 4 versions

[PDF] arxiv.org

Prompt engineering for healthcare: Methodologies and applications

J Wang, E Shi, S Yu, Z Wu, C Ma, H Dai, Q Yang… - arXiv preprint arXiv …, 2023 - arxiv.org

Prompt engineering is a critical technique in the field of natural language processing that
involves designing and optimizing the prompts used to input information into models, aiming …

Cited by 100 Related articles All 3 versions

[PDF] nature.com

Evaluation of the performance of GPT-3.5 and GPT-4 on the Polish Medical Final Examination

M Rosoł, JS Gąsior, J Łaba, K Korzeniewski… - Scientific Reports, 2023 - nature.com

The study aimed to evaluate the performance of two Large Language Models (LLMs):
ChatGPT (based on GPT-3.5) and GPT-4 with two temperature parameter values, on the …

Cited by 66 Related articles All 8 versions

[PDF] nature.com

Towards building multilingual language model for medicine

P Qiu, C Wu, X Zhang, W Lin, H Wang, Y Zhang… - Nature …, 2024 - nature.com

The development of open-source, multilingual medical language models can benefit a wide,
linguistically diverse audience from different regions. To promote this domain, we present …

Cited by 14 Related articles All 2 versions

[PDF] tandfonline.com

ChatGPT performance on multiple choice question examinations in higher education. A pragmatic scoping review

P Newton, M Xiromeriti - Assessment & Evaluation in Higher …, 2024 - Taylor & Francis

Media coverage suggests that ChatGPT can pass examinations based on multiple choice
questions (MCQs), including those used to qualify doctors, lawyers, scientists etc. This poses …

Cited by 22 Related articles All 3 versions

[PDF] osf.io

[PDF] ChatGPT performance on MCQ exams in higher education. A pragmatic scoping review

PM Newton, M Xiromeriti - EdArXiv. February, 2023 - files.osf.io

Background. There has been significant interest in the performance of ChatGPT on
professional exams used to assess doctors, scientists and others. These exams are …

Cited by 41 Related articles All 4 versions