Performance of ChatGPT, GPT-4, and Google Bard on a neurosurgery oral boards preparation question bank

R Ali, OY Tang, ID Connolly, JS Fridley, JH Shin… - …, 2022 - journals.lww.com
METHODS: The 149-question Self-Assessment Neurosurgery Examination Indications
Examination was used to query LLM accuracy. Questions were inputted in a single best
answer, multiple-choice format. χ², Fisher exact, and univariable logistic regression tests
assessed differences in performance by question characteristics. RESULTS: On a question
bank with predominantly higher-order questions (85.2%), ChatGPT (GPT-3.5) and GPT-4
answered 62.4% (95% CI: 54.1%-70.1%) and 82.6% (95% CI: 75.2%-88.1%) of questions …