The performance of ChatGPT versus neurosurgery residents in neurosurgical board examination-like questions: a systematic review and meta-analysis.

Bibliographic Details
Title: The performance of ChatGPT versus neurosurgery residents in neurosurgical board examination-like questions: a systematic review and meta-analysis.
Authors: Bongco, Edgar Dominic A., Cua, Sean Kendrich N., Hernandez, Mary Angeline Luz U., Pascual, Juan Silvestre G., Khu, Kathleen Joy O.
Source: Neurosurgical Review; December 7, 2024, Vol. 47, Issue 1, pp. 1-8 (8 pages)
ISSN: 0344-5607 (print)
DOI: 10.1007/s10143-024-03144-y
Language: English
Publication Type: Academic Journal
Database: Complementary Index
Subject Terms: LANGUAGE models, CHATGPT, MEDICAL education, NEUROSURGERY, CONFIDENCE intervals
Abstract: Objective: Large language models such as ChatGPT have been used in various fields of medical education. This study aimed to review the literature on the performance of ChatGPT on neurosurgery board examination-like questions compared with that of neurosurgery residents. Methods: A literature search was performed following PRISMA guidelines, covering the period from ChatGPT's release (November 2022) until October 25, 2024. Two reviewers screened for eligible studies, selecting those that used ChatGPT to answer neurosurgery board examination-like questions and compared the results with neurosurgery residents' scores. Risk of bias was assessed using the JBI critical appraisal tool. Overall effect sizes and 95% confidence intervals were determined using a fixed-effects model with alpha set at 0.05. Results: After screening, six studies were selected for qualitative and quantitative analysis. The accuracy of ChatGPT ranged from 50.4% to 78.8%, compared with residents' accuracy of 58.3% to 73.7%. Risk of bias was low in four of the six studies reviewed; the rest had moderate risk. There was an overall trend favoring neurosurgery residents over ChatGPT (p < 0.00001), with high heterogeneity (I² = 96%). These findings were similar on subgroup analysis of studies that used Self-Assessment in Neurosurgery (SANS) examination questions. However, on sensitivity analysis, removal of the highest-weighted study skewed the results toward better performance by ChatGPT. Conclusion: Our meta-analysis showed that neurosurgery residents performed better than ChatGPT in answering neurosurgery board examination-like questions, although the reviewed studies had high heterogeneity. Further improvement is necessary before ChatGPT can become a useful and reliable supplementary tool in the delivery of neurosurgical education. [ABSTRACT FROM AUTHOR]
Copyright of Neurosurgical Review is the property of Springer Nature and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
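Note on the statistics reported in the abstract: the review pools study results with an inverse-variance fixed-effects model, reports 95% confidence intervals at alpha = 0.05, and quantifies heterogeneity with Cochran's Q and I². The Python sketch below illustrates those standard calculations on placeholder data; the per-study effect sizes and standard errors are hypothetical, not values extracted from the six included studies, and this is a generic illustration of the method, not the authors' actual analysis.

```python
import math

# Hypothetical per-study effect estimates (e.g., log odds ratios) and
# standard errors; the real six study values are not given in this record.
effects = [0.42, 0.31, -0.15, 0.55, 0.28, 0.60]
ses     = [0.10, 0.12, 0.09, 0.20, 0.15, 0.11]

# Inverse-variance weights for the fixed-effects model.
weights = [1.0 / se**2 for se in ses]
w_sum = sum(weights)

# Pooled effect estimate and its standard error.
pooled = sum(w * e for w, e in zip(weights, effects)) / w_sum
pooled_se = math.sqrt(1.0 / w_sum)

# 95% confidence interval (alpha = 0.05, z = 1.96).
ci_low, ci_high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se

# Cochran's Q and the I² heterogeneity statistic reported in the abstract.
q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
df = len(effects) - 1
i_squared = max(0.0, (q - df) / q) * 100.0

print(f"Pooled effect: {pooled:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
print(f"Cochran's Q = {q:.2f}, I² = {i_squared:.1f}%")
```

By the conventional Higgins-Thompson thresholds (roughly 25%/50%/75% for low/moderate/high), the I² = 96% reported in the abstract indicates high heterogeneity across the six studies, which is why the authors flag the sensitivity analysis.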