The performance of ChatGPT versus neurosurgery residents in neurosurgical board examination-like questions: a systematic review and meta-analysis.

Bibliographic Details
Title: The performance of ChatGPT versus neurosurgery residents in neurosurgical board examination-like questions: a systematic review and meta-analysis.
Authors: Bongco, Edgar Dominic A., Cua, Sean Kendrich N., Hernandez, Mary Angeline Luz U., Pascual, Juan Silvestre G., Khu, Kathleen Joy O.
Source: Neurosurgical Review; December 7, 2024, Vol. 47, Issue 1, pp. 1-8 (8 pages)
ISSN: 0344-5607 (print)
DOI: 10.1007/s10143-024-03144-y
Language: English
Publication Type: Academic Journal
Database: Complementary Index
Subject Terms: LANGUAGE models, CHATGPT, MEDICAL education, NEUROSURGERY, CONFIDENCE intervals
Abstract: Objective: Large language models such as ChatGPT have been used in various fields of medical education. This study aimed to review the literature on the performance of ChatGPT on neurosurgery board examination-like questions compared with that of neurosurgery residents. Methods: A literature search was performed following PRISMA guidelines, covering the period from ChatGPT's release (November 2022) until October 25, 2024. Two reviewers screened for eligible studies, selecting those that used ChatGPT to answer neurosurgery board examination-like questions and compared the results with neurosurgery residents' scores. Risk of bias was assessed using the JBI critical appraisal tool. Overall effect sizes and 95% confidence intervals were determined using a fixed-effects model with alpha set at 0.05. Results: After screening, six studies were selected for qualitative and quantitative analysis. The accuracy of ChatGPT ranged from 50.4% to 78.8%, compared with residents' accuracy of 58.3% to 73.7%. Risk of bias was low in four of the six studies reviewed; the rest had moderate risk. There was an overall trend favoring neurosurgery residents over ChatGPT (p < 0.00001), with high heterogeneity (I² = 96%). These findings were similar on subgroup analysis of studies that used Self-Assessment in Neurosurgery (SANS) examination questions. However, on sensitivity analysis, removal of the highest-weighted study skewed the results toward better performance by ChatGPT. Conclusion: Our meta-analysis showed that neurosurgery residents performed better than ChatGPT in answering neurosurgery board examination-like questions, although the reviewed studies had high heterogeneity. Further improvement is necessary before ChatGPT can become a useful and reliable supplementary tool in the delivery of neurosurgical education. [ABSTRACT FROM AUTHOR]
Copyright of Neurosurgical Review is the property of Springer Nature and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
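Note on the statistics reported in the abstract: the review pools study results with an inverse-variance fixed-effects model, reports 95% confidence intervals at alpha = 0.05, and quantifies heterogeneity with Cochran's Q and I². The Python sketch below illustrates those standard calculations on placeholder data; the per-study effect sizes and standard errors are hypothetical, not values extracted from the six included studies, and this is a generic illustration of the method, not the authors' actual analysis.

```python
import math

# Hypothetical per-study effect estimates (e.g., log odds ratios) and
# standard errors; the real six study values are not given in this record.
effects = [0.42, 0.31, -0.15, 0.55, 0.28, 0.60]
ses     = [0.10, 0.12, 0.09, 0.20, 0.15, 0.11]

# Inverse-variance weights for the fixed-effects model.
weights = [1.0 / se**2 for se in ses]
w_sum = sum(weights)

# Pooled effect estimate and its standard error.
pooled = sum(w * e for w, e in zip(weights, effects)) / w_sum
pooled_se = math.sqrt(1.0 / w_sum)

# 95% confidence interval (alpha = 0.05, z = 1.96).
ci_low, ci_high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se

# Cochran's Q and the I² heterogeneity statistic reported in the abstract.
q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
df = len(effects) - 1
i_squared = max(0.0, (q - df) / q) * 100.0

print(f"Pooled effect: {pooled:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
print(f"Cochran's Q = {q:.2f}, I² = {i_squared:.1f}%")
```

By the conventional Higgins-Thompson thresholds (roughly 25%/50%/75% for low/moderate/high), the I² = 96% reported in the abstract indicates high heterogeneity across the six studies, which is why the authors flag the sensitivity analysis.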