Reliability of a generative artificial intelligence tool for pediatric familial Mediterranean fever: insights from a multicentre expert survey

Bibliographic Details
Title: Reliability of a generative artificial intelligence tool for pediatric familial Mediterranean fever: insights from a multicentre expert survey
Authors: Saverio La Bella, Marina Attanasi, Annamaria Porreca, Armando Di Ludovico, Maria Cristina Maggio, Romina Gallizzi, Francesco La Torre, Donato Rigante, Francesca Soscia, Francesca Ardenti Morini, Antonella Insalaco, Marco Francesco Natale, Francesco Chiarelli, Gabriele Simonini, Fabrizio De Benedetti, Marco Gattorno, Luciana Breda
Source: Pediatric Rheumatology Online Journal, Vol 22, Iss 1, Pp 1-11 (2024)
Publisher Information: BMC, 2024.
Publication Year: 2024
Collection: LCC:Pediatrics
LCC:Diseases of the musculoskeletal system
Subject Terms: Artificial intelligence, AI, Pediatric rheumatology, Familial Mediterranean fever, Generative artificial intelligence, FMF, Pediatrics, RJ1-570, Diseases of the musculoskeletal system, RC925-935
More Details: Abstract. Background: Artificial intelligence (AI) has become a popular tool for clinical and research use in the medical field. The aim of this study was to evaluate the accuracy and reliability of a generative AI tool on pediatric familial Mediterranean fever (FMF). Methods: Fifteen questions repeated thrice on pediatric FMF were prompted to the popular generative AI tool Microsoft Copilot with Chat-GPT 4.0. Nine pediatric rheumatology experts rated response accuracy with a blinded mechanism using a Likert-like scale with values from 1 to 5. Results: Median values for overall responses at the initial assessment ranged from 2.00 to 5.00. During the second assessment, median values spanned from 2.00 to 4.00, while for the third assessment, they ranged from 3.00 to 4.00. Intra-rater variability showed poor to moderate agreement (intraclass correlation coefficient range: -0.151 to 0.534). A diminishing level of agreement among experts over time was documented, as highlighted by Krippendorff’s alpha coefficient values, ranging from 0.136 (at the first response) to 0.132 (at the second response) to 0.089 (at the third response). Lastly, experts displayed varying levels of trust in AI pre- and post-survey. Conclusions: AI has promising implications in pediatric rheumatology, including early diagnosis and management optimization, but challenges persist due to uncertain information reliability and the lack of expert validation. Our survey revealed considerable inaccuracies and incompleteness in AI-generated responses regarding FMF, with poor intra- and extra-rater reliability. Human validation remains crucial in managing AI-generated medical information.
Document Type: article
File Description: electronic resource
Language: English
ISSN: 1546-0096
Relation: https://doaj.org/toc/1546-0096
DOI: 10.1186/s12969-024-01011-0
Access URL: https://doaj.org/article/dd5991ca2dfc471386a38e30914e8cd4
Accession Number: edsdoj.5991ca2dfc471386a38e30914e8cd4
Database: Directory of Open Access Journals