Assessing the accuracy and reproducibility of artificial intelligence-generated medical responses by ChatGPT on Scheuermann's kyphosis


Giray E., Illeez O. G., Korkmaz M. D., Capan N., Saygi E. K., Aydin A. R.

TURKISH JOURNAL OF PHYSICAL MEDICINE AND REHABILITATION, vol. 71, no. 4, pp. 457-464, 2024 (SCI-Expanded, Scopus)

Abstract

Objectives: This study aimed to evaluate the performance and reproducibility of artificial intelligence in answering frequently asked questions about Scheuermann's kyphosis and to compare the artificial intelligence responses with the SOSORT (International Scientific Society on Scoliosis Orthopaedic and Rehabilitation Treatment) consensus in answering case-based questions. Materials and methods: In this cross-sectional study, 75 questions adapted from frequently asked questions about Scheuermann's kyphosis were submitted to ChatGPT twice. The similarity of the paired responses was assessed to investigate reproducibility, and the accuracy of the responses was scored on a rating scale. Four case studies from the end of the 7th SOSORT consensus paper on the conservative treatment of idiopathic and Scheuermann's kyphosis were presented to ChatGPT. Results: ChatGPT provided correct and comprehensive answers to 43 (57.33%) questions, correct but not comprehensive answers to 29 (38.67%) questions, and partially incorrect answers to 3 (4%) questions. ChatGPT performed best in the quality-of-life category, with 18/19 (94.73%) answers scored as correct (score of 1). It performed worst in the diagnosis category, with 3/8 (37.5%) correct and comprehensive answers, and in the treatment and follow-up category, with 9/24 (37.5%) correct and comprehensive answers. ChatGPT provided reproducible answers to 92% of the questions. Its responses regarding the treatment of all four case studies were incorrect. Conclusion: While ChatGPT can provide valuable general information regarding Scheuermann's kyphosis, its ability to offer accurate treatment-related advice is limited.