E-ISSN: 1019-5157 ISSN: 2651-5024
Research

Can Artificial Intelligence Evaluate Clinical Practice Guidelines Like Human Experts? A Comparative Study Using the AGREE II Instrument

Mehmet Melih Karaaslan, Pelin Kuzucu, Burak Karaaslan, Tolga Turkmen, Seyma Tastemur, Nadira Zahirovic, Alp Ozgun Borcek, Mesut Emre Yaman
Article in Press

Abstract

Aim
Clinical practice guidelines (CPGs) are widely used in neurosurgery to support evidence-based clinical decisions and to promote consistency in patient management. However, their methodological quality and internal consistency vary substantially across publications. In the present study, we evaluated CPGs addressing brain metastases and, for the first time, compared guideline assessments performed by neurosurgical experts with those generated by artificial intelligence (AI) models using the AGREE II instrument.

Material and Methods
A systematic literature search identified five CPGs addressing the use of stereotactic radiosurgery for brain metastases. Each guideline was independently assessed by four neurosurgical experts as well as by two artificial intelligence models (ChatGPT-4.0 and DeepSeek R1) using the AGREE II framework. Domain scores were expressed as percentages, and interrater reliability was examined with the intraclass correlation coefficient (ICC).
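As an illustration of how a percentage domain score is obtained, the sketch below applies the standard AGREE II scaling: each item is rated from 1 to 7, and the domain score is (obtained score − minimum possible score) / (maximum possible score − minimum possible score) × 100, pooled across appraisers. The function name and the example ratings are hypothetical and are not taken from the study data.

```python
def agree_ii_domain_score(ratings):
    """Scaled AGREE II domain score, as a percentage.

    ratings: one list per appraiser, each holding that appraiser's
             item scores (integers 1-7) for a single domain.
    """
    n_appraisers = len(ratings)
    n_items = len(ratings[0])
    for appraiser_scores in ratings:
        if len(appraiser_scores) != n_items:
            raise ValueError("every appraiser must rate the same items")
        if any(not 1 <= s <= 7 for s in appraiser_scores):
            raise ValueError("AGREE II items are rated from 1 to 7")

    obtained = sum(sum(appraiser_scores) for appraiser_scores in ratings)
    min_possible = 1 * n_items * n_appraisers  # everyone scores 1 on every item
    max_possible = 7 * n_items * n_appraisers  # everyone scores 7 on every item
    return 100.0 * (obtained - min_possible) / (max_possible - min_possible)


# Hypothetical ratings: four appraisers, a three-item domain.
example_ratings = [
    [5, 6, 6],
    [6, 6, 7],
    [2, 4, 3],
    [3, 3, 2],
]
example_score = agree_ii_domain_score(example_ratings)
print(f"Domain score: {example_score:.1f}%")
```

With these illustrative ratings the obtained total is 53, the minimum is 12, and the maximum is 84, so the scaled score is 41/72, or roughly 57%.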

Results
The scoring patterns obtained from human reviewers and AI models were largely comparable. The highest ratings were recorded in the domains of “Scope and Purpose” and “Clarity of Presentation,” while “Applicability” consistently received the lowest scores. Statistical analysis revealed no significant differences between the assessments of the human experts and the AI models (p > 0.05). Interrater agreement ranged from moderate to excellent (ICC 0.491–0.908). In addition, AI models were less inclined to assign extreme scores, indicating a more conservative evaluation tendency.
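For readers unfamiliar with the ICC, the following minimal sketch computes one common form, ICC(2,1) (two-way random effects, absolute agreement, single rater), from a table of scores. The abstract does not state which ICC form was used, so this choice is an assumption, and the function name and example table are hypothetical.

```python
def icc_2_1(table):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    table: one row per rated subject (e.g. a guideline domain),
           one column per rater; no missing values.
    """
    n = len(table)       # number of subjects
    k = len(table[0])    # number of raters
    grand_mean = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical scores: five subjects rated by three raters.
example_table = [
    [6, 5, 6],
    [3, 4, 3],
    [7, 6, 6],
    [2, 2, 3],
    [5, 5, 4],
]
print(f"ICC(2,1) = {icc_2_1(example_table):.3f}")
```

Other ICC forms (for example, consistency rather than absolute agreement, or average-rater rather than single-rater) use the same mean squares in different combinations and can give noticeably different values on the same data.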

Conclusion
AI-based evaluations demonstrated a level of performance comparable to that of human experts. These results indicate that AI could function as a supportive tool in the appraisal of clinical guidelines and may also have broader applications in clinical decision support and related medical tasks. Incorporating AI into guideline development processes may help improve efficiency, promote greater consistency, and enhance transparency within neurosurgical practice.

Keywords

Artificial Intelligence; Clinical Practice Guidelines as Topic; Brain Neoplasms/secondary; ChatGPT; AGREE II