ChatGPT Achieves Only Fair Agreement with ACFAS Expert Panelist Clinical Consensus Statements.

Authors: Joshua Calhoun, Dominick J Casciato

Language: eng

Classification number: 782.292 *Chant

Publication information: United States: Foot & Ankle Specialist, 2025

Collection: NCBI

ID: 548933

INTRODUCTION: As artificial intelligence (AI) becomes increasingly integrated into medicine and surgery, its applications are expanding rapidly, from aiding clinical documentation to providing patient information. However, its role in medical decision-making remains uncertain. This study evaluates how closely an AI language model's ratings align with clinical consensus statements in foot and ankle surgery.

METHODS: Clinical consensus statements from the American College of Foot and Ankle Surgeons (ACFAS, 2015-2022) were collected and rated by ChatGPT-o1 as inappropriate, neither appropriate nor inappropriate, or appropriate. The statements were entered into ChatGPT-o1 in random order over ten repetitions, and the model was prompted to assign a rating to each. The AI-generated ratings were compared with the expert panel's ratings, and an intra-rater reliability analysis was performed.

RESULTS: Analysis of 9 clinical consensus documents and 129 statements yielded an overall Cohen's kappa of 0.29 (95% CI: 0.12, 0.46), indicating fair agreement between the expert panelists and ChatGPT. Ankle arthritis and heel pain showed the highest concordance (100%), whereas flatfoot showed the lowest (25%), reflecting topic-dependent variability in agreement between ChatGPT and the expert panelists. Across ChatGPT's repeated ratings, Cohen's kappa values ranged from 0.41 to 0.92, indicating that its internal reliability varied by topic.

CONCLUSION: ChatGPT achieved fair overall agreement and showed variable consistency when repeatedly rating ACFAS expert panel clinical practice guidelines spanning a variety of topics. These data indicate the need for further study of the causes, impacts, and solutions for this disparity between artificial intelligence and human intelligence.

LEVEL OF EVIDENCE: Level IV: Retrospective cohort study.
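The abstract reports agreement as Cohen's kappa, defined as kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected by chance from each rater's marginal rating frequencies. The abstract does not state whether an unweighted or weighted kappa was used, or how the ten repetitions were combined, so the following is only a minimal sketch assuming unweighted kappa for two raters over the three rating categories. The short labels (for example "neither" for "neither appropriate nor inappropriate"), the function names, and the example ratings are hypothetical and are not data from the study.

```python
from collections import Counter
from typing import Sequence

# Hypothetical short labels for the study's three rating categories
# ("neither" stands for "neither appropriate nor inappropriate").
CATEGORIES = ("inappropriate", "neither", "appropriate")


def percent_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Observed agreement p_o: share of items both raters rated identically."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("rating lists must be non-empty and equal in length")
    unknown = (set(a) | set(b)) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unexpected ratings: {sorted(unknown)}")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa (an assumption), (p_o - p_e) / (1 - p_e)."""
    p_o = percent_agreement(a, b)
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    # Expected chance agreement p_e: for each category, multiply the two
    # raters' marginal proportions, then sum over categories.
    p_e = sum((count_a[c] / n) * (count_b[c] / n) for c in CATEGORIES)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings for six statements (not data from the study).
expert = ["appropriate", "appropriate", "inappropriate",
          "neither", "appropriate", "inappropriate"]
model = ["appropriate", "neither", "inappropriate",
         "appropriate", "appropriate", "appropriate"]

print(percent_agreement(expert, model))       # 0.5
print(round(cohens_kappa(expert, model), 3))  # 0.143
```

In this toy example p_o = 3/6 and p_e = 15/36, so kappa = 1/7: raw agreement of 50% shrinks considerably once chance agreement is removed. The same function could in principle be applied to two of ChatGPT's repeated runs to gauge intra-rater consistency, although the abstract does not say which runs were compared.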