Comparing AAOS appropriate use criteria with ChatGPT-4o recommendations on treating distal radius fractures.


Authors: Akiro Duey, Suhas Etigunta, Michael R Hausman, James Hong, Jamie Kator, Kareem S Mohamed, Hannah S Rhee, Christoph A Schroen, Alexander Yu, Ryan Yu

Language: English


Publication information: France: Hand Surgery & Rehabilitation, 2025


Collection: NCBI

ID: 723351

INTRODUCTION: The American Academy of Orthopaedic Surgeons (AAOS) developed appropriate use criteria (AUC) to guide treatment decisions for distal radius fractures based on expert consensus. This study aims to evaluate the accuracy of Chat Generative Pre-trained Transformer-4o (ChatGPT-4o) by comparing its appropriateness scores for distal radius fracture treatment with those from the AUC.

METHODS: The AUC patient scenarios were categorized by factors such as fracture type (AO/OTA classification), mechanism of injury, pre-injury activity level, patient health (ASA 1-4), and associated injuries. Treatment options included percutaneous pinning, spanning external fixation, volar locking plates, dorsal plates, and immobilization methods, among others. Orthopedic surgeons assigned appropriateness scores for each treatment (1-3 = "Rarely Appropriate," 4-6 = "May Be Appropriate," and 7-9 = "Appropriate"). ChatGPT-4o was prompted with the same patient scenarios and asked to assign scores. Differences between AAOS and ChatGPT-4o ratings were used to calculate mean error, mean absolute error, and mean squared error. Statistical significance was assessed using Spearman correlation, and appropriateness scores were grouped into categories to determine percentage overlap between the two sources.

RESULTS: A total of 240 patient scenarios and 2160 paired treatment scores were analyzed. The mean error for treatment options ranged from 0.6 for volar locking plate to -2.9 for dorsal plating. Pearson correlation revealed significant positive associations for dorsal spanning bridge (0.43, P < 0.001) and spanning external fixation (0.4, P < 0.001). The percentage overlap between AAOS and ChatGPT-4o in the appropriateness categories varied, with 99.17% agreement for immobilization without reduction, 90.42% for volar locking plates, and only 15% for dorsal plating.

CONCLUSION: ChatGPT-4o does not consistently align with the appropriate use criteria in determining appropriate management of distal radius fractures. While there was moderate concordance in certain treatments, ChatGPT-4o tended to favor more conservative approaches, raising concerns about the reliability of AI-generated recommendations for medical advice and clinical decision-making.
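
The comparison described in the methods (mean error, mean absolute error, mean squared error, Spearman correlation, and category overlap) can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the function names, the sign convention for the error (ChatGPT-4o score minus AAOS score), and the sample scores are all assumptions made for illustration. Only the 1-3 / 4-6 / 7-9 category boundaries come from the abstract.

```python
from statistics import mean


def category(score):
    """Map a 1-9 appropriateness score to its AUC category."""
    if not 1 <= score <= 9:
        raise ValueError("score must be between 1 and 9")
    if score <= 3:
        return "Rarely Appropriate"
    if score <= 6:
        return "May Be Appropriate"
    return "Appropriate"


def error_metrics(aaos, gpt):
    """Mean error, mean absolute error, and mean squared error.

    Assumption: error is defined as ChatGPT-4o minus AAOS; the abstract
    does not state the direction of subtraction.
    """
    diffs = [g - a for a, g in zip(aaos, gpt)]
    return {
        "me": mean(diffs),
        "mae": mean(abs(d) for d in diffs),
        "mse": mean(d * d for d in diffs),
    }


def category_overlap(aaos, gpt):
    """Percentage of paired scores that fall in the same category."""
    matches = sum(category(a) == category(g) for a, g in zip(aaos, gpt))
    return 100.0 * matches / len(aaos)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical scores for one treatment across four scenarios.
    aaos_scores = [7, 8, 2, 5]
    gpt_scores = [7, 6, 3, 5]
    print(error_metrics(aaos_scores, gpt_scores))
    print(category_overlap(aaos_scores, gpt_scores))
    print(spearman(aaos_scores, gpt_scores))
```

Under this sign convention, a negative mean error for a treatment would mean ChatGPT-4o rated it lower than the AAOS panel on average. Note that the overlap figure counts a pair as agreeing whenever both scores land in the same category, so two sources can differ by up to two points and still overlap.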