Evaluation of the Performance of Three Large Language Models in Clinical Decision Support: A Comparative Study Based on Actual Cases.

Authors: Xuebin Wang, Xueqi Wang, Mei Yang, Haiyan Ye, Sumian Zhang

Language: eng

Publication information: United States: Journal of Medical Systems, 2025

Collection: NCBI

ID: 57680

BACKGROUND: Generative large language models (LLMs) are increasingly integrated into the medical field. However, their actual efficacy in clinical decision-making remains only partially explored. This study aimed to assess the performance of three LLMs (ChatGPT-4, Gemini, and Med-Go) in professional medicine when confronted with actual clinical cases.

METHODS: This study involved 134 clinical cases spanning nine medical disciplines. For every case, each LLM was required to provide suggestions for diagnosis, diagnostic criteria, differential diagnosis, examination, and treatment. Responses were scored by two experts using a predefined rubric.

RESULTS: In overall performance, Med-Go achieved the highest median score (37.5, IQR 31.9-41.5), while Gemini recorded the lowest (33.0, IQR 25.5-36.6); the difference among the three LLMs was statistically significant (p < 0.001). Responses on differential diagnosis were the weakest, while those on treatment recommendations were the strongest. Med-Go displayed notable performance advantages in gastroenterology, nephrology, and neurology.

CONCLUSIONS: All three LLMs achieved over 60% of the maximum possible score, indicating their potential applicability in clinical practice. However, inaccuracies that could lead to adverse decisions underscore the need for caution in their application. Med-Go's superior performance highlights the benefits of incorporating specialized medical knowledge into LLM training. Further development and refinement of medical LLMs is anticipated to enhance their precision and safety in clinical use.
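
NOTE: The results above are reported as a median with an interquartile range (IQR). As a minimal sketch of how such a summary can be computed from per-case scores, the Python snippet below uses only the standard library. The function name `median_iqr`, the model labels, and all score values are hypothetical illustrations, not data or code from the study, and the quartile convention (`method="inclusive"`) is an assumption, since the abstract does not say which one the authors used.

```python
from statistics import median, quantiles


def median_iqr(scores):
    """Return (median, Q1, Q3) for a list of per-case scores.

    Assumption: quartiles use the "inclusive" convention; the study
    does not state which quartile method was applied.
    """
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return median(scores), q1, q3


# Hypothetical per-case scores for illustration only (NOT study data).
example_scores = {
    "model_a": [30, 32, 35, 38, 40],
    "model_b": [25, 28, 33, 36, 37],
}

for name, scores in example_scores.items():
    med, q1, q3 = median_iqr(scores)
    print(f"{name}: median {med}, IQR {q1}-{q3}")
```

Median and IQR are a common choice for rubric scores because they are less sensitive to a few very low or very high cases than a mean and standard deviation would be.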