Comparing Large Language Models for antibiotic prescribing in different clinical scenarios: which perform better?

Authors: Michele Bartoletti, Davide Fiore Bavaro, Anna Maria Cattelan, Agnese Colpani, Andrea De Vito, Nicholas Geremia, Justin Laracy, Giordano Madeddu, Alberto Enrico Maraolo, Andrea Marino, Maria Mazzitelli, Cristina Mussini, Giuseppe Nunnari, Saverio Giuseppe Parisi, Antonio Russo, Susan K Seo, Luigi Angelo Vaira

Language: eng

Classification number:

Publication information: England : Clinical microbiology and infection : the official publication of the European Society of Clinical Microbiology and Infectious Diseases, 2025

Physical description:

Collection: NCBI

ID: 727295

OBJECTIVES: Large language models (LLMs) show promise in clinical decision-making, but comparative evaluations of their antibiotic prescribing accuracy are limited. This study assesses the performance of various LLMs in recommending antibiotic treatments across diverse clinical scenarios.

METHODS: Fourteen LLMs, including standard and premium versions of ChatGPT, Claude, Copilot, Gemini, Le Chat, Grok, Perplexity, and Pi.ai, were evaluated using 60 clinical cases with antibiograms covering ten infection types. A standardised prompt requested antibiotic recommendations covering drug choice, dosage, and treatment duration. Responses were anonymised and reviewed by a blinded expert panel, which assessed antibiotic appropriateness, dosage correctness, and duration adequacy.

RESULTS: A total of 840 responses were collected and analysed. ChatGPT-o1 demonstrated the highest accuracy in antibiotic prescriptions, with 71.7% (43/60) of its recommendations classified as correct and only one (1.7%) as incorrect. Gemini and Claude 3 Opus had the lowest accuracy. Dosage correctness was highest for ChatGPT-o1 (96.7%, 58/60), followed by Claude 3.5 Sonnet (91.7%, 55/60) and Perplexity Pro (90.0%, 54/60). For treatment duration, Gemini provided the most appropriate recommendations (75.0%, 45/60), while Claude 3.5 Sonnet tended to recommend excessively long courses. Performance declined with increasing case complexity, particularly for difficult-to-treat microorganisms.

CONCLUSIONS: There is significant variability among LLMs in prescribing appropriate antibiotics, dosages, and treatment durations. ChatGPT-o1 outperformed other models, indicating the potential of advanced LLMs as decision-support tools in antibiotic prescribing. However, decreased accuracy in complex cases and inconsistencies among models highlight the need for careful validation before clinical utilisation.
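
The proportions quoted in the abstract can be rechecked from the raw counts. The following is a minimal sketch that uses only the figures stated above; the dictionary layout, labels, and the `percent` helper are illustrative assumptions and do not come from the study itself.

```python
# Figures quoted in the abstract: (count out of 60 cases, reported percentage).
# The data layout and names here are illustrative, not taken from the study.
CASES_PER_MODEL = 60
MODELS_EVALUATED = 14

reported = {
    ("ChatGPT-o1", "antibiotic correct"): (43, 71.7),
    ("ChatGPT-o1", "antibiotic incorrect"): (1, 1.7),
    ("ChatGPT-o1", "dosage correct"): (58, 96.7),
    ("Claude 3.5 Sonnet", "dosage correct"): (55, 91.7),
    ("Perplexity Pro", "dosage correct"): (54, 90.0),
    ("Gemini", "duration appropriate"): (45, 75.0),
}


def percent(count, total=CASES_PER_MODEL):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# 14 models, each answering the same 60 cases.
total_responses = MODELS_EVALUATED * CASES_PER_MODEL

# Collect any entry whose recomputed percentage differs from the reported one.
mismatches = {
    key: (percent(count), stated)
    for key, (count, stated) in reported.items()
    if percent(count) != stated
}

print("total responses:", total_responses)
print("mismatches:", mismatches)
```

An empty `mismatches` dictionary means every quoted percentage agrees with its count at one decimal place, and the product of models and cases matches the stated number of responses.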