ChatGPT vs. Gemini: Comparative accuracy and efficiency in Lung-RADS score assignment from radiology reports.

 0 Người đánh giá. Xếp hạng trung bình 0

Tác giả: Dhiraj Baruah, Jeremy R Burt, Jordan H Chamberlin, Mohamed Hamouda, Ismail M Kabakus, James Munford, Matthew Silbergleit, Ria Singh, Adrienn Tóth

Ngôn ngữ: eng

Ký hiệu phân loại: 149.72 Agnosticism

Thông tin xuất bản: United States : Clinical imaging , 2025

Mô tả vật lý:

Bộ sưu tập: NCBI

ID: 718348

OBJECTIVE: To evaluate the accuracy of large language models (LLMs) in generating Lung-RADS scores based on lung cancer screening low-dose computed tomography radiology reports. MATERIAL AND METHODS: A retrospective cross-sectional analysis was performed on 242 consecutive LDCT radiology reports generated by cardiothoracic fellowship-trained radiologists at a tertiary center. LLMs evaluated included ChatGPT-3.5, ChatGPT-4o, Google Gemini, and Google Gemini Advanced. Each LLM was used to assign Lung-RADS scores based on the findings section of each report. No domain-specific fine-tuning was applied. Accuracy was determined by comparing the LLM-assigned scores to radiologist-assigned scores. Efficiency was assessed by measuring response times for each LLM. RESULTS: ChatGPT-4o achieved the highest accuracy (83.6 %) in assigning Lung-RADS scores compared to other models, with ChatGPT-3.5 reaching 70.1 %. Gemini and Gemini Advanced had similar accuracy (70.9 % and 65.1 %, respectively). ChatGPT-3.5 had the fastest response time (median 4 s), while ChatGPT-4o was slower (median 10 s). Higher Lung-RADS categories were associated with marginally longer completion times. ChatGPT-4o demonstrated the greatest agreement with radiologists (κ = 0.836), although it was less than the previously reported human interobserver agreement. CONCLUSION: ChatGPT-4o outperformed ChatGPT-3.5, Gemini, and Gemini Advanced in Lung-RADS score assignment accuracy but did not reach the level of human experts. Despite promising results, further work is needed to integrate domain-specific training and ensure LLM reliability for clinical decision-making in lung cancer screening.
Tạo bộ sưu tập với mã QR

THƯ VIỆN - TRƯỜNG ĐẠI HỌC CÔNG NGHỆ TP.HCM

ĐT: (028) 36225755 | Email: tt.thuvien@hutech.edu.vn

Copyright @2024 THƯ VIỆN HUTECH