ChatGPT as an effective tool for quality evaluation of radiomics research.

0 Người đánh giá. Xếp hạng trung bình 0

Tác giả: Burak Kocak, Ismail Mese

Ngôn ngữ: eng

Ký hiệu phân loại: 782.292 *Chant

Thông tin xuất bản: Germany : European radiology , 2025

Mô tả vật lý:

Bộ sưu tập: NCBI

ID: 715539

Thêm vào giỏ Liên kết toàn văn

OBJECTIVES: This study aimed to evaluate the effectiveness of ChatGPT-4o in assessing the methodological quality of radiomics research using the radiomics quality score (RQS) compared to human experts. METHODS: Published in European Radiology, European Radiology Experimental, and Insights into Imaging between 2023 and 2024, open-access and peer-reviewed radiomics research articles with creative commons attribution license (CC-BY) were included in this study. Pre-prints from MedRxiv were also included to evaluate potential peer-review bias. Using the RQS, each study was independently assessed twice by ChatGPT-4o and by two radiologists with consensus. RESULTS: In total, 52 open-access and peer-reviewed articles were included in this study. Both ChatGPT-4o evaluation (average of two readings) and human experts had a median RQS of 14.5 (40.3% percentage score) (p >
0.05). Pairwise comparisons revealed no statistically significant difference between the readings of ChatGPT and human experts (corrected p >
0.05). The intraclass correlation coefficient for intra-rater reliability of ChatGPT-4o was 0.905 (95% CI: 0.840-0.944), and those for inter-rater reliability with human experts for each evaluation of ChatGPT-4o were 0.859 (95% CI: 0.756-0.919) and 0.914 (95% CI: 0.855-0.949), corresponding to good to excellent reliability for all. The evaluation by ChatGPT-4o took less time (2.9-3.5 min per article) compared to human experts (13.9 min per article by one reader). Item-wise reliability analysis showed ChatGPT-4o maintained consistently high reliability across almost all RQS items. CONCLUSION: ChatGPT-4o provides reliable and efficient assessments of radiomics research quality. Its evaluations closely align with those of human experts and reduce evaluation time. KEY POINTS: Question Is ChatGPT effective and reliable in evaluating radiomics research quality based on RQS? Findings ChatGPT-4o showed high reliability and efficiency, with evaluations closely matching human experts. It can considerably reduce the time required for radiomics research quality assessment. Clinical relevance ChatGPT-4o offers a quick and reliable automated alternative for evaluating the quality of radiomics research, with the potential to assess radiomics research at a large scale in the future.

Tạo bộ sưu tập với mã QR