Aligning large language models with radiologists by reinforcement learning from AI feedback for chest CT reports.


Authors: Jun Qi, Shan Shi, Qinghua Su, Li Sun, Lingrui Yang, Xuedong Yang, Xiantong Zhen, Yuxing Zhou

Language: eng

Classification: 340 Law

Publication info: Ireland: European Journal of Radiology, 2025


Collection: NCBI

ID: 114142


BACKGROUND: Large language models (LLMs) often struggle to fully capture the nuanced preferences and clinical judgement of radiologists in medical report summarization, even when fine-tuned on large volumes of medical reports. As a result, the generated radiology reports may lack the professionalism and quality required for critical diagnostic decision-making.

OBJECTIVE: To investigate the alignment of LLMs with radiologists by reinforcement learning from AI feedback (RLAIF) for high-quality, clinically significant impression summarization in chest CT reports.

METHODS: This retrospective study included 94,844 chest CT reports from the data set of our hospital (median age, 56 years [IQR, 46-69 years]; 29,350 male and 34,036 female patients). We created a comparative data set of chest CT reports annotated by radiologists, in which the impressions generated by fine-tuned LLMs and those written by radiologists were further reviewed and ranked by Tongyi Qianwen (Qwen) and a radiologist. Using the comparative data set, we performed further fine-tuning by reinforcement learning to align LLMs with radiologists. The performance of the aligned LLMs was evaluated on a test set using metrics including ROUGE, BLEU, Precision, Recall and F1 score.

RESULTS: The average agreement rate between the rankings of the radiologist and Qwen was up to 77.9%. The aligned LLM consistently outperformed its unaligned counterpart, by 2.56%, 1.77% and 1.13% in Precision, Recall and F1 score, respectively (P < 0.001).

CONCLUSION: The results demonstrated the effect of explicitly aligning LLMs with radiologists using RLAIF in generating impressions of chest CT reports. The study also indicated that AI may serve as an alternative to humans in providing feedback for aligning LLMs with radiologists.

CLINICAL IMPACT: This study confirms the benefit of reinforcement learning from AI feedback for impression summarization, paving the way for the application of AI feedback in clinical CT report summarization. It will serve as the basis for the future development of multi-model large language models in the field of radiology.
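The abstract reports an agreement rate between two rankers and precision, recall and F1 scores, but does not say how either was computed. As a rough illustration only, the Python sketch below assumes that agreement is the fraction of comparisons in which both rankers prefer the same impression, and that precision, recall and F1 are measured on whitespace-token overlap between a generated and a reference impression. The function names `agreement_rate` and `token_prf` and the example labels are hypothetical, not taken from the study.

```python
from collections import Counter


def agreement_rate(choices_a, choices_b):
    """Fraction of comparisons where two rankers prefer the same candidate.

    Assumption: each list holds one preferred label per comparison,
    in the same order for both rankers.
    """
    if len(choices_a) != len(choices_b):
        raise ValueError("both rankers must judge the same comparisons")
    if not choices_a:
        raise ValueError("no comparisons to score")
    matches = sum(a == b for a, b in zip(choices_a, choices_b))
    return matches / len(choices_a)


def token_prf(candidate, reference):
    """Token-overlap precision, recall and F1 between two impressions.

    Assumption: simple whitespace tokenization; the study's actual
    scoring procedure is not described in the abstract.
    """
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    # Multiset intersection counts each shared token at most as often
    # as it appears in both texts.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical labels: which impression each ranker preferred per report.
radiologist = ["llm", "radiologist", "radiologist", "llm"]
qwen = ["llm", "radiologist", "llm", "llm"]
rate = agreement_rate(radiologist, qwen)

p, r, f1 = token_prf("no pleural effusion", "no pleural effusion seen")
```

In this toy input the two rankers agree on three of four comparisons. A real evaluation would more likely use clinical entity matching or a dedicated tokenizer than raw whitespace tokens.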
