How to leverage large language models for automatic ICD coding.


Authors: Sewon Kim, Youngju Yoo

Language: eng

Classification:

Publication info: United States: Computers in Biology and Medicine, 2025

Physical description:

Collection: NCBI

ID: 713684

ICD coding, which involves assigning appropriate ICD codes to clinical notes, is essential for healthcare tasks such as health expense claims, insurance claims, and disease research. Manual ICD coding is time-consuming and prone to errors, increasing the need for automation. However, clinical notes often contain non-grammatical expressions, abbreviations, professional terms, and synonyms, making them notably noisy compared to general documents. Additionally, ICD coding faces challenges such as a broad label space and the long-tail problem, making automatic ICD coding highly challenging. Large Language Models (LLMs) have shown great potential in code extraction tasks due to their exceptional natural language understanding and information extraction capabilities. However, the unique characteristics of clinical records and ICD codes necessitate fine-tuning LLMs for optimal performance in ICD coding. In this study, we propose a novel fine-tuning framework for LLMs aimed at automatic ICD coding. Our framework introduces additional elements, including a label attention mechanism, note-relevant knowledge injection based on medical expressions, and knowledge-driven sampling to address the input token limitations of LLMs. Experiments on the MIMIC-III-50 dataset show that our framework outperforms vanilla fine-tuning in both micro and macro accuracy and F1 scores, with particularly significant improvements observed in encoder-decoder models.
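The abstract does not give implementation details for the proposed label attention mechanism. As a rough illustration only, a common formulation from the ICD-coding literature pools token representations separately for each label, so every ICD code attends to the parts of the note most relevant to it. All names below (`label_attention`, `H`, `U`, the dimensions) are illustrative assumptions, not the authors' code:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def label_attention(H, U):
    """Per-label attention pooling (sketch).

    H: (T, d) hidden states for T note tokens.
    U: (L, d) learnable query vectors, one per ICD label.
    Returns (L, d): a label-specific document vector per code,
    which a per-label classifier head would then score.
    """
    scores = U @ H.T               # (L, T) token relevance per label
    A = softmax(scores, axis=-1)   # attention weights sum to 1 per label
    return A @ H                   # (L, d) weighted token averages

rng = np.random.default_rng(0)
H = rng.normal(size=(6, 4))   # toy note: 6 tokens, hidden size 4
U = rng.normal(size=(3, 4))   # toy label space: 3 ICD codes
V = label_attention(H, U)
print(V.shape)  # (3, 4)
```

In this formulation each label gets its own view of the document, which is one way such mechanisms address the broad label space mentioned above; the paper's actual architecture may differ.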
