Deep Learning for Automated Extraction of Primary Sites from Cancer Pathology Reports [electronic resource]

Author: Oak Ridge National Laboratory, United States, United States. Dept. of Energy. Office of Science

Language: eng

Classification number: 576.8 Evolution

Publication information: Washington, D.C. : Oak Ridge, Tenn. : United States. Dept. of Energy. Office of Science ; Distributed by the Office of Scientific and Technical Information, U.S. Dept. of Energy, 2017

Physical description: Size: p. 244-251 : digital, PDF file.

Collection: Newspapers, Journals

ID: 261013

Pathology reports are a primary source of information for cancer registries, which process high volumes of free-text reports annually. Information extraction and coding is a manual, labor-intensive process. In this study we investigated deep learning, specifically a convolutional neural network (CNN), for extracting ICD-O-3 topographic codes from a corpus of breast and lung cancer pathology reports. We performed two experiments, using a CNN and a more conventional term frequency vector space approach, to assess the effects of class prevalence and inter-class transfer learning. The experiments were based on a set of 942 pathology reports with human expert annotations as the gold standard. We observed that the deep learning models consistently outperformed the conventional approaches in the class prevalence experiment, with micro- and macro-F score increases of up to 0.132 and 0.226, respectively, when class labels were well populated. Specifically, the best-performing CNN achieved a micro-F score of 0.722 over 12 ICD-O-3 topography codes. Transfer learning provided a consistent but modest performance boost for the deep learning methods, but trends were contingent on CNN method and cancer site. These encouraging results demonstrate the potential of deep learning for automated abstraction of pathology reports.
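The abstract reports both micro- and macro-F scores, which can diverge sharply when some topography codes are rare. The following is a minimal sketch of how the two averages are commonly computed for single-label classification, assuming the standard F1 definition; the function name `f_scores` and the tiny label lists are hypothetical illustrations, not the study's data or code.

```python
from collections import Counter


def f_scores(gold, pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for class g
        else:
            fp[p] += 1          # predicted class p wrongly
            fn[g] += 1          # missed the true class g

    def f1(t, false_pos, false_neg):
        denom = 2 * t + false_pos + false_neg
        return 2 * t / denom if denom else 0.0

    # Micro: pool counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical example: one frequent code and two rarer ones.
gold = ["C50.9", "C50.9", "C50.9", "C34.1", "C34.1", "C34.3"]
pred = ["C50.9", "C50.9", "C50.9", "C34.1", "C50.9", "C34.1"]
micro, macro = f_scores(gold, pred)
print(f"micro-F = {micro:.3f}, macro-F = {macro:.3f}")
```

In this toy case the macro-F is lower than the micro-F because the rare code that is never predicted correctly counts as much as the frequent one. That sensitivity to rare classes is why both figures are useful when class labels are unevenly populated.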

1. 59 basic biological sciences
2. 60 applied life sciences
3. Convolutional neural network
4. Deep learning
5. Information extraction
6. Natural language processing
7. Pathology reports
8. Primary cancer site
