Extracting Knowledge from Scientific Texts on Patient-Derived Cancer Models Using Large Language Models: Algorithm Development and Validation.

Authors: Elizabeth Lewis, Tushar Mandloi, Helen Parkinson, Zinaida Perova, Guergana Savova, Jiarui Yao

Language: eng

Publication info: United States : bioRxiv : the preprint server for biology, 2025

Collection: NCBI

ID: 643083

Patient-derived cancer models (PDCMs) have emerged as indispensable tools in both cancer research and preclinical studies, and the number of publications on PDCMs has increased significantly over the last decade. Developments in Artificial Intelligence (AI), particularly Large Language Models (LLMs), hold promise for extracting knowledge from scientific texts at scale. This study investigates the use of LLM-based systems for automatically extracting PDCM-related entities from scientific texts. We evaluated two approaches: direct prompting and soft prompting. For direct prompting, we manually created prompts that guide the LLMs to output PDCM-related entities from texts; each prompt consists of an instruction, definitions of the entity types, gold examples, and a query. For soft prompting, a novel line of research in this domain, we automatically trained the prompts as continuous vectors using machine learning approaches. Our experiments used state-of-the-art LLMs: the proprietary GPT4-o and a series of open models from the LLaMA3 family. GPT4-o with direct prompts maintained competitive results, and soft prompting effectively enhanced the capabilities of the smaller open LLMs, achieving results comparable to those of proprietary models. These findings highlight the potential of LLMs in domain-specific text extraction tasks and emphasize the importance of tailoring approaches to the task and model characteristics.
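
The four-part direct prompt described in the abstract (instruction, entity-type definitions, gold examples, query) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `build_direct_prompt`, the `GoldExample` class, the prompt layout, the entity types `model_type` and `cancer_type`, and all example sentences are hypothetical placeholders chosen only for illustration, since the abstract does not list the actual entity types or prompt wording.

```python
from dataclasses import dataclass


@dataclass
class GoldExample:
    """One annotated example: a text and its entities by type (hypothetical format)."""
    text: str
    entities: dict[str, list[str]]


def build_direct_prompt(instruction: str,
                        definitions: dict[str, str],
                        examples: list[GoldExample],
                        query: str) -> str:
    """Concatenate instruction, entity definitions, gold examples, and the query."""
    parts = [instruction.strip(), "", "Entity types:"]
    for name, definition in definitions.items():
        parts.append(f"- {name}: {definition}")
    parts += ["", "Examples:"]
    for example in examples:
        parts.append(f"Text: {example.text}")
        for name, mentions in example.entities.items():
            parts.append(f"{name}: {'; '.join(mentions) if mentions else 'none'}")
        parts.append("")
    # The query goes last so the model continues with the entities for it.
    parts.append(f"Text: {query}")
    return "\n".join(parts)


# Hypothetical inputs; the real entity types and gold examples are not given in the abstract.
INSTRUCTION = "Extract the entities defined below from the text."
DEFINITIONS = {
    "model_type": "the kind of patient-derived model, e.g. xenograft or organoid",
    "cancer_type": "the cancer the model was derived from",
}
EXAMPLES = [
    GoldExample(
        text="We established organoids from colorectal cancer biopsies.",
        entities={"model_type": ["organoids"], "cancer_type": ["colorectal cancer"]},
    )
]
QUERY = "Patient-derived xenografts of pancreatic tumors were treated for 21 days."

PROMPT = build_direct_prompt(INSTRUCTION, DEFINITIONS, EXAMPLES, QUERY)
```

The resulting string would be sent to an LLM as a single prompt. Soft prompting differs in that the prompt is not text at all: it is a set of continuous vectors learned by training, so it has no counterpart in this sketch.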
