Identifying somatic driver mutations in cancer with a language model of the human genome.

Authors: Shenying Fang, Zhengyang Huang, Guanpeng Li, Xiaohua Liang, Xiaxia Yu, Guangjian Zeng, Chengzhi Zhao, Jinhu Zhuang

Language: eng

Classification number: 133.594 Types or schools of astrology originating in or associated with a

Publication information: Netherlands : Computational and structural biotechnology journal, 2025

Collection: NCBI

ID: 212694

Somatic driver mutations play important roles in cancer and must be precisely identified to advance our understanding of tumorigenesis and its promotion and progression. However, identifying somatic driver mutations remains challenging in Homo sapiens genomics due to the random nature of mutations and the high cost of qualitative experiments. Building on the powerful sequence interpretation capabilities of language models, we propose a self-attention-based contextualized pretrained language model for somatic driver mutation identification. We pretrained the model with the Homo sapiens reference genome to equip it with the ability to understand genome sequences and then fine-tuned it for oncogene and tumor suppressor gene prediction tasks, enabling it to extract features related to driver genes from the original genome sequence. The fine-tuned model was used to obtain the mutations' carcinogenic effect characteristics to further identify whether the mutation is a driver or a passenger. Compared with other computational algorithms, our method achieved excellent somatic driver mutation identification performance on the test set, with an absolute improvement of 4.31% in AUROC over the best comparison method. The strong performance of our method indicates that it can provide new insights into the discovery of cancer drivers.
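The abstract reports its headline result as an absolute improvement in AUROC, meaning a difference in percentage points between two AUROC values rather than a relative change. The following is a minimal sketch of how such a comparison can be computed, using the pairwise-ranking definition of AUROC. The function name `auroc`, the labels, and both score lists are hypothetical toy values chosen for illustration; they are not data or results from the paper, and this is not the authors' evaluation code.

```python
def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive
    (driver) scores higher than a randomly chosen negative
    (passenger). Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = driver mutation, 0 = passenger mutation.
labels = [1, 1, 1, 0, 0, 0]
baseline_scores = [0.9, 0.4, 0.6, 0.5, 0.3, 0.2]  # assumed comparison method
model_scores = [0.9, 0.7, 0.6, 0.5, 0.3, 0.2]     # assumed proposed method

baseline_auc = auroc(labels, baseline_scores)
model_auc = auroc(labels, model_scores)

# Absolute improvement: plain difference of the two AUROC values,
# expressed in percentage points.
absolute_gain_points = (model_auc - baseline_auc) * 100

print(f"baseline AUROC: {baseline_auc:.4f}")
print(f"model AUROC:    {model_auc:.4f}")
print(f"absolute gain:  {absolute_gain_points:.2f} percentage points")
```

The quadratic pairwise loop is chosen for readability on small inputs; for real test sets a rank-based implementation such as scikit-learn's `roc_auc_score` would be the more practical choice.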
