VirNucPro: an identifier for the identification of viral short sequences using six-frame translation and large language models.


Authors: Jingyang Gao, Jing Li, Wei Lin, Jia Mi, Fengjuan Tian, Yigang Tong, Jing Wan

Language: eng

Classification number: 929.10284 Genealogy

Publication information: England: Briefings in Bioinformatics, 2025

Physical description:

Collection: NCBI

ID: 749968

Viruses are ubiquitous in nature, yet our understanding of them remains limited. High-throughput sequencing technology facilitates the unbiased revelation of genetic composition in samples; however, viral sequences typically make up a small proportion of the entire sequencing data, making it challenging to accurately identify the few or fragmented viral sequences present in a sample. The limited features and information provided by short sequences result in insufficient resolution of viral sequences by existing models. Therefore, we propose a new model, VirNucPro, for short viral sequence identification. Based on a six-frame translation strategy and large language models, we combine nucleotide and amino acid sequence information to enhance feature extraction for short sequences, achieving high accuracy in identifying short viral sequences. Ablation experiments compared the contributions of nucleotide and amino acid sequence features to the model, confirming that the introduced amino acid features significantly contribute to the classification results. Our model outperforms others, such as GCNFrame, DeepVirFinder, DETIRE, and Virtifier, which have demonstrated good performance in identifying short viral sequences of 300 and 500 bp. Our model also demonstrates excellent performance on carefully created real-world datasets. Additionally, it can scan for prophage regions within long bacterial fragments, offering a wide range of applications. The code is available at: https://github.com/Li-Jing-1997/VirNucPro.
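
The six-frame translation step mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation (see the repository linked above for that); it only shows the general technique of translating a nucleotide read in three forward and three reverse-complement reading frames, assuming the standard genetic code. The function names (`reverse_complement`, `translate`, `six_frame_translation`) are hypothetical, and the language-model feature extraction described in the abstract is not covered here.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons only; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq: str) -> dict[str, str]:
    """Map frame labels (+1..+3, -1..-3) to their amino acid translations."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translation("ATGGCCATTGTAATGGGCCGCTGA").items():
        print(label, protein)
```

Each read therefore yields six candidate amino acid sequences alongside the original nucleotide sequence; how these are encoded and combined for classification is specific to VirNucPro and is described in the paper and repository.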
