NEAR: Neural Embeddings for Amino acid Relationships.


Authors: Thomas Colligan, Daphne Demekas, Daniel Olson, Jack W Roddy, Travis J Wheeler, Ken Youens-Clark

Language: eng

Classification: 920.71 Men

Publication: United States : bioRxiv : the preprint server for biology, 2025

Physical description:

Collection: NCBI

ID: 218042

Abstract: Protein language models (PLMs) have recently demonstrated potential to supplant classical protein database search methods based on sequence alignment, but are slower than common alignment-based tools and appear to be prone to a high rate of false labeling. Here, we present NEAR, a method based on neural representation learning that is designed to improve both speed and accuracy of search for likely homologs in a large protein sequence database. NEAR's ResNet embedding model is trained using contrastive learning guided by trusted sequence alignments. It computes per-residue embeddings for target and query protein sequences, and identifies alignment candidates with a pipeline consisting of residue-level k-NN search and a simple neighbor aggregation scheme. Tests on a benchmark consisting of trusted remote homologs and randomly shuffled decoy sequences reveal that NEAR substantially improves accuracy relative to state-of-the-art PLMs, with lower memory requirements and faster embedding and search speed. While these results suggest that the NEAR model may be useful for standalone homology detection with increased sensitivity over standard alignment-based methods, in this manuscript we focus on a more straightforward analysis of the model's value as a high-speed pre-filter for sensitive annotation. In that context, NEAR is at least 5x faster than the pre-filter currently used in the widely-used profile hidden Markov model (pHMM) search tool HMMER3, and also outperforms the pre-filter used in our fast pHMM tool, nail.
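The candidate-identification pipeline described in the abstract (residue-level k-NN search followed by a simple neighbor aggregation scheme) can be sketched as follows. This is an illustrative toy, not NEAR's implementation: it assumes per-residue embeddings are already available as NumPy arrays, uses brute-force cosine similarity in place of an indexed k-NN search, and invents the hypothetical function name `knn_candidates` and a hit-count aggregation rule for demonstration.

```python
import numpy as np

def knn_candidates(query_emb, target_embs, target_ids, k=5):
    """Toy residue-level k-NN search with hit-count aggregation.

    query_emb:   (Q, d) per-residue embeddings of one query sequence
    target_embs: (T, d) per-residue embeddings of all target residues, stacked
    target_ids:  (T,)   index of the target sequence each residue belongs to
    Returns a dict mapping target sequence index -> neighbor hit count.
    """
    # Cosine similarity via normalized dot products
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    t = target_embs / np.linalg.norm(target_embs, axis=1, keepdims=True)
    sims = q @ t.T  # (Q, T) similarity of every query residue to every target residue

    # For each query residue, indices of its k most similar target residues
    nn = np.argsort(-sims, axis=1)[:, :k]

    # Simple neighbor aggregation: count how many neighbor hits each
    # target sequence accumulates; high-count sequences become alignment candidates
    hits = {}
    for row in nn:
        for idx in row:
            seq = int(target_ids[idx])
            hits[seq] = hits.get(seq, 0) + 1
    return hits
```

Target sequences would then be ranked by hit count and the top-ranked ones passed to a sensitive aligner, which is what makes this usable as a pre-filter: the expensive alignment step runs only on the few sequences that accumulate many residue-level neighbor hits. A production system would replace the brute-force `argsort` with an approximate k-NN index.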
