Advancing bioinformatics with large language models: components, applications and perspectives.


Authors: Kang Li, Jiajia Liu, Tiangang Wang, Haixia Xu, Mengyuan Yang, Yankai Yu, Xiaobo Zhou

Language: eng

Classification number: 691.99 Adhesives and sealants

Publication information: United States: ArXiv, 2025

Physical description:

Collection: NCBI

ID: 699519

Large language models (LLMs) are a class of deep-learning-based artificial intelligence models that perform strongly across a wide range of tasks, especially in natural language processing (NLP). They typically consist of artificial neural networks with numerous parameters, trained on large amounts of unlabeled data using self-supervised or semi-supervised learning. However, their potential for solving bioinformatics problems may even exceed their proficiency in modeling human language. In this review, we provide a comprehensive overview of the essential components of LLMs in bioinformatics, spanning genomics, transcriptomics, proteomics, drug discovery, and single-cell analysis. Key aspects covered include tokenization methods for diverse data types, the architecture of transformer models, the core attention mechanism, and the pre-training processes underlying these models. We also introduce currently available foundation models and highlight their downstream applications across various bioinformatics domains. Finally, drawing on our experience, we offer practical guidance for both LLM users and developers, emphasizing strategies to optimize the use of these models and foster further innovation in the field.