Small, Open-Source Text-Embedding Models as Substitutes to OpenAI Models for Gene Analysis.

Authors: Dailin Gan, Jun Li

Language: eng

Classification number: 691.99 Adhesives and sealants

Publication information: United States : bioRxiv : the preprint server for biology, 2025

Physical description:

Collection: NCBI

ID: 692475

While foundation transformer-based models developed for gene expression data analysis can be costly to train and operate, a recent approach known as GenePT offers a low-cost and highly efficient alternative. GenePT utilizes OpenAI's text-embedding function to encode background information, which is in textual form, about genes. However, the closed-source, online nature of OpenAI's text-embedding service raises concerns regarding data privacy, among other issues. In this paper, we explore the possibility of replacing OpenAI's models with open-source transformer-based text-embedding models. We identified ten models from Hugging Face that are small in size, easy to install, and light in computation. Across all four gene classification tasks we considered, some of these models have outperformed OpenAI's, demonstrating their potential as viable, or even superior, alternatives. Additionally, we find that fine-tuning these models often does not lead to significant improvements in performance.
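
The workflow described in the abstract (encode each gene's textual background as a vector, then classify genes from those vectors) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_embed` is a hashed bag-of-words stand-in, not a transformer model and not any of the ten Hugging Face models the authors evaluated; the gene names, descriptions, and labels are invented; and the nearest-centroid classifier is a simple swapped-in choice, since the abstract does not say which classifier the paper uses. The point is only that the embedder is a pluggable function, so a hosted service can in principle be replaced by a local model without changing the rest of the pipeline.

```python
import hashlib
import math
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Embedder = Callable[[str], Vector]


def toy_embed(text: str, dim: int = 256) -> Vector:
    """Stand-in embedder (assumption): hashed bag-of-words, L2-normalised.

    A real run would replace this with a local open-source
    text-embedding model; this function only mimics the interface.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def fit_centroids(texts: Sequence[str], labels: Sequence[str],
                  embed: Embedder) -> Dict[str, Vector]:
    """Average the embeddings of each class to get one centroid per label."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = {}
    for text, label in zip(texts, labels):
        vec = embed(text)
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def predict(text: str, centroids: Dict[str, Vector], embed: Embedder) -> str:
    """Assign the label whose centroid is most similar to the text's embedding."""
    vec = embed(text)
    return max(centroids, key=lambda lab: cosine(vec, centroids[lab]))


# Hypothetical gene descriptions and labels, invented for this sketch.
descriptions = [
    "transcription factor that binds DNA and regulates transcription",
    "DNA binding transcription factor regulating gene expression",
    "membrane receptor kinase signaling at the cell surface",
    "cell surface receptor for membrane signaling",
]
labels = ["tf", "tf", "receptor", "receptor"]

centroids = fit_centroids(descriptions, labels, toy_embed)
guess = predict("transcription factor that binds DNA", centroids, toy_embed)
```

Because `fit_centroids` and `predict` take the embedder as an argument, comparing embedding models amounts to passing a different function with the same `str -> list of floats` signature and re-running the same classification step.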