Efficient storage and regression computation for population-scale genome sequencing studies.


Authors: Christopher Chang, Manuel A Rivas

Language: eng

Classification number: 636.0885 Animal husbandry

Publication information: England: Bioinformatics (Oxford, England), 2025

Physical description:

Collection: NCBI

ID: 179134

MOTIVATION: The growing availability of large-scale population biobanks has the potential to significantly advance our understanding of human health and disease. However, the massive computational and storage demands of whole genome sequencing (WGS) data pose serious challenges, particularly for underfunded institutions or researchers in developing countries. This disparity in resources can limit equitable access to cutting-edge genetic research.

RESULTS: We present novel algorithms and regression methods that dramatically reduce both computation time and storage requirements for WGS studies, with particular attention to rare variant representation. By integrating these approaches into PLINK 2.0, we demonstrate substantial gains in efficiency without compromising analytical accuracy. In an exome-wide association analysis of 19.4 million variants for the body mass index phenotype in 125,077 individuals (All of Us project data), we reduced runtime from 695.35 minutes (11.5 hours) on a single machine to 1.57 minutes with 30 GB of memory and 50 threads (or 8.67 minutes with 4 threads). Additionally, the framework supports multi-phenotype analyses, further enhancing its flexibility.

AVAILABILITY: Our optimized methods are fully integrated into PLINK 2.0 and can be accessed at: https://www.cog-genomics.org/plink/2.0/.
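
To put the runtimes reported in the abstract in perspective, the short sketch below computes the implied speedup factors. The figures (695.35, 1.57 and 8.67 minutes) come from the abstract; the function name `speedup` and the variable names are illustrative assumptions and are not part of PLINK 2.0.

```python
# Illustrative arithmetic only: the runtimes are taken from the abstract,
# while the function and variable names are hypothetical.

def speedup(baseline_minutes: float, optimized_minutes: float) -> float:
    """Return how many times faster the optimized run is than the baseline."""
    if baseline_minutes <= 0 or optimized_minutes <= 0:
        raise ValueError("runtimes must be positive")
    return baseline_minutes / optimized_minutes


BASELINE_MIN = 695.35    # reported runtime before optimization
RUN_50_THREADS = 1.57    # reported runtime with 50 threads and 30 GB of memory
RUN_4_THREADS = 8.67     # reported runtime with 4 threads

if __name__ == "__main__":
    print(f"50 threads: about {speedup(BASELINE_MIN, RUN_50_THREADS):.0f}x faster")
    print(f" 4 threads: about {speedup(BASELINE_MIN, RUN_4_THREADS):.0f}x faster")
```

On these numbers, the 4-thread run is roughly 80 times faster than the baseline and the 50-thread run roughly 440 times faster. The abstract does not state the thread count of the baseline run, so these ratios describe wall-clock time only and not per-thread efficiency.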