A Comparative Evaluation of Computational Models for RNA modification detection using Nanopore sequencing with RNA004 Chemistry.


Authors: Mian Umair Ahsan, Joe Chan, Shou-Jiang Gao, Yufei Huang, Wen Meng, Kai Wang, Yongji Zou

Language: eng

Classification number:

Publication information: United States : bioRxiv : the preprint server for biology, 2025

Physical description:

Collection: NCBI

ID: 680328

Direct RNA sequencing from Oxford Nanopore Technologies (ONT) has become a valuable method for studying RNA modifications such as N6-methyladenosine (m6A), pseudouridine (ψ), and 5-methylcytosine (m5C). Recent advancements in the RNA004 chemistry substantially reduce sequencing errors compared to previous chemistries (e.g., RNA002), thereby promising enhanced accuracy for epitranscriptomic analysis. In this study, we benchmark the performance of two state-of-the-art RNA modification detection models capable of handling RNA004 data - ONT's Dorado and m6Anet - using two wild-type (WT) cell lines, HEK293T and HeLa, with respective ground truths from GLORI and eTAM-seq, and their paired in vitro transcribed (IVT) RNA as negative controls. We found that under default settings and considering sites with ≥10% modification ratio and ≥10X coverage, Dorado has higher recall (∼0.92) than m6Anet (∼0.51) for m6A detection. Among the methylated sites shared between ground truth and computational predictions, site-specific m6A modification stoichiometry is highly correlated, with correlation coefficients of ∼0.89 for the Dorado-truth comparison and ∼0.72 for the m6Anet-truth comparison. However, combined assessment of the WT and IVT datasets shows that while the per-site false positive rate (FPR) can be lower (∼8% for Dorado and ∼33% for m6Anet), both computational tools can have a high per-site false discovery rate (FDR) for m6A (∼40% for Dorado and ∼80% for m6Anet) due to the low prevalence of m6A in the transcriptome, with a similar trend observed for pseudouridine (∼95% FDR for Dorado). Additional motif analysis reveals that both Dorado and m6Anet exhibit high heterogeneity of false positive calls across sequence contexts, suggesting that sequence context helps determine the accuracy of specific modification calls.
There is also a substantial overlap of false positive calls between the two IVT samples, suggesting a post-filtering strategy to improve modification calling: compiling a set of low-confidence sites with a probabilistic model built from several IVT samples across diverse cells/tissues. Our analysis highlights key strengths and limitations of the current generation of m6A detection algorithms and offers insights into optimizing thresholds and interpretability. The IVT datasets generated by the RNA004 chemistry provide a publicly available benchmark resource for further development and refinement of computational methods.
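
The relationship between a modest per-site FPR and a high per-site FDR at low prevalence follows from Bayes' rule. The sketch below is only an illustration of that arithmetic, not the authors' evaluation pipeline: the function name and the prevalence values are hypothetical, and it assumes a single recall and a single FPR apply uniformly to all candidate sites.

```python
def false_discovery_rate(recall: float, fpr: float, prevalence: float) -> float:
    """Expected FDR = FP / (FP + TP) for a given site-level prevalence.

    recall     -- fraction of truly modified sites that are called
    fpr        -- fraction of unmodified sites that are called
    prevalence -- fraction of candidate sites that are truly modified
                  (hypothetical input, not a value reported in the abstract)
    """
    true_pos = recall * prevalence
    false_pos = fpr * (1.0 - prevalence)
    return false_pos / (false_pos + true_pos)


if __name__ == "__main__":
    # Recall and FPR are the approximate Dorado m6A figures quoted in the
    # abstract; the prevalence grid is purely illustrative.
    for prevalence in (0.5, 0.1, 0.01):
        fdr = false_discovery_rate(recall=0.92, fpr=0.08, prevalence=prevalence)
        print(f"prevalence={prevalence:.2f}  FDR={fdr:.2f}")
```

With recall and FPR held fixed, the FDR rises as modified sites become rarer, which is the trend the abstract describes for m6A and pseudouridine.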