Natural language processing for scalable feature engineering and ultra-high-dimensional confounding adjustment in healthcare database studies.

Authors: Thomas Deramus, Kueiyu Joshua Lin, Kerry Ngan, Joseph M Plasek, Sebastian Schneeweiss, Theodore N Tsacogianis, Janick G Weberpals, Richard Wyss, Jie Yang, Li Zhou

Language: eng

Classification code:

Publication information: United States : medRxiv : the preprint server for health sciences, 2025

Physical description:

Collection: NCBI

ID: 738337

 BACKGROUND: To improve confounding control in healthcare database studies, data-driven algorithms may empirically identify and adjust for large numbers of pre-exposure variables that indirectly capture information on unmeasured confounding factors ('proxy' confounders). Current approaches for high-dimensional proxy adjustment do not leverage free-text notes from electronic health records (EHRs). Unsupervised natural language processing (NLP) technology can scale to generate large numbers of structured features from unstructured notes. OBJECTIVE: To assess the impact of supplementing claims data analyses with large numbers of NLP-generated features for high-dimensional proxy adjustment. METHODS: We linked Medicare claims with EHR data to generate three cohorts comparing different classes of medications on the 6-month risk of cardiovascular outcomes. We used various NLP methods to generate structured features from free-text EHR notes and used LASSO regression to fit several propensity score (PS) models that included different covariate sets as candidate predictors. Covariate sets included features generated from claims data only, and from claims data plus NLP-generated EHR features. RESULTS: Including both claims codes and NLP-generated EHR features as candidate predictors improved overall covariate balance, with standardized differences < 0.1 for all variables. While overall balance improved, the impact on estimated treatment effects was more nuanced: adjustment for NLP-generated features moved effect estimates further in the expected direction in two of the empirical studies but had no impact in the third. CONCLUSION: Supplementing administrative claims with large numbers of NLP-generated features for ultra-high-dimensional proxy confounder adjustment improved overall covariate balance and may provide a modest benefit in terms of capturing confounder information.
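The balance criterion mentioned in the results (standardized differences below 0.1) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy data, the optional weights argument, and the choice of a population (divide-by-total-weight) variance are all assumptions made for illustration. The abstract does not say how the fitted propensity scores were applied or which variance estimator was used.

```python
from math import sqrt


def weighted_mean(values, weights):
    """Weighted mean of values."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def weighted_var(values, weights):
    """Weighted population variance (assumption: no small-sample correction)."""
    m = weighted_mean(values, weights)
    total = sum(weights)
    return sum(w * (v - m) ** 2 for v, w in zip(values, weights)) / total


def standardized_difference(x, treated, weights=None):
    """Standardized difference of covariate x between treated (1) and comparator (0).

    Computed as (mean_1 - mean_0) / sqrt((var_1 + var_0) / 2).
    `weights` is a hypothetical hook for propensity-score-based weights.
    """
    if weights is None:
        weights = [1.0] * len(x)
    x1 = [v for v, t in zip(x, treated) if t == 1]
    w1 = [w for w, t in zip(weights, treated) if t == 1]
    x0 = [v for v, t in zip(x, treated) if t == 0]
    w0 = [w for w, t in zip(weights, treated) if t == 0]
    m1, m0 = weighted_mean(x1, w1), weighted_mean(x0, w0)
    pooled_sd = sqrt((weighted_var(x1, w1) + weighted_var(x0, w0)) / 2)
    if pooled_sd == 0:
        return 0.0  # covariate is constant in both groups
    return (m1 - m0) / pooled_sd


# Toy example: one binary covariate, four treated and four comparator patients.
x = [1, 1, 0, 0, 1, 0, 0, 0]
treated = [1, 1, 1, 1, 0, 0, 0, 0]

crude = standardized_difference(x, treated)
# Hypothetical weights that up-weight the one comparator patient with x = 1.
adjusted = standardized_difference(x, treated, weights=[1, 1, 1, 1, 3, 1, 1, 1])

print(abs(crude) < 0.1, abs(adjusted) < 0.1)
```

In this toy data the crude comparison fails the 0.1 threshold, while the weighted comparison meets it. In a real analysis the same check would be repeated for every candidate covariate, including the NLP-generated features.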