MATERIALS AND METHODS: We processed records from 157,493 University of California San Diego Health patients seen between January 1, 2016 and July 3, 2019 who had at least 6 months of medication history, excluding pregnant women, patients under 18, and prisoners. Three models (Logistic Regression, Random Forest, and Ensemble) were constructed, with hyperparameters selected through 10-fold cross-validation. Model performance was measured by the Area Under the Receiver Operating Characteristic Curve (AUROC). Odds ratios and p-values were calculated for the coefficients of the Logistic Regression model, and Gini indices for the Random Forest model. Decision boundary analysis was conducted by pair-wise comparison of the false positive and false negative cases that each model would predict at a specific threshold. RESULTS: The Logistic Regression, Random Forest, and Ensemble models yielded test AUROCs of 0.839, 0.851, and 0.866, respectively. Significant covariates that may affect risk include age, immunocompromising treatments, past antibiotic use, and some medications for the gastrointestinal tract. CONCLUSIONS: The models achieve high discrimination performance (AUROC >
0.83). The different analysis approaches broadly agree on the predictors that affect patients' chances of a positive test, which may influence