PURPOSE: The global prevalence of cardiovascular disease and diabetes (cardiometabolic diseases: CMD) is increasing. Depression and anxiety disorders increase the risk of developing CMD. In this study we used machine learning (ML) to test a wide array of predictors for the onset of CMD, evaluating whether adding detailed psychiatric or biological variables increases predictive performance. METHODS: We analysed data from the Netherlands Study of Depression and Anxiety, a longitudinal cohort study (N = 2071), using 368 predictors covering 4 domains (demographic, lifestyle & somatic, psychiatric, and biological markers). CMD onset (24% incidence) over a 9-year follow-up was defined using self-reported stroke, heart disease, or diabetes; high fasting glucose levels; and use of antithrombotic, cardiovascular, or diabetes medication (ATC codes C01DA, C01-C05A-B, C07-C09A-B, C01DB, B01, A10A-X). Using different ML methods (logistic regression, support vector machine, random forest, and XGBoost), we tested the predictive performance of single domains and domain combinations. RESULTS: The classifiers performed similarly; therefore, the simplest classifier (logistic regression) was selected. The Area Under the Receiver Operating Characteristic Curve (AUC-ROC) achieved by single domains ranged from 0.569 to 0.649. The combination of demographics, lifestyle & somatic indicators, and psychiatric variables performed best (AUC-ROC = 0.669), but did not significantly outperform demographics alone. Age and hypertension contributed most to prediction, whereas detailed psychiatric variables added relatively little. CONCLUSION: In this longitudinal study, ML classifiers were not able to accurately predict 9-year CMD onset in a sample enriched for subjects with psychopathology. Detailed psychiatric/biological information did not substantially increase predictive performance.
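The AUC-ROC values reported above can be read as the probability that a randomly chosen participant who develops CMD receives a higher predicted risk than a randomly chosen participant who does not, with ties counted as one half (0.5 corresponds to chance level). The following is a minimal pure-Python sketch of that rank-based interpretation. The function name `auc_roc` and the example scores are hypothetical illustrations and are not taken from the study's pipeline.

```python
def auc_roc(scores, labels):
    """Rank-based (Mann-Whitney) AUC-ROC.

    scores: predicted risks (higher = more likely CMD onset)
    labels: 1 for CMD onset, 0 for no onset (hypothetical example data)
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    # Count case/non-case pairs where the case is ranked higher; ties count 0.5.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks for four participants, two of whom develop CMD.
print(auc_roc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

Here 3 of the 4 case/non-case pairs are ordered correctly, giving 0.75. The pairwise loop is O(n²) and is meant only to make the definition concrete; for a cohort of this size a sort-based implementation would be preferable.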