BACKGROUND: Postoperative pulmonary complications (PPCs) are major adverse events in neurosurgical patients. This study aimed to develop and validate machine learning models for predicting PPCs after neurosurgery. METHODS: PPCs were defined according to the European Perioperative Clinical Outcome standards as complications occurring within 7 postoperative days. Data from cases meeting the inclusion/exclusion criteria were extracted from the anesthesia information management system to create three datasets: the development dataset (Huashan Hospital, Fudan University, 2018 to 2020), the temporal validation dataset (Huashan Hospital, Fudan University, 2021), and the external validation dataset (three other hospitals, 2023). Machine learning models based on six algorithms were trained using either 35 retrievable and plausible features or the 11 features selected by Lasso regression. All models underwent temporal validation, and the 11-feature models were also externally validated. Independent risk factors were identified, and feature importance in the top models was analyzed. RESULTS: PPCs occurred in 712 of 7533 (9.5%), 258 of 2824 (9.1%), and 207 of 2300 (9.0%) patients in the development, temporal validation, and external validation datasets, respectively. During cross-validation training, all models except Bayes demonstrated good discrimination, with an area under the receiver operating characteristic curve (AUC) of 0.84. In temporal validation of the full-feature models, the deep neural network (DNN) performed best, with an AUC of 0.835 (95% confidence interval [CI]: 0.805-0.858) and a Brier score of 0.069, followed by logistic regression (LR), random forest, and XGBoost. The 11-feature models performed comparably to the full-feature models, with very close but statistically significantly lower AUCs; DNN and LR were the top models in both temporal and external validation.
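The two headline metrics above can be illustrated with a minimal sketch. The AUC is the probability that a randomly chosen patient with PPCs receives a higher predicted risk than a randomly chosen patient without them, and the Brier score is the mean squared error between predicted probability and the observed 0/1 outcome (lower is better). As a reference point, a model that always predicts the prevalence p has a Brier score of p(1 - p); at 9.1% prevalence that is about 0.083, so 0.069 is an improvement over that baseline. The abstract does not state how the 95% CIs were derived; the percentile bootstrap below is an assumption, and the function names `auc`, `brier`, and `bootstrap_auc_ci` are hypothetical, not taken from the study's code.

```python
import random


def auc(y_true, y_score):
    """AUC as P(random positive scores above random negative); ties count 1/2."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier(y_true, y_prob):
    """Mean squared difference between predicted probability and observed 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (assumed method, not from the study).

    Resamples that contain only one outcome class are skipped, because the
    AUC is undefined for them.
    """
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if 0 < sum(yt) < n:
            stats.append(auc(yt, [y_score[i] for i in idx]))
    if not stats:
        raise ValueError("no bootstrap resample contained both classes")
    stats.sort()
    lo = stats[int((alpha / 2) * (len(stats) - 1))]
    hi = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return lo, hi
```

For example, with outcomes `[0, 0, 1, 1]` and predicted risks `[0.1, 0.4, 0.35, 0.8]`, three of the four positive/negative pairs are ranked correctly, so `auc` returns 0.75. The pairwise loop is quadratic and meant only to make the definition explicit; for cohorts of several thousand patients, a rank-based formula or a library routine would be the practical choice.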
An 11-feature nomogram was constructed based on the LR algorithm; it outperformed the minimally modified Assess respiratory RIsk in Surgical patients in CATalonia (ARISCAT) and Local ASsessment of VEntilatory management during General Anesthesia for Surgery (LAS VEGAS) scores, with a higher AUC (LR: 0.824; ARISCAT: 0.672; LAS VEGAS: 0.663). Independent risk factors identified by multivariable LR largely overlapped with the Lasso-selected features but were inconsistent with the important features identified by the Shapley additive explanation (SHAP) method applied to the LR model. CONCLUSIONS: The developed models, especially the DNN model and the nomogram, showed good discrimination and calibration and could be used to predict PPCs in neurosurgical patients. Developing machine learning models and identifying risk factors may support clinical decision-making aimed at improving surgical outcomes. TRIAL REGISTRATION: ChiCTR2100047474
https://www.chictr.org.cn/showproj.html?proj=128279.