PURPOSE: To develop models that predict high-risk symptom burden clusters in breast cancer patients undergoing chemotherapy using different machine learning algorithms, and to identify the optimal model. METHODS: Data from 647 breast cancer patients were analyzed. Five machine learning algorithms, namely an artificial neural network (ANN), a decision tree (DT), a support vector machine (SVM), a random forest (RF), and extreme gradient boosting (XGBoost), were tested alongside traditional logistic regression. Performance was evaluated in terms of predictive accuracy, precision, discriminatory capacity, calibration, and clinical utility, and the optimal model was identified. RESULTS: The RF-based model exhibited better accuracy, precision, and discriminatory capacity than the other models. Its area under the receiver operating characteristic curve was 0.91, its sensitivity was 65.8%, its specificity was 93.5%, its positive predictive value was 98.02%, and its false positive rate was 0.91%. CONCLUSION: The RF-based model showed excellent predictive accuracy and precision and can be used for early identification of the risk of self-reported symptom burden clusters in breast cancer patients undergoing chemotherapy.
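
For readers less familiar with the evaluation metrics named in the RESULTS, the following is a minimal sketch of how sensitivity, specificity, positive predictive value, and false positive rate are conventionally derived from a binary confusion matrix. It is not the authors' code: the `ConfusionCounts` class, the `binary_metrics` function, and the counts used are hypothetical and chosen only for illustration; they are not taken from the study data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a binary confusion matrix (hypothetical container)."""
    tp: int  # high-risk patients correctly flagged as high-risk
    fp: int  # low-risk patients incorrectly flagged as high-risk
    tn: int  # low-risk patients correctly classified as low-risk
    fn: int  # high-risk patients missed by the model


def binary_metrics(c: ConfusionCounts) -> dict:
    """Return the standard classification metrics as fractions in [0, 1]."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),          # true positive rate
        "specificity": c.tn / (c.tn + c.fp),          # true negative rate
        "ppv": c.tp / (c.tp + c.fp),                  # precision
        "false_positive_rate": c.fp / (c.fp + c.tn),  # equals 1 - specificity
    }


# Illustrative counts only; these are NOT the study's results.
counts = ConfusionCounts(tp=40, fp=5, tn=45, fn=10)
metrics = binary_metrics(counts)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The area under the receiver operating characteristic curve is not included here because it requires the model's predicted probabilities for each patient rather than a single set of confusion-matrix counts.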