The study focuses on handling multiclass imbalanced data in the classification of stingless bee samples by employing two data balancing techniques, the Synthetic Minority Oversampling Technique (SMOTE) and the Adaptive Synthetic (ADASYN) approach. These techniques are applied in combination with machine learning (ML) algorithms,
specifically Random Forest (RF) and Support Vector Machine (SVM), to assess the models' predictive performance in inferring the identities of stingless bee samples. We studied six ML classifier models (RF, RF + SMOTE, RF + ADASYN, SVM, SVM + SMOTE and SVM + ADASYN) on the six-class imbalanced dataset of stingless bee morphometrics. Multi-class area under the curve (AUC), F1-score, G-mean, balanced accuracy, sensitivity and the "No information rate" were used to assess model performance. SMOTE and ADASYN marginally improved the performance of the RF and SVM classifiers. SVM outperformed RF, with SVM using SMOTE performing better than SVM using ADASYN. SVM with ADASYN had a lower multi-class AUC (0.9898) and sensitivity (0.956) but a higher F1-score (0.939) than SVM with SMOTE (AUC = 0.9918, sensitivity = 0.959, F1-score = 0.934). Overall, SVM with SMOTE was superior to RF with SMOTE. All models except SVM with ADASYN correctly classified four of the six species,