OBJECTIVE: Persona validation is a challenging task that often relies on costly external validation methods. This study aimed to develop a novel method for Persona validation based on data already available when the Personas are created. METHODS: A novel approach based on self-supervised machine learning (SSML) is proposed. The data were split into training and test sets (80 %-20 %), and the training set was used for Persona development. The resulting labels were used as input for a grid search with 5-fold cross-validation, which yielded five distinct optimal models. The "weak" ground truth for the test set was determined using the trained clustering model and was compared with the prediction obtained by majority voting among the optimal models. Performance was evaluated by means of weighted accuracy, precision, recall and F1 score. RESULTS: The proposed method was evaluated on two very different healthcare datasets consisting of questionnaire data. The first included 1070 subjects, resulting in three unbalanced Personas (P0 n = 100; P1 n = 292; P2 n = 464). The second included 176 subjects, resulting in three slightly unbalanced Personas (P0 n = 58; P1 n = 32; P2 n = 50). The SSML approach correctly differentiated the clusters, with high values of weighted accuracy (88.27 % and 94.12 %), precision (87.11 % and 92.83 %), recall (86.92 % and 91.67 %), and F1 score (86.92 % and 91.76 %). CONCLUSIONS: The proposed method generalized well beyond the training data, supporting the Personas' ability to stratify the characteristics of target populations. Additionally, this method significantly reduced the cost of validating Personas compared with other methods in the current literature.
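
The comparison step described in METHODS (majority voting among the optimal models, scored against the "weak" ground truth) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation. The function names and the toy labels are hypothetical. It assumes that "weighted" means averaging per-class scores by class support, and that voting ties go to the smallest label. Weighted accuracy is left out because the abstract does not define it.

```python
from collections import Counter


def majority_vote(model_predictions):
    """Combine one label list per model into a single label per subject."""
    voted = []
    for votes in zip(*model_predictions):
        counts = Counter(votes)
        top = max(counts.values())
        # Assumption: ties are broken by taking the smallest label.
        voted.append(min(label for label, c in counts.items() if c == top))
    return voted


def weighted_scores(y_true, y_pred):
    """Support-weighted precision, recall and F1 over the classes in y_true."""
    support = Counter(y_true)
    n = len(y_true)
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = count / n  # class share of the weak ground truth
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return precision, recall, f1


# Hypothetical toy data: weak labels from the clustering model for six test
# subjects, and the predictions of five cross-validated classifiers.
weak_truth = [0, 0, 1, 1, 2, 2]
fold_models = [
    [0, 0, 1, 1, 2, 2],
    [0, 1, 1, 1, 2, 2],
    [0, 1, 1, 2, 2, 2],
    [0, 0, 1, 1, 2, 0],
    [0, 1, 1, 1, 2, 2],
]

ensemble = majority_vote(fold_models)
precision, recall, f1 = weighted_scores(weak_truth, ensemble)
print(ensemble, round(precision, 4), round(recall, 4), round(f1, 4))
```

In this toy case the ensemble disagrees with the weak ground truth on one subject, so all three scores fall below 1. Persona labels that a classifier could not reproduce on held-out subjects would produce low scores.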