BACKGROUND AND AIM: Prostate cancer is the most common male malignancy. Current diagnostic methods that rely on total PSA (tPSA) or the Prostate Health Index (PHI) alone lack specificity. Several studies have developed nomograms for predicting risk, but these are not easily visualized. Our study aims to find the PHI cut-off values that give the best negative predictive value (NPV), and then to build a clustering model that displays prostate cancer risk categories, is particularly useful for patients with PSA > 20, and can be applied in clinical practice. METHOD: We enrolled 708 patients in the training cohort and 143 in the validation cohort, and divided them into three groups based on their PSA levels. We then determined optimal and customized PHI cut-off values, calculated NPV and positive predictive value (PPV), and selected logistic regression as the best-performing method among several machine-learning algorithms. Subsequently, the significant variables were identified and a clustering model was constructed. Finally, the model was validated and made available online for clinical use. RESULTS: The optimal lower PHI cut-off values for the PSA > 4, PSA 4-20, and PSA > 20 subgroups were 23.85, 24.35, and 40.75, with upper limits of 142.9, 143, and 135.6, respectively. For the PSA > 4 and PSA 4-20 subgroups, the clustering model of the optimal cohort showed higher silhouette coefficients (0.433 and 0.526) than that of the customized PHI cohort (0.432 and 0.452). The PSA > 20 subgroup had the highest silhouette coefficient, 0.572. In the validation cohort, the AUC values for the three subgroups were 0.761, 0.823, and 0.833, with accuracies of 88.81%, 90.38%, and 82.05%. CONCLUSION: Our clustering model categorizes patients into distinct risk groups with clear visualization and was stable and reliable in the validation cohort, potentially aiding the early diagnosis of prostate cancer in clinical practice.
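
The two quantities reported above, the predictive values at a PHI cut-off and the silhouette coefficient of a clustering, can be illustrated with a minimal sketch. It is not the study's pipeline: the function names `npv_ppv` and `mean_silhouette` are hypothetical, the toy data are invented for illustration, PHI at or above the cut-off is assumed to count as a positive test, and Euclidean distance is assumed for the silhouette.

```python
from math import dist


def npv_ppv(phi_values, has_cancer, cutoff):
    """Return (NPV, PPV), treating PHI >= cutoff as a positive test (assumption)."""
    tp = fp = tn = fn = 0
    for phi, cancer in zip(phi_values, has_cancer):
        positive = phi >= cutoff
        if positive and cancer:
            tp += 1
        elif positive:
            fp += 1
        elif cancer:
            fn += 1
        else:
            tn += 1
    # NPV = TN / (TN + FN); PPV = TP / (TP + FP); NaN when undefined
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return npv, ppv


def mean_silhouette(points, labels):
    """Mean silhouette coefficient using Euclidean distance (assumption)."""
    scores = []
    for i, (p, label) in enumerate(zip(points, labels)):
        # Collect distances from point i to every other point, grouped by cluster
        by_cluster = {}
        for j, (q, other) in enumerate(zip(points, labels)):
            if i != j:
                by_cluster.setdefault(other, []).append(dist(p, q))
        own = by_cluster.pop(label, [])
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster or single-cluster data
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Invented toy data, for illustration only (not patient data from the study)
toy_phi = [20, 22, 30, 45, 60, 150]
toy_cancer = [False, False, False, True, True, True]
toy_npv, toy_ppv = npv_ppv(toy_phi, toy_cancer, cutoff=24.35)

toy_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
toy_labels = [0, 0, 1, 1]
toy_silhouette = mean_silhouette(toy_points, toy_labels)
```

On this toy data, no patient below the cut-off has cancer, so the NPV is 1.0, and three of the four patients at or above it do, so the PPV is 0.75. A silhouette coefficient close to 1 indicates compact, well-separated clusters, which is the sense in which the subgroup coefficients above are compared.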