3.1 Selection of clinical features
To eliminate errors caused by differing units, intrinsic variation, or large differences in magnitude among the feature variables in the regression analysis, the data were first standardized. The 17 candidate clinical feature parameters were then reduced by Lasso regression, and ten-fold cross-validation with the minimum criterion was used to select the optimal tuning parameter λ of the model. The cross-validation error was plotted against log(λ), with a vertical line drawn at the optimal value, as shown in Fig. 1(a). According to Fig. 1(a), the optimal λ corresponds to a model that retains six features. The Lasso coefficient profiles of the 17 parameters were plotted against the log(λ) sequence, as shown in Fig. 1(b), where the ordinate is the coefficient of each feature; as λ increases, the coefficients shrink towards 0, and a feature whose coefficient reaches 0 is removed from the model. The vertical line in Fig. 1(b) is drawn at the value selected by ten-fold cross-validation, where the optimal λ results in six features with non-zero coefficients. In summary, based on the clinical data, six important feature parameters were finally obtained, in order: WBC, M, NLR, FIB, fPSA, and f/tPSA.
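The standardization and shrinkage steps described above can be illustrated with a minimal sketch. It is not the software used in this study, which the text does not name. For simplicity it assumes a squared-error Lasso solved by coordinate descent rather than a penalized logistic model, and the function names and the parameters `lam` and `n_iter` are hypothetical.

```python
import statistics


def standardize(column):
    """Z-score one feature column: zero mean, unit (population) standard deviation."""
    mean = statistics.fmean(column)
    sd = statistics.pstdev(column)
    if sd == 0:
        raise ValueError("a constant feature cannot be standardized")
    return [(v - mean) / sd for v in column]


def soft_threshold(value, lam):
    """Soft-thresholding operator; this is what produces exact zeros in Lasso."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(columns, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - Xb||^2 + lam*||b||_1 (assumed squared-error loss).

    `columns` is a list of standardized feature columns; `y` is assumed centred.
    """
    n = len(y)
    beta = [0.0] * len(columns)
    residual = list(y)  # residual = y - X @ beta, and beta starts at zero
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(x * (r + x * beta[j]) for x, r in zip(col, residual)) / n
            scale = sum(x * x for x in col) / n  # about 1 for z-scored columns
            new_beta = soft_threshold(rho, lam) / scale
            delta = new_beta - beta[j]
            if delta != 0.0:
                residual = [r - x * delta for x, r in zip(col, residual)]
                beta[j] = new_beta
    return beta
```

Running the function over a decreasing sequence of `lam` values and recording the coefficients gives a coefficient profile of the kind shown in Fig. 1(b); choosing `lam` by ten-fold cross-validation would require an additional loop over data folds, which is omitted here.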
3.2 Results of Logistic Regression Equation
The pathological result was the dependent variable, with patients divided into a PCa group and a non-PCa group. The six selected parameters were entered as covariates in a multivariate logistic regression analysis. The parameters and their coefficients are shown in Table 1, including the coefficient β, odds ratio (OR), and P value for each variable and for the constant term.
P is the probability of PCa, ranging from 0 to 1, and 1−P is the probability of a benign lesion. With logit P as the dependent variable, and with the constant term and coefficients β taken from Table 1, the regression equation is: logit P = ln{P/(1−P)} = -0.018 - 0.010×WBC + 2.759×M - 0.095×NLR - 0.160×FIB - 0.306×fPSA - 2.910×f/tPSA.
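As a minimal sketch, the regression equation can be evaluated for one patient and converted back to a probability with the inverse logit, P = 1/(1 + e^(−logit P)). The coefficients are copied from the equation above; the function names and dictionary layout are hypothetical. The text does not state the units of the inputs or whether they are standardized values, so they are assumed to be on the same scale as the data used to fit the model.

```python
import math

# Constant term and coefficients copied from the regression equation in the text.
INTERCEPT = -0.018
COEFFICIENTS = {
    "WBC": -0.010,
    "M": 2.759,
    "NLR": -0.095,
    "FIB": -0.160,
    "fPSA": -0.306,
    "f/tPSA": -2.910,
}


def logit_p(values):
    """Linear predictor logit P = ln(P / (1 - P)) for one patient.

    `values` maps each of the six parameter names to its measured value,
    assumed to be on the same scale as the model's training data.
    """
    missing = set(COEFFICIENTS) - set(values)
    if missing:
        raise KeyError(f"missing parameters: {sorted(missing)}")
    return INTERCEPT + sum(COEFFICIENTS[name] * values[name] for name in COEFFICIENTS)


def probability(logit):
    """Inverse logit: convert logit P back to the probability P of PCa."""
    return 1.0 / (1.0 + math.exp(-logit))
```

For example, `probability(logit_p(patient))` returns the modelled probability of PCa for a dictionary `patient` that holds the six measurements.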
3.3 Independent-samples t test and ROC analysis
An independent-samples t test was performed on the logit P values calculated from the logistic regression equation obtained above, and the results are shown in Table 2. Logit P was -0.537±0.518 in the PCa group and -1.100±0.486 in the non-PCa group. The difference between the two groups was statistically significant (P < 0.001).
The ROC curve analysis results for the logit P classification model are shown in Table 3. The AUC of logit P was 0.816. At the maximum Youden index, the cut-off value for distinguishing PCa from benign lesions was -0.784; that is, when logit P is greater than -0.784, the model predicts PCa, with a sensitivity of 72.5% and a specificity of 77.8%.
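The Youden index is J = sensitivity + specificity − 1, so the reported operating point corresponds to J = 0.725 + 0.778 − 1 ≈ 0.503. The sketch below shows one way the AUC and such a cut-off could be derived from per-patient logit P values and pathological labels (1 = PCa, 0 = non-PCa). The function names are hypothetical, and this is an illustration of the technique rather than the statistical software used in the study.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random PCa case scores above a random non-PCa case."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximising J = sens + spec - 1.

    A case is predicted as PCa when its score is strictly greater than the cutoff.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= cutoff)
        sens = tp / n_pos
        spec = tn / n_neg
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1], best[2], best[3]


def predict_pca(logit_p_value, cutoff=-0.784):
    """Apply the cut-off reported in the text: PCa when logit P exceeds it."""
    return logit_p_value > cutoff
```

With the reported cut-off, `predict_pca(-0.5)` classifies a patient as PCa and `predict_pca(-1.0)` as non-PCa.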
3.4 Nomogram based on logit P
A nomogram based on logit P was drawn to predict the risk of benign versus malignant prostate nodules, as shown in Fig. 3. In the nomogram, the patient's total score across the selected features gives the predicted risk of a malignant prostate nodule. Its calibration curve (Fig. 4) shows good predictive ability. In the calibration curve, the X-axis represents the predicted risk, the Y-axis represents the probability of the actual diagnosis, the diagonal dashed line represents the perfect prediction of an ideal model, and the solid line represents the performance of the nomogram. The closer the solid line is to the diagonal dashed line, the better the prediction; the calibration curve shows that the PCa nomogram performs well.
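The idea behind a calibration curve can be sketched by grouping patients into bins of predicted risk and comparing the mean predicted risk in each bin with the observed proportion of PCa. This is a simplified illustration with equal-width bins; the function name and the `n_bins` parameter are hypothetical, and the method used to produce Fig. 4 is not specified in the text and may differ (for example, a smoothed or bootstrap-corrected curve).

```python
def calibration_points(predicted, observed, n_bins=5):
    """Return (mean predicted risk, observed event rate, count) per non-empty bin.

    `predicted` holds probabilities in [0, 1]; `observed` holds 1 (PCa) or 0 (non-PCa).
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(predicted, observed):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[index].append((p, y))
    points = []
    for members in bins:
        if members:
            mean_pred = sum(p for p, _ in members) / len(members)
            event_rate = sum(y for _, y in members) / len(members)
            points.append((mean_pred, event_rate, len(members)))
    return points
```

Plotting the first value of each tuple on the X-axis against the second on the Y-axis gives points that lie on the diagonal when the model is perfectly calibrated.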