3.1 Screening for differentially expressed lncRNA and mRNA
In this study, the mRNA and lncRNA expression profiles of 1096 breast cancer patients and 112 normal controls were compared, and 2100 differentially expressed lncRNAs (DELs) were identified using thresholds of logFC > 1 and p < 0.05 (Figure 1).
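The threshold step can be illustrated with a minimal sketch. It assumes the differential expression results are already available as one record per lncRNA; the `DeResult` type, the `select_differential` helper, and the toy values are hypothetical and are not taken from the study's pipeline.

```python
from dataclasses import dataclass


@dataclass
class DeResult:
    """One row of a differential expression table (hypothetical layout)."""
    gene: str
    log_fc: float
    p_value: float


def select_differential(results, log_fc_cutoff=1.0, p_cutoff=0.05):
    """Keep entries with logFC above the cutoff and p value below the cutoff."""
    return [r for r in results
            if r.log_fc > log_fc_cutoff and r.p_value < p_cutoff]


# Toy records for illustration only; these are not study data.
toy = [
    DeResult("lnc_A", 2.3, 0.001),   # passes both thresholds
    DeResult("lnc_B", 0.4, 0.0001),  # fold change too small
    DeResult("lnc_C", 1.8, 0.20),    # not significant
    DeResult("lnc_D", 1.2, 0.03),    # passes both thresholds
]
selected = [r.gene for r in select_differential(toy)]
print(selected)  # ['lnc_A', 'lnc_D']
```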
3.2 Clinical characteristics of the primary dataset and the entire dataset used for internal validation
We excluded 43 patients with a survival time of ≤0 days or an unknown stage. Of the remaining 1053 BC patients, 524 were randomly assigned to the "primary dataset", and all 1053 patients formed the "entire dataset". There was no significant difference in clinical characteristics between the two datasets (p > 0.05), as shown in Table 1.
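A random assignment of this kind might be sketched as follows. The patient identifiers, the seed, and the `split_primary` helper are assumptions made for illustration; the text does not state how the random draw was performed.

```python
import random


def split_primary(patient_ids, n_primary, seed=0):
    """Draw n_primary patients without replacement; the entire set is kept as is."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    primary = rng.sample(patient_ids, n_primary)
    return sorted(primary), list(patient_ids)


# Hypothetical identifiers standing in for the 1053 retained patients.
patient_ids = [f"P{i:04d}" for i in range(1, 1054)]
primary, entire = split_primary(patient_ids, 524)
print(len(primary), len(entire))  # 524 1053
```

Because the primary dataset is a subset of the entire dataset, validation on the entire dataset is internal rather than independent.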
3.3 Selection of lncRNAs associated with overall survival in BC patients
Nine lncRNAs were identified as being associated with OS through univariate and multivariate Cox analyses. AC068858.1, AC000067.1, LINC00460, and LINC02408 had hazard ratios (HRs) greater than 1, indicating that their overexpression was associated with shorter OS. In contrast, AC136475.5, AC023043.4, AC073359.1, AC244502.1, and COL4A2-AS1 had HRs less than 1, suggesting that their overexpression may be associated with longer OS (Table 2).
3.4 Construction and validation of a risk scoring system based on a 9-lncRNA signature
The risk score was calculated from the expression levels of the 9 lncRNAs using the following formula: Risk score = (3.664232 × Expression_AC068858.1) + (-3.37774 × Expression_AC136475.5) + (2.580171 × Expression_AC000067.1) + (-0.43782 × Expression_AC023043.4) + (0.180252 × Expression_LINC00460) + (-16.3049 × Expression_AC073359.1) + (-2.437 × Expression_AC244502.1) + (-7.68153 × Expression_COL4A2-AS1) + (1.873982 × Expression_LINC02408). Patients were divided into high-risk and low-risk groups using the median risk score as the cutoff. The distribution of OS status and the expression profiles of the 9 lncRNAs in the two groups are shown in Figure 2A, with higher expression levels of the 9 lncRNAs in the high-risk group. Kaplan-Meier survival analysis showed that the prognosis of the high-risk group was significantly worse than that of the low-risk group (p < 0.001) (Figure 2B).
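As a worked illustration of the formula, the sketch below computes the weighted sum and applies the median split. The coefficients are those given above; the toy expression values, the patient labels, and the rule that a score equal to the median falls in the low-risk group are assumptions, since the text does not specify tie handling.

```python
from statistics import median

# Coefficients as reported in the risk score formula.
COEFFICIENTS = {
    "AC068858.1": 3.664232,
    "AC136475.5": -3.37774,
    "AC000067.1": 2.580171,
    "AC023043.4": -0.43782,
    "LINC00460": 0.180252,
    "AC073359.1": -16.3049,
    "AC244502.1": -2.437,
    "COL4A2-AS1": -7.68153,
    "LINC02408": 1.873982,
}


def risk_score(expression):
    """Weighted sum of the 9 lncRNA expression levels."""
    return sum(coef * expression[name] for name, coef in COEFFICIENTS.items())


def assign_groups(scores):
    """Split at the median; scores equal to the median go to 'low' (assumption)."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Toy expression profiles (not study data): one lncRNA set to 1, the rest to 0.
zero = dict.fromkeys(COEFFICIENTS, 0.0)
patients = {
    "P1": {**zero, "LINC00460": 1.0},
    "P2": {**zero, "AC068858.1": 1.0},
    "P3": {**zero, "AC244502.1": 1.0},
    "P4": {**zero, "AC000067.1": 1.0},
}
scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = assign_groups(scores)
print(groups)  # {'P1': 'low', 'P2': 'high', 'P3': 'low', 'P4': 'high'}
```

In this toy example the median of the four scores is about 1.38, so the patients expressing only AC068858.1 or AC000067.1 (positive coefficients above that value) fall in the high-risk group.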
We then constructed a time-dependent ROC curve for the 9-lncRNA signature in the primary dataset, which showed an area under the curve (AUC) of 0.86 at 2 years, 0.86 at 4 years, 0.88 at 6 years, and 0.92 at 8 years (Figure 2C). In the entire dataset, the expression levels of the 9 lncRNAs were also higher in the high-risk group (Figure 3A), and the high-risk group again had a significantly worse OS (p < 0.001) (Figure 3B). The ROC curve for the entire dataset gave an AUC of 0.72 at 2 years, 0.73 at 4 years, 0.74 at 6 years, and 0.76 at 8 years (Figure 3C). Therefore, the 9-lncRNA signature had good predictive performance for the survival of BC patients in both the primary dataset and the entire dataset.
3.5 The prognostic value of the 9-lncRNA signature is independent of conventional clinical risk factors
Next, we tested whether the prognostic performance of the 9-lncRNA signature was independent of conventional clinical risk factors. In the primary dataset, binary logistic regression suggested that age, stage, T, N, and M were prognostic factors, but multivariate Cox analysis indicated that age and the lncRNA model were independent prognostic factors, with HRs of 1.059 (p < 0.001) and 1.035 (p < 0.001), respectively (Table 3). Similarly, in the entire dataset, binary logistic regression suggested that age, stage, T, N, and M were prognostic factors, but multivariate Cox analysis indicated that age and the lncRNA model were independent prognostic factors, with HRs of 1.035 (p < 0.001) and 1.004 (p < 0.001), respectively (Table 4). When patients were stratified by age, the Kaplan-Meier curves of the high-risk and low-risk groups defined by the 9-lncRNA signature differed significantly (p < 0.05) in both the subgroup aged <60 years and the subgroup aged ≥60 years (Figure 4), indicating that the 9-lncRNA signature can independently predict the prognosis of BC patients.
3.6 Construction of a nomogram combining the 9-lncRNA signature and clinical risk factors
According to the multivariate Cox analysis, age was the clinical risk factor that significantly predicted OS in breast cancer patients. Therefore, we integrated age with the 9-lncRNA signature to develop a quantitative tool (nomogram) for predicting OS. We then used this nomogram to predict the 1-year, 3-year, and 5-year survival of BC patients (Figure 5A). Subsequently, the concordance index (C-index) and calibration curves were used to evaluate the discrimination and calibration of the prognostic nomogram. The results showed that the survival probabilities predicted by the nomogram were very close to the observed probabilities (Figure 5B, 5C, 5D, 5E). The C-index in the primary dataset was 0.81.
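For readers unfamiliar with the C-index, the sketch below shows one common definition (Harrell's C) for right-censored data. It is an illustration on toy values, not the procedure used to obtain the reported 0.81; the `concordance_index` function and its inputs are hypothetical, and pairs with tied survival times are simply skipped.

```python
def concordance_index(times, events, risk):
    """Harrell's C: fraction of comparable pairs ordered correctly by risk.

    A pair (i, j) is comparable when patient i had the event and a strictly
    shorter follow-up time than patient j. Tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (not study data): follow-up in years, event flag, predicted risk.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.3, 0.5, 0.1]
c_index = concordance_index(times, events, risk)
print(c_index)  # 0.8
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so higher values indicate better discrimination.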