A novel model with nutrition-related parameters for predicting overall survival of cancer patients

Increasing evidence indicates that nutritional status could influence the survival of cancer patients. This study aims to develop and validate a nomogram with nutrition-related parameters for predicting the overall survival of cancer patients. A total of 8749 patients from the multicentre cohort study in China were included as the primary cohort to develop the nomogram, and 696 patients were recruited as a validation cohort. Patients’ nutritional status was assessed using the PG-SGA. LASSO regression models and Cox regression analysis were used for factor selection and nomogram development. The nomogram was then evaluated for discrimination, calibration, and clinical usefulness by the C-index, calibration curves, and decision curve analysis (DCA). Kaplan–Meier survival curves were used to compare survival rates. Seven independent prognostic factors were identified and integrated into the nomogram. The C-index was 0.73 (95% CI, 0.72 to 0.74) and 0.77 (95% CI, 0.74 to 0.81) for the primary cohort and validation cohort, respectively, both of which were higher than the 0.59 (95% CI, 0.58 to 0.61) of the TNM staging system. DCA demonstrated that the net benefit of the nomogram was higher than that of the TNM staging system and of the TNM staging system combined with PG-SGA. Significant differences in median overall survival were found by stratifying patients into different risk groups (score < 18.5 and ≥ 18.5) for each TNM category (all Ps < 0.001). Our study screened out seven independent prognostic factors and generated an easy-to-use nomogram, which was validated and showed better predictive validity for the overall survival of cancer patients.


Introduction
Solid tumors remain the leading cause of cancer-related deaths worldwide. Improving the overall survival (OS) of patients is the most crucial target of anti-cancer therapy; thus, variables that predict the prognosis are of clinical and investigative interest [1]. The TNM staging system described in the 8th American Joint Committee on Cancer (AJCC) Cancer Staging Manual is the most widely used one [2]. The current TNM staging system is essential for predicting clinical outcomes and determining appropriate treatment. However, the survival of patients varies even among patients with the same disease stage, ranging from only a few weeks to several years [3].
Patients with cancer are known to have a higher risk of malnutrition than those without cancer. An understanding of a cancer patient's nutritional status is essential to therapeutic decisions and survival [4]. According to nutrition assessments from several countries, 25 to 85% of cancer patients are estimated to have cancer-related malnutrition [5][6][7]. More than half of all patients with solid tumors, especially gastrointestinal cancer, suffer from malnutrition, which is associated with decreased therapeutic response and increased mortality [8,9]. Progressive malnutrition is more common in patients with advanced cancer and leads to early mortality in the absence of nutrition support [8,10]. The Patient-Generated Subjective Global Assessment (PG-SGA) is a nutrition assessment tool based on the SGA and is recommended by the Academy of Nutrition and Dietetics (AND) for cancer patients [11,12]. A variety of other nutrition assessment tools, such as the Malnutrition Universal Screening Tool (MUST), are also used in hospitals, and studies using them have clearly outlined the impact of nutrition on cancer patients [13][14][15].
Because nutritional status is closely associated with the survival and treatment of cancer patients, some prognostic models have included a nutrition-related aspect [4], and a previously published study used the PG-SGA in cancer patients to develop prognostic models [16]. The PG-SGA includes a patient-reported component (including weight, food intake, symptoms, and activities and function), and a professional assessment component (including weight change, nutrition-related disease, metabolic demand, and physical exam). However, completing the standardized PG-SGA protocol can take too much time even when the interviewers are well trained. Some items on the PG-SGA may be perceived as hard to comprehend by the patients or as challenging to perform by healthcare professionals, especially the physical exam [17]. In this study, we sought to simplify the PG-SGA, using the criteria described by the patient-reported component instead of the complete PG-SGA, and to integrate certain potentially useful clinical factors into the TNM staging system to improve the prognostic power. Together, these aims are anticipated to increase the convenience and generalizability of this nutrition assessment.

Study population and design
The patients, with a pathologically diagnosed solid tumor at any stage, came from the Investigation on Nutrition Status and its Clinical Outcome of Common Cancers (INSCOC) cohort in China; a detailed description of the design, methods, and development of the INSCOC study has been provided elsewhere [7,18,19]. All cancer participants who met the inclusion criteria were recruited from multiple institutions in China between 2013 and 2019. The inclusion criteria in the present study were as follows: (1) patients aged 18 years or more; (2) a histological diagnosis of solid malignant tumors; and (3) a hospital stay longer than 48 h. The exclusion criteria were as follows: (1) patients with acquired immunodeficiency syndrome (AIDS) or transplanted organ(s); (2) patients who were admitted to the intensive care unit (ICU) and were in a critical condition at the beginning of recruitment; (3) patients who refused to participate or would not cooperate with the questionnaire survey (Supplementary Table 1). The primary outcome was 5-year mortality. The baseline survey included an in-person interview, a self-administered questionnaire by the patient, and a physical exam or anthropometric measurement by a trained interviewer using standardized protocols for the PG-SGA. Follow-up for participants included in-person or telephone interviews annually to collect their survival information. The outcome for the present analysis was censored in December 2019.
Additionally, as shown in the study schematic (Fig. 1), participants who were missing critical clinical examination data, follow-up data, or more than 10% of all data were excluded. Finally, 8749 patients were included in the current analysis as the primary cohort. With the same inclusion and exclusion criteria, the validation cohort was composed of 208 and 488 cancer patients who were followed at the First Affiliated Hospital of Sun Yat-Sen University and the First Affiliated Hospital of USTC, respectively, between 2010 and 2019.
The study was conducted in line with the Declaration of Helsinki; its design was approved by the local Ethics Committees of all participating hospitals. All patients signed an informed consent form before participating in the study. The trial was registered at http://www.chictr.org.cn with registration number ChiCTR1800020329.

Data collection
Four patient-reported domains: (1) weight, (2) food intake, (3) symptoms, and (4) activities and function were used in the present study. Unintentional weight loss was evaluated by comparing the historical weight (according to the patient's self-report) to the measured weight, obtained when the patient was wearing a light hospital gown upon admission. Food intake was evaluated by comparing the present intake to the intake in the past month. Symptoms were defined as problems that prevented patients from eating enough during the past 2 weeks. Food intake and symptoms were grouped since both of them could be used to reflect oral intake [20]. Activities and function were assessed by the presence of problems that decreased the functional abilities of the patient (Supplementary Table 2).
Clinical parameters and demographic information were also collected, including age, sex, body mass index (BMI), primary tumor site, TNM stage, co-morbidity, nutrition support interventions (enteral or parenteral), lifestyle habits (e.g., alcohol, smoking, or tea consumption), menstrual and reproductive history, medical history, occupational history, and family history. BMI was categorized using the classifications for the Chinese population: underweight (< 18.5 kg/m²), normal weight (18.5-23.9 kg/m²), overweight (24-27.9 kg/m²), and obese (≥ 28 kg/m²). Fasting blood tests, such as total protein, albumin, pre-albumin, and creatinine, were collected with standard laboratory techniques within 48 h of admission. The scores for the PG-SGA were constructed using standard thresholds [21]. The OS of patients was calculated from admission to death or the last contact. All pathological staging was defined according to the 8th edition of the AJCC TNM staging system.
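The BMI categories above can be expressed as a short function. This is a minimal illustrative sketch, not the study's code: the function name `bmi_category` is hypothetical, and it assumes that values falling between the printed bounds (for example 23.95 kg/m²) belong to the lower category.

```python
def bmi_category(weight_kg: float, height_m: float) -> str:
    """Classify BMI with the Chinese-population cut-offs quoted in the text.

    Assumption: the printed ranges (18.5-23.9, 24-27.9) are treated as
    half-open intervals [18.5, 24) and [24, 28).
    """
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    bmi = weight_kg / height_m ** 2  # kg/m^2
    if bmi < 18.5:
        return "underweight"
    if bmi < 24.0:
        return "normal weight"
    if bmi < 28.0:
        return "overweight"
    return "obese"
```

For example, a patient weighing 65 kg at 1.70 m has a BMI of about 22.5 kg/m² and falls in the normal-weight category.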

Construction, validation, and calibration of the nomogram
The least absolute shrinkage and selection operator (LASSO) method was used to select the most useful potential predictive features from the primary cohort. K-fold cross-validation was performed to select the best-fitted model according to the optimal lambda value. All potential risk factors were entered into subsequent multivariable Cox regression analysis to identify the independent predictors. The dual-direction step-wise method was applied to further reduce the number of covariates with Akaike's information criterion as the stopping rule.
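For orientation, the LASSO-penalized Cox model is commonly written as the maximizer of the log partial likelihood minus an L1 penalty. The notation below is an assumption introduced here, not taken from the study: x_i is the covariate vector of patient i, δ_i the event indicator, R(t_i) the set of patients still at risk at time t_i, p the number of candidate features, and λ the tuning parameter chosen by cross-validation. Software packages differ in how they scale the likelihood term.

```latex
\hat{\beta}(\lambda) = \arg\max_{\beta}
\left\{
  \sum_{i:\,\delta_i = 1}
  \left[
    x_i^{\top}\beta - \log \sum_{j \in R(t_i)} \exp\!\left(x_j^{\top}\beta\right)
  \right]
  - \lambda \sum_{k=1}^{p} \lvert \beta_k \rvert
\right\}
```

Larger values of λ shrink more coefficients exactly to zero, which is why the method doubles as a feature selector.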
Independent prognostic factors for survival were identified from multiple variables to generate a nomogram, following the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement. The nomogram was subjected to 1000 bootstrap re-samplings for internal validation of the primary cohort. Model performance for predicting the outcome and the discriminative ability was measured by calculating the concordance index (C-index). Calibration curves, generated by comparing the predicted survival with the observed survival after bias correction, were plotted to assess the nomogram's calibration, accompanied by the Hosmer-Lemeshow test.
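The C-index used above can be illustrated with a small pure-Python sketch of Harrell's concordance index. This is an assumption-laden illustration rather than the study's implementation (which relied on R packages): the function name `harrell_c_index` is hypothetical, pairs with tied survival times are skipped, and tied risk scores count as half-concordant.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float],
                    events: Sequence[int],
                    risk_scores: Sequence[float]) -> float:
    """Fraction of comparable patient pairs ordered correctly by risk score.

    A pair is comparable when the patient with the shorter time had an
    observed event (events[i] == 1). Higher risk should mean shorter survival.
    """
    n = len(times)
    if not (n == len(events) == len(risk_scores)):
        raise ValueError("inputs must have the same length")
    concordant = 0.0
    comparable = 0
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:  # i failed before j was last seen
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives context for the reported values of 0.73 and 0.77.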

Clinical application
To evaluate the potential clinical net benefit of the model, the researchers performed decision curve analysis (DCA) and compared the nomogram with existing models (TNM staging system and TNM staging system combined with scored PG-SGA) using the C-index.
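The quantity plotted in a decision curve is the net benefit at a given threshold probability. The sketch below shows the standard binary-outcome form of that calculation; it is a simplification and not the "stdca.r" routine used in the study, which handles censored survival data. The function names are hypothetical, and the sketch assumes every patient's outcome at the chosen time horizon is known.

```python
from typing import Sequence


def net_benefit(predicted_risk: Sequence[float],
                outcome: Sequence[int],
                threshold: float) -> float:
    """Net benefit = TP/n - FP/n * pt / (1 - pt) at threshold probability pt."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(outcome)
    # Patients whose predicted risk reaches the threshold are "treated".
    tp = sum(1 for p, y in zip(predicted_risk, outcome) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(predicted_risk, outcome) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(outcome: Sequence[int], threshold: float) -> float:
    """Net benefit of treating every patient; treating none is always 0."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    prevalence = sum(outcome) / len(outcome)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)
```

Evaluating both functions over a grid of thresholds and plotting the results gives the curves that are compared in a DCA figure.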

Risk group stratification based on the nomogram
In addition to numerically comparing the discrimination ability by the C-index, we sought to illustrate the independent discriminatory ability of the nomogram beyond that provided by the standard TNM staging system. The patients were distributed into different risk groups within a specific TNM category according to the total risk scores (from highest to lowest) in the primary cohort, after which cut-off values of nomogram were determined.

Statistical analysis
Quantitative variables are expressed as the means ± standard deviations. Their differences were analyzed using Student's t test if the variables followed a normal distribution, or nonparametric tests (Mann-Whitney or Kruskal-Wallis) if they did not. Qualitative variables were analyzed using chi-square tests, with Fisher corrections if necessary. Kaplan-Meier curves were used to analyze the survival data. Cox regression analysis was used to evaluate the associations of OS with each factor. Adjusted hazard ratios (HRs) and their corresponding 95% confidence intervals (CIs) were derived from Cox models after adjusting for potential confounders. The optimal cut-off value was determined using the maximally selected rank statistics from the "maxstat" R package. Age was used as the timescale for all models, with entry time defined as the age at the baseline interview and exit time defined as the age at death, last follow-up, or December 2019, whichever came first. The significance level was set at P < 0.05 (two-sided probability). All analyses were performed using R version 3.6.2 (http://www.rproject.org) with the rms, survival, and survminer packages. DCA was performed using the source file "stdca.r," downloaded from https://www.mskcc.org.
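As an illustration of the Kaplan-Meier estimator mentioned above, the following pure-Python sketch computes the survival curve and the median survival time from follow-up times and event indicators. It is not the study's code (the analyses were run in R); the function names `kaplan_meier` and `median_survival` are hypothetical, and events are assumed to be coded 1 for death and 0 for censoring.

```python
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct death time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather every patient whose follow-up ends exactly at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # deaths and censored patients leave the risk set
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First time at which survival drops to 0.5 or below, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None
```

Computing such curves separately for each risk group is what allows median overall survival to be compared between groups.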

Clinical features and characteristics of the study population
The demographic and clinical characteristics of participants are shown in Table 1. The validation cohort was similar to the primary cohort in that approximately half of the patients had GI cancers, and more than half of the patients had moderate or severe malnutrition. Most patients were in advanced stages. However, the validation cohort was different from the primary cohort in that the former had more elderly patients (≥ 65 years) (42.5 vs. 27.1%) and more patients with decreased oral intake (46.8 vs. 31.2%).
[Figure caption: A coefficient profile plot was produced against the log(λ) sequence. A vertical line was drawn at the value selected using tenfold cross-validation, where the optimal λ resulted in seven non-zero coefficients.]

Feature selection and model building
The researchers observed that patients in the primary cohort with upper GI cancer had higher HRs than those with other types of cancer. Seven potential predictors were selected from 19 features in the primary cohort with non-zero coefficients in the LASSO regression model. Cross-validation results showed that the optimal model had seven potential predictors, namely, tumor site, age, weight, food intake, activities and function, TNM stage, and creatinine level (Fig. 2A and B). After multivariable adjustments, Cox regression analysis identified these factors as independent predictors (Table 2). A model that incorporated the above independent predictors was developed and presented as the nomogram (Fig. 3A).

Apparent performance and validation of the nomogram
The predicted and actual probabilities of cancer-related death are shown in the calibration plot, and the model demonstrated good agreement for predicting the risks of cancer-related death for both the primary and validation cohorts (Fig. 3B and C). The Hosmer-Lemeshow tests for the primary cohort and the validation cohort were not significant (P = 0.784 and 0.287, respectively), which suggested that there was no departure from a perfect fit. The C-index for the prediction nomogram was 0.73 (95% CI, 0.72 to 0.74) for the primary cohort and 0.77 (95% CI, 0.74 to 0.81) for the validation cohort.
By comparison, the C-index values of the TNM staging system and the TNM staging system combined with the scored PG-SGA for the prediction of cancer-related death were 0.59 (95% CI, 0.58 to 0.61) and 0.66 (95% CI, 0.65 to 0.67), respectively, as shown in Table 2.

Clinical application of the nomogram
The DCA for the nomogram, the TNM staging system, and the TNM staging system combined with the scored PG-SGA is presented in Fig. 3D. It showed that if the threshold probability of a patient or doctor was 20%, the nomogram added more benefit than either the treat-all-patients scheme or the treat-none scheme. Within this range, the net benefit of the models was comparable, and the nomogram was found to consistently perform better at all threshold probability values.

Performance of the nomogram for stratifying the risk of patients
The researchers determined optimal cut-off values (score < 18.5 and ≥ 18.5) by grouping patients in the primary cohort into two different risk subgroups after sorting by total score. Each group had a distinct prognosis. A significant distinction was observed in the Kaplan-Meier curves for survival outcomes within each TNM category (all P < 0.001) (Fig. 4).
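The stratification step can be sketched as a simple grouping by TNM category and nomogram score. This is an illustrative sketch only: the 18.5 cut-off is the value reported above, while the group labels "low risk" and "high risk", the function names, and the input layout are assumptions introduced here, on the premise that a higher total score indicates a worse prognosis.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

CUTOFF = 18.5  # cut-off value reported in the text


def risk_group(total_score: float, cutoff: float = CUTOFF) -> str:
    """Assign a risk label; the labels are hypothetical, not from the source."""
    return "low risk" if total_score < cutoff else "high risk"


def stratify(patients: Iterable[Tuple[str, float]],
             cutoff: float = CUTOFF) -> Dict[Tuple[str, str], List[float]]:
    """Group (tnm_stage, total_score) records by TNM stage and risk group."""
    groups: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for tnm_stage, score in patients:
        groups[(tnm_stage, risk_group(score, cutoff))].append(score)
    return dict(groups)
```

Survival curves would then be estimated within each (TNM stage, risk group) cell and compared between the two risk groups of the same stage.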

Discussion
The 8th edition of the AJCC TNM staging system is the most widely used staging system [22]. Due to the high heterogeneity of solid tumors, it is difficult to predict a patient's survival accurately, even using the TNM staging system [3]. There is substantial evidence that nutritional status influences the survival of cancer patients. This study hypothesized that adding nutritional components to the TNM staging system could improve survival prediction clinically.
In testing this hypothesis, the researchers found through multivariable analyses that tumor site, age, tumor stage, weight, food intake, activities and function, and creatinine level were independent prognostic factors, and developed a nomogram for predicting the overall survival of cancer patients. The researchers subsequently validated this nomogram both internally and externally and found that it was superior to the TNM staging system in predicting overall survival. The current nomogram integrated a nutrition assessment into the traditional TNM staging system and might help clinicians improve survival prediction if further validated.
This predictive nomogram for OS included three significant domains: age as a host-related factor; tumor site and TNM stage as disease factors; and weight, food intake, activities and function, and creatinine level as nutrition-related factors [23]. Therefore, when estimating the clinical outcome in cancer patients, various host- and nutrition-related factors, in addition to the current TNM staging system, should be considered.
Nutritional status is an important prognostic factor in all cancer patients. Patients with cancer are known to have a higher risk of malnutrition than those without cancer [4]. In this study, the researchers noted that the patient-reported nutritional component was predictive of OS in the multivariate model. For the construction of the model, seven potential predictors were selected from 19 candidate features by the LASSO method. The model combining multiple features demonstrated adequate discrimination in the primary cohort, which was then confirmed in the validation cohort.
Unintentional weight loss is an important criterion when assessing nutritional status in cancer patients. It is often the first visible sign of the disease among patients with cancer, with 40% of patients reporting that they had lost > 10% of their usual body weight when first diagnosed [24]. Different symptoms that constitute barriers to dietary intake can be assessed and quantified as symptoms that impact nutrition [20]. Additionally, it is noteworthy that patients with solid cancers were more likely to suffer from worse functional status [20], and activities and function were also found to be an independent predictor of survival. This study also showed that the creatinine level was associated with a poor outcome and was an independent risk factor for OS.
Advanced TNM stages and older age are both well-known predictors of worse cancer survival [25]. Notably, the tumor site is also a significant factor, and patients with upper GI cancer tend to have worse survival. One underlying reason could be that patients with upper GI cancers (esophageal, gastric, pancreatic) are more prone to malnutrition due to alteration in GI function, which adversely affects cancer treatments and is associated with a worse clinical outcome [26,27]. Of note, conventional treatments, particularly for patients with primary tumors in the upper GI tract, are more likely to worsen a patient's nutritional status and lead to the development of cancer-associated cachexia [28][29][30].
Because performance in terms of risk prediction, discrimination, and calibration could not capture the clinical consequences [31], decision curve analysis was applied in the current study. This novel method offers insight into clinical consequences based on a threshold probability (20%), from which the net benefit could be derived. Using the established nomogram to predict cancer-related death adds more benefit than the treat-all-patients scheme or the treat-none scheme.
A limitation of this study is that weight change is not an objective indicator of disease status in the presence of ascites, edema, or the growth of the tumor itself (including its metastases). Therefore, an evaluation of body weight instead of body composition can be misleading [32][33][34]. Additionally, weight loss and decreased food intake were often based on patients' self-report, which may be subject to misestimation. Nonetheless, the final models performed well in terms of calibration and discrimination. The model should be applied with caution given these potential drawbacks, which may be addressed in future revisions.
In conclusion, to the best of the researchers' knowledge, this is the first clinical scoring system developed based on patient-reported components from the commonly used PG-SGA and clinical parameters. The researchers established and validated a nomogram for predicting the overall survival of patients with solid tumors. Notably, these characteristics are commonly assessed in daily clinical practice in hospitalized patients, which is a practical advantage. This nomogram may play a role in future personalized cancer management and clinical trial design if further validated.