Data
We analyzed data from patients with histologic evidence of stage IIIb-IV non-small cell lung cancer (NSCLC) recruited for a controlled phase III trial (http://www.who.int/ictrp/network/rpcec/en/; Cuban Public Registry of Clinical Trials; trial number RPCEC00000161). We selected all patients with measurements of pre-treatment basal EGF concentration, peripheral blood parameters, inflammation, and immunosenescence biomarkers. The methods and results of this trial have been reported elsewhere [1, 7]. Briefly, patients were randomized to either the vaccine arm (CIMAvax-EGF plus best supportive care) or the control arm (best supportive care only). Eligible patients were aged 18 years or older, had histologically or cytologically confirmed stage IIIb or IV NSCLC, and had an Eastern Cooperative Oncology Group (ECOG) performance status of 0 to 2. All patients had received 4 to 6 cycles of platinum-based chemotherapy before random assignment and had finished first-line chemotherapy at least 4 weeks before entering the trial. Patients were excluded if they had received other investigational drugs; had known hypersensitivity to any component of the formulation; were pregnant or lactating; had uncontrolled chronic diseases or a history of severe allergic reactions; had brain metastases or another primary neoplastic lesion; had active infections, symptomatic congestive heart failure, unstable angina, cardiac arrhythmia, or psychiatric disorders; were receiving systemic corticosteroids at the time of inclusion; or had positive serology for hepatitis B, hepatitis C, or HIV. The primary efficacy endpoint was survival time, defined as the time elapsed from trial inclusion to death.
The potential pre-treatment predictive variables considered were basal serum EGF concentration; peripheral blood populations (absolute neutrophil, lymphocyte, monocyte, and platelet counts); the neutrophil-to-lymphocyte ratio (NLR) and platelet-to-lymphocyte ratio (PLR); and immunosenescence biomarkers (the proportion of CD4+ T cells and the CD4/CD8 ratio). This work includes only the data from the 40 patients who had complete measurements of all potential pre-treatment predictive variables.
The two arms were well matched for baseline demographic and tumour variables, such as sex, ethnic origin, age, smoking status, ECOG performance status, disease stage, histology, and response to initial chemotherapy (Table 1). Most patients did not receive further chemotherapy at progression (in line with the national treatment guideline), as the recommended second-line drugs pemetrexed, docetaxel, and erlotinib were not widely available in the country at the time of the trial. In the vaccine arm, 2 patients (7.1%) received additional chemotherapy (etoposide and vinblastine, respectively). No external radiotherapy was administered at any time to any of the patients included in the study.
Modeling approach
Calculation of the predictive causal inference association for all possible models
Following the causal inference approach proposed by Alonso and colleagues [8], we analyzed the potential predictors first individually (univariate models) and then in all possible combinations. For every model, we calculated the predictive causal information (PCI), defined as the correlation between the individual causal treatment effect and the predictors. The PCI quantifies prediction accuracy, i.e., how accurately one can predict the individual causal treatment effect on the true endpoint for a given patient from their pre-treatment predictor measurements. Its interpretation is similar to that of the widely used correlation coefficients. A PCI of exactly 1 indicates perfect prediction of the individual causal treatment effect from the predictor values. The closer the PCI is to zero, the lower the model's ability to predict the patient's actual benefit from the predictor values. Prediction accuracy was classified according to the PCI value as negligible (PCI ≤ 0.3), low (0.3 < PCI ≤ 0.5), moderate (0.5 < PCI ≤ 0.7), high (0.7 < PCI ≤ 0.9), or very high (0.9 < PCI ≤ 1). All calculations were performed using the R package EffectTreat.
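The accuracy scale above can be written as a small lookup. The following Python sketch is illustrative only: the function name `classify_pci` is hypothetical and is not part of EffectTreat, and the sketch assumes that any value at or below 0.3 (including a negative correlation) counts as negligible.

```python
def classify_pci(pci: float) -> str:
    """Map a PCI value to the accuracy label of the scale described in the text.

    Hypothetical helper for illustration; not an EffectTreat function.
    """
    if pci > 1.0:
        raise ValueError("PCI is a correlation and cannot exceed 1")
    if pci <= 0.3:
        return "negligible"
    if pci <= 0.5:
        return "low"
    if pci <= 0.7:
        return "moderate"
    if pci <= 0.9:
        return "high"
    return "very high"
```

The boundaries are closed on the right, so a PCI of exactly 0.7 is still classified as moderate.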
Selection of a model taking into account its complexity and prediction accuracy
Including more predictors never decreases, and generally increases, the information about the individual causal treatment effect. However, measuring and collecting data on multiple predictors can increase the burden on clinical investigators and patients and can generate higher costs. We propose to follow the principle of parsimony, that is, to select a model with just enough predictors to explain the data well. First, among the combinations with the same number of predictors, we select the one with the highest PCI value. Then, we classify its accuracy according to the scale described above. Finally, we choose the model with the fewest predictors (lowest complexity) whose PCI value is above 0.7, that is, a model with high prediction accuracy.
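The selection rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `select_parsimonious_model` is hypothetical, and the PCI values in `example_pci` are invented placeholders, not results from the trial.

```python
from typing import Dict, FrozenSet, Optional, Tuple

Model = FrozenSet[str]


def select_parsimonious_model(
    pci_by_model: Dict[Model, float], threshold: float = 0.7
) -> Optional[Tuple[Model, float]]:
    """Return the smallest model whose best-in-size PCI exceeds the threshold."""
    # Step 1: for each model size, keep the combination with the highest PCI.
    best_per_size: Dict[int, Tuple[Model, float]] = {}
    for predictors, pci in pci_by_model.items():
        size = len(predictors)
        if size not in best_per_size or pci > best_per_size[size][1]:
            best_per_size[size] = (predictors, pci)
    # Step 2: scan sizes from smallest to largest and stop at the first
    # best-in-size model with PCI above the threshold (high accuracy).
    for size in sorted(best_per_size):
        predictors, pci = best_per_size[size]
        if pci > threshold:
            return predictors, pci
    return None  # no model reaches high prediction accuracy


# Hypothetical PCI values, for illustration only.
example_pci = {
    frozenset({"EGF"}): 0.41,
    frozenset({"NLR"}): 0.35,
    frozenset({"EGF", "NLR"}): 0.66,
    frozenset({"EGF", "CD4/CD8"}): 0.74,
    frozenset({"EGF", "NLR", "CD4/CD8"}): 0.81,
}
selected = select_parsimonious_model(example_pci)
```

With these placeholder values, the two-predictor model is chosen even though the three-predictor model has a higher PCI, because it is the simplest model that passes the 0.7 threshold.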
Identification of good, rare and bad responders to the treatment
The classical definition of a responder (tumour reduction or complete remission) is modified in this investigation to suit a more general clinical situation. We define good responders as patients under the new treatment who benefit from it, in the sense that their survival time is longer than that of patients with the same characteristics (predictive factors) randomized to the control group. The causal inference approach implies a comparison between what actually happened under the new treatment and what would have happened if the patient had received the control treatment. Each patient has one outcome that would manifest if they were exposed to the new treatment and another that would manifest if they were exposed to the control. The "individual causal treatment effect" is the difference between these two potential outcomes. The key challenge is that both outcomes cannot be observed in the same patient. Therefore, the correlation between the potential outcomes cannot be estimated from the data. The methodology proposed by Alonso and colleagues [8] introduces a sensitivity analysis to handle this problem: the authors assume a range of possible values for the correlation between the potential outcomes and, for each value, estimate the probability of treatment success for an individual patient. A patient is classified as a good responder if all their estimated probabilities of treatment success are greater than 0.5. We define bad responders as patients under the new treatment who are harmed by it, that is, patients whose estimated probabilities of treatment success are all lower than 0.5. Consequently, rare responders are patients who are neither good nor bad responders: depending on the assumed correlation between the potential outcomes, their probability of treatment success can fall above or below 0.5.
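The three-way classification can be illustrated with a short Python sketch. The function name `classify_responder` is hypothetical. Its input is assumed to be the list of success probabilities estimated for one patient, one per assumed value of the correlation between the potential outcomes; how those probabilities are estimated is outside the scope of the sketch.

```python
from typing import Sequence


def classify_responder(success_probs: Sequence[float]) -> str:
    """Classify one patient from their estimated probabilities of treatment success.

    Each probability corresponds to one assumed value of the (unidentifiable)
    correlation between the two potential outcomes.
    """
    if not success_probs:
        raise ValueError("at least one estimated probability is required")
    if all(p > 0.5 for p in success_probs):
        return "good"  # benefit under every assumed correlation
    if all(p < 0.5 for p in success_probs):
        return "bad"   # harm under every assumed correlation
    return "rare"      # the conclusion depends on the assumed correlation
```

A patient with a probability of exactly 0.5 for some correlation value satisfies neither strict inequality and therefore falls in the rare responder group.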
Subgroup analyses for survival benefit
To show the heterogeneity in the response to CIMAvax-EGF, Kaplan-Meier survival curves were estimated in the good and bad responder groups. The log-rank test was used to compare survival between the treated and control groups within the subgroups identified by the biomarkers.
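For readers unfamiliar with these two procedures, the following self-contained Python sketch shows a textbook Kaplan-Meier estimator and a two-group log-rank test. It is not the software used in the study. The function names are hypothetical, event indicators are assumed to be coded 1 for death and 0 for censoring, and the p-value uses the chi-square approximation with one degree of freedom.

```python
import math
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct death time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects sharing the same observed time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # deaths and censored subjects leave the risk set
    return curve


def logrank_test(times_a, events_a, times_b, events_b) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    death_times = sorted(
        {t for t, e in zip(times_a, events_a) if e}
        | {t for t, e in zip(times_b, events_b) if e}
    )
    obs_minus_exp = 0.0
    variance = 0.0
    for t in death_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the number of deaths in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        raise ValueError("log-rank statistic is undefined for these data")
    chi2 = obs_minus_exp ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df
    return chi2, p_value
```

In the setting described above, `logrank_test` would be applied separately within each responder subgroup, with group A as the vaccine arm and group B as the control arm.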