3.1. Age groups distribution of Erbil population in 2017
The percentage distribution of the Erbil population of 2113391 in 2017 by 10-year age group and gender was calculated and is shown in Table 1. The figures in this table are used to calculate the measures of association (correlations and regressions). About 25% of the female and male population falls within the age groups up to 20 years, and less than 7% falls in the age groups over 50 years. This relation partly explains why relatively higher numbers of Lung Cancer cases occurred in the over-60 age groups.
Table 1: Erbil population percent and numbers in 2017 distributed by 10 years age groups and gender
Age group (10 Y) | Female % | Female Number | Male % | Male Number
10 | 11.8 | 127587 | 12.3 | 132993
20 | 10.7 | 115694 | 11.6 | 125425
30 | 8.6 | 92097 | 8.9 | 96231
40 | 7.1 | 76769 | 6.8 | 73526
50 | 5 | 54062 | 4.8 | 27572
60 | 3.3 | 35682 | 2.5 | 27031
70+ | 3.5 | 37844 | 3.3 | 38925
Total | 50% | 1056695 | 50% | 1056695
3.2. Age distribution and descriptive statistics of the 590 Lung cancer cases
The age distribution of the 590 Lung Cancer cases from 2013 to 2017, with its percentiles, is given in Figure 1. Mean age at diagnosis is 51.17 ± 0.823 years (mean ± standard error), with a Standard Deviation (SD), which describes variability, of 19.98 years; the first quartile (Q1) is 40.00 years. The Median is 54 years, with a minimum age of 1.0 year and a maximum of 99.0 years. Age at diagnosis in years was tested for normality using the Anderson-Darling test, as shown in Figure 1. The Anderson-Darling test value is 4.454 with P ≤ 0.005. This indicates that age in years of the 590 Lung cancer cases is not normally distributed, which is why we adopted the non-parametric Chi-Square test of significance in the analysis of results.
3.3. Confidence interval and limits for age of the 590 Lung cancer cases
Mean age at diagnosis equals 51.17 years with an SD of 19.98 years. To reflect the spread of age at diagnosis, the 95% Confidence Interval and Limits were calculated from the SD (mean ± 1.96 SD), giving an Upper Limit of 90.33 years and a Lower Limit of 12.00 years. These limits include about 95% of the cases; the 99% limits are wider and effectively cover all age groups. Knowing whether a patient's age falls within these limits helps in reasoning when taking the history from patients. The overall Crude Incidence Rate per 100000 of the population in 2017 is 27.92; it was 24.03 in males and 31.79 in females. The gender percentage ratio in the 590 Lung Cancer cases is Males : Females = 43.05% : 56.95% (254 : 336 cases). The 50th percentile is about 50 years, which nearly equals the mean (Figure 4).
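The limits and rates quoted above can be checked with a few lines of arithmetic. The sketch below is an illustration, not the authors' own calculation: it assumes the limits are mean ± 1.96 SD, that the standard error is SD divided by the square root of the number of cases, and that the gender-specific rates divide the 2013 to 2017 case counts by the 2017 gender totals from Table 1. The variable and function names are hypothetical.

```python
import math

# Summary statistics quoted in the text
n_cases = 590
mean_age = 51.17
sd_age = 19.98
z95 = 1.96  # normal quantile assumed for 95% limits

# Standard error of the mean: SD / sqrt(n)
se_mean = sd_age / math.sqrt(n_cases)

# Limits describing the spread of individual ages: mean ± 1.96 SD
lower_limit = mean_age - z95 * sd_age
upper_limit = mean_age + z95 * sd_age

# For comparison, a 95% interval for the mean itself: mean ± 1.96 SE
ci_mean = (mean_age - z95 * se_mean, mean_age + z95 * se_mean)


def crude_rate_per_100k(cases, population):
    """Crude rate: cases divided by population, scaled to 100000."""
    return cases / population * 100_000


# Assumption: cumulative cases over the 2017 gender totals from Table 1
male_rate = crude_rate_per_100k(254, 1056695)
female_rate = crude_rate_per_100k(336, 1056695)

print(f"SE of mean: {se_mean:.3f}")
print(f"Limits (mean ± 1.96 SD): {lower_limit:.2f} to {upper_limit:.2f}")
print(f"95% CI of the mean: {ci_mean[0]:.2f} to {ci_mean[1]:.2f}")
print(f"Crude rates per 100000: male {male_rate:.2f}, female {female_rate:.2f}")
```

The interval for the mean itself is much narrower than the limits based on the SD, because the standard error shrinks with the number of cases while the SD does not.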
3.4. Measures of Association between occurrence of Lung cancer and age groups:
To calculate Pearson’s Correlation Coefficient (r) and the Regression Coefficient (b) as measures of association between age and the number of Lung Cancer cases in males and females, the data are sorted by age group (Table 2).
Table 2: Age groups in 10’s and the number of Lung cancer cases in females and males.
Age group | Male | Female | Total
10 years | 13 | 14 | 27
20 years | 12 | 13 | 25
30 years | 16 | 20 | 36
40 years | 22 | 48 | 70
50 years | 28 | 74 | 102
60 years | 46 | 69 | 115
70 years | 59 | 66 | 125
80+ years | 58 | 32 | 90
Total | 254 | 336 | 590
3.5. Correlation Coefficients (r)
Correlation Coefficients measure the strength and direction of association and run from -1 to +1. The data in Table 2 were used to calculate Pearson’s Correlation Coefficient (r) for total cases of Lung cancer (Males and Females), which was found to equal 0.875 ± 0.033 with P = 0.004 and an R-square (Precision) of 0.766 (any value below 80% should be taken with caution in scientific work). The Correlation for females (0.643 ± 0.052 with P = 0.085) is pulled down by the number of Lung cancer cases in the age group 80+, yet it is intermediate. Pearson’s Correlation Coefficient between age in years and occurrence of cancer in males is very high (r = 0.953 ± 0.129) and significant at P = 0.0002. It should be mentioned that the association with increasing age group is very clear up to the age group of 80 years and over, where the number of recorded cases fell to only 90 in total. This might be an error of reporting, particularly on the female side, where the number of Lung cancer cases in females over 80 dropped to only 32 compared to 66 cases in age group 70. The Correlation Coefficients in both males and females are positive and confirm that cancer incidence grows with age and covers all ages from birth to death.
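As an illustration, the correlation coefficients can be recomputed from the counts in Table 2. This is a minimal sketch that assumes the age groups are coded by their labels (10, 20, ..., 80) and that the "80+" group is coded as 80; the function name `pearson_r` is hypothetical.

```python
import math

# Table 2: age-group labels and number of cases (80+ coded as 80, an assumption)
age_group = [10, 20, 30, 40, 50, 60, 70, 80]
male = [13, 12, 16, 22, 28, 46, 59, 58]
female = [14, 13, 20, 48, 74, 69, 66, 32]
total = [m + f for m, f in zip(male, female)]


def pearson_r(x, y):
    """Pearson correlation: Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r_male = pearson_r(age_group, male)
r_female = pearson_r(age_group, female)
r_total = pearson_r(age_group, total)

for label, r in [("male", r_male), ("female", r_female), ("total", r_total)]:
    print(f"{label}: r = {r:.3f}, r-square = {r * r:.3f}")
```

With eight age groups, each coefficient rests on only eight data points, which is one reason a single group such as 80+ can pull the female correlation down noticeably.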
3.6. Regression coefficient of number of lung cancer cases (Y) on age groups (X)
To quantify the relation between Lung cancer cases and age groups (Table 2), a Regression analysis was conducted and the Regression Coefficients (b) of Y (number of cases) on X (age groups) were calculated for males and females separately and for the total cases. The Prediction Equation was also calculated, which enables planners to estimate the expected number of Lung cancer cases at any age. This, of course, may be added to the jigsaw of the diagnosis process.
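The regression of cases on age group described above can be sketched with ordinary least squares. The example assumes the age groups in Table 2 are coded 10, 20, ..., 80 (with "80+" coded as 80); `fit_line` and `predict` are hypothetical helper names, and the choice of age 50 in the prediction is only an example.

```python
# Table 2 counts; X is the age-group label in years (80+ coded as 80, an assumption)
age_group = [10, 20, 30, 40, 50, 60, 70, 80]
male = [13, 12, 16, 22, 28, 46, 59, 58]
female = [14, 13, 20, 48, 74, 69, 66, 32]
total = [m + f for m, f in zip(male, female)]


def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict(a, b, age):
    """Expected number of cases at a given age-group label."""
    return a + b * age


a_f, b_f = fit_line(age_group, female)
a_m, b_m = fit_line(age_group, male)
a_t, b_t = fit_line(age_group, total)

print(f"Female: cases = {a_f:.2f} + {b_f:.4f} * age")
print(f"Male:   cases = {a_m:.3f} + {b_m:.4f} * age")
print(f"Total:  cases = {a_t:.2f} + {b_t:.3f} * age")
print(f"Predicted total cases at age group 50: {predict(a_t, b_t, 50):.1f}")
```

Because X is in years under this coding, each slope is a change in cases per year of age; multiplying by 10 gives the change between neighbouring 10-year groups.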
3.7. Regression of female lung cancer cases on age
Regression Analysis of Female cases on Age was conducted and is shown below and in Figure 5. The Prediction Regression equation is: Female Lung Cancer cases (F) = 11.79 + 0.6714 × Age group. This means that the expected number of cases can be predicted for any selected age group.
Table 3: Analysis of Variance of female lung cancer
Source | DF | SS | MS | F | P
Model | 1 | 1893.4 | 1893.4 | 4.24 | 0.085
Residual | 6 | 2680.6 | 446.76 | |
Total | 7 | 4574 | | |
The Female Regression Coefficient of number of Lung cancer cases on age equals +0.6714 cases per year of age (age groups coded 10, 20, ..., 80), or about 6.7 cases per 10-year age group, which is marginally significant at P = 0.085 (Table 3). The reason is the same as already mentioned for the female Correlation Coefficient. The positive coefficient means that Lung cancer in females rises with age.
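The analysis of variance in Table 3 can be reconstructed from the Table 2 counts by splitting the total sum of squares into a model part and a residual part. The sketch below assumes the age groups are coded 10, 20, ..., 80; `regression_anova` is a hypothetical helper, and the P-value is left out because it would need an F-distribution routine.

```python
# Table 2: age-group labels (80+ coded as 80, an assumption) and female cases
age_group = [10, 20, 30, 40, 50, 60, 70, 80]
female = [14, 13, 20, 48, 74, 69, 66, 32]


def regression_anova(x, y):
    """Sums of squares, mean squares, F and R-square for y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_model = sxy ** 2 / sxx          # variation explained by the line (1 DF)
    ss_residual = ss_total - ss_model  # unexplained variation (n - 2 DF)
    ms_residual = ss_residual / (n - 2)
    return {
        "ss_model": ss_model,
        "ss_residual": ss_residual,
        "ss_total": ss_total,
        "ms_residual": ms_residual,
        "f_value": ss_model / ms_residual,
        "r_square": ss_model / ss_total,
    }


female_anova = regression_anova(age_group, female)
for key, value in female_anova.items():
    print(f"{key}: {value:.3f}")
```

Applying the same function to the male column of Table 2 gives a model sum of squares of about 2484.0 and a residual sum of squares of about 249.5, the values listed in Table 4.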
3.8. Regression Analysis of Males Lung cancer cases on age
The association, expressed as the Regression coefficient, and the Prediction Equation for male Lung cancer cases on age were calculated and are shown in the ANOVA table (Table 4) and Figure 6. The prediction equation is: Male Lung Cancer cases = -2.857 + 0.7690 × Age group. The Regression coefficient is +0.769 cases per year of age (age groups coded 10, 20, ..., 80), or about 7.7 cases per 10-year age group, and it is highly significant (P < 0.001).
Table 4: Analysis of Variance for male cancer patients.
Source | DF | SS | MS | F | P
Model | 1 | 2484.0 | 2484.0 | 59.7 | 0.000
Residual | 6 | 249.5 | 41.6 | |
Total | 7 | 2733.5 | | |
This means that Lung Cancer cases linearly increased with age. The Prediction equation, which assumes linearity, can be used to predict Lung Cancer occurrence at any age. This regression analysis and results are trustworthy with an R-Square of about 90%.
3.9. The Regression of Total Lung Cancer Cases (Males and Females) on Age Groups:
Regression analysis of the data in Table (2) and Figure (4) showed that the Regression equation is: Total lung cancer cases = 8.93 + 1.440 × Age group. The Regression Coefficient (b = 1.44) is highly significant (P = 0.004) and estimated with precision (R-square = 76.50%). With this linear relationship between the occurrence of Lung Cancer and age, the predicted number of cases increases by 1.44 per year of age (age groups coded 10, 20, ..., 80), or about 14.4 cases per 10-year age group. Figure 7 shows the linear regression line of total Lung cancer cases on age groups and the prediction equation.
3.10. Exploratory analysis
A basic exploratory analysis of the data was performed to evaluate the covariate distribution in uncensored and censored patients and in each ethnic group. Of the censored patients, 42.4% were male and 57.6% were female. Table 5 indicates that more female than male lung cancer patients died.
Table 5: Distribution of covariates in the study population by gender
Variables | Category | Male n (%) | Female n (%) | P-value*
Surgery | Had surgery | 172 (67.7) | 52 (76.47) | 0.07
 | No surgery | 82 (32.3) | 14 (20.59) |
Radio | Took Radiotherapy | 73 (28.7) | 135 (40.2) | 0.004
 | Did not take Radiotherapy | 181 (71.3) | 201 (59.8) |
Chemotherapy | Received Chemotherapy | 220 (86.6) | 304 (90.5) | 0.141
 | Did not receive Chemotherapy | 34 (13.4) | 32 (9.5) |
Hormone | Used hormone | 38 (15.0) | 91 (27.1) | <0.001
 | Did not use hormone | 216 (85.0) | 245 (72.9) |
Immune system | Took immune system | 23 (9.1) | 26 (7.7) | 0.566
 | Did not take immune system | 231 (90.9) | 310 (92.3) |
Status | Dead | 58 (22.8) | 70 (20.8) | 0.559
 | Alive | 196 (77.2) | 266 (79.2) |
Age at diagnosis | Less than 20 years | 25 (9.8) | 27 (8.0) | <0.001
 | 20 to 39 | 38 (15.0) | 68 (20.2) |
 | 40-59 | 74 (29.1) | 143 (42.6) |
 | More than 60 years | 117 (46.1) | 98 (29.2) |
*Chi-square tests were performed for categorical variables
It is of interest to investigate the relationship between the variables of interest and gender. The Chi-square test revealed that radiotherapy, hormone use and age at diagnosis each have a statistically significant relationship with gender, since their p-values are less than 0.05. Table 5 shows associations between gender and radiotherapy (p = 0.004), gender and hormone use (p < 0.001), and gender and age at diagnosis (p < 0.001). The remaining categorical covariates were not statistically significant.
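To illustrate how the P-values in Table 5 arise, the sketch below applies a Pearson Chi-square test to two of its 2 × 2 rows (radiotherapy and status by gender). It assumes the test was run without a continuity correction, which the source does not state; `chi_square_2x2` is a hypothetical helper. For one degree of freedom the P-value can be written with the complementary error function, so no statistics library is needed.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its P-value for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of the chi-square distribution with 1 DF
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: category present / absent; columns: male, female (counts from Table 5)
chi2_radio, p_radio = chi_square_2x2(73, 135, 181, 201)
chi2_status, p_status = chi_square_2x2(58, 70, 196, 266)

print(f"Radiotherapy by gender: chi-square = {chi2_radio:.2f}, p = {p_radio:.3f}")
print(f"Status by gender:       chi-square = {chi2_status:.2f}, p = {p_status:.3f}")
```

Under this assumption, both rows give P-values that round to the ones reported in Table 5.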
Figure 8 shows that female patients survive longer than male patients. However, the confidence intervals show that the uncertainty is greater in the survival curve for female patients. The figure also indicates that median survival is greater in female than in male lung cancer patients.
Table 6: Multivariate Cox regression modelling for each covariate.
Variables | Hazard Ratio | P-value | 95% CI
Gender | 0.81 | 0.247 | 0.56, 1.16
Surgery | 1.92 | <0.001 | 1.34, 2.73
Radio | 1.15 | 0.584 | 0.80, 1.65
Chemotherapy | 1.52 | 0.114 | 0.90, 2.59
Hormone | 0.81 | 0.436 | 0.50, 1.39
Immune system | 0.56 | 0.039 | 0.31, 0.97
Age at diagnosis | | |
20 to 39 | 1.57 | 0.298 | 0.66, 3.71
40-59 | 1.68 | 0.205 | 0.75, 3.75
More than 60 years | 1.44 | 0.379 | 0.64, 3.21
The objective of this analysis was to determine which potential covariates affect survival probability. We used multivariate Cox regression to estimate hazard ratios (HR) for mortality. Table 6 shows the results of the multivariable Cox regression model, which indicate that gender had no influence on survival outcome (HR = 0.81, 95% CI: 0.56 to 1.16, p = 0.247). However, surgery and immune system treatment are statistically significant prognostic factors for lung cancer patients. The model indicated that the risk of mortality increases by 92% if lung cancer patients do not have surgery (HR = 1.92, 95% CI: 1.34 to 2.73, p < 0.001). Furthermore, the risk of mortality is reduced by 44% among patients who took immune system treatment (HR = 0.56, 95% CI: 0.31 to 0.97, p = 0.039). The proportional hazards (PH) assumption was checked using both a log-log plot and a plot of the log hazard ratio over time, to investigate whether the hazard ratio is approximately constant over time or is time-dependent.
The log-log plot for surgery (left-hand side of Figure 9) shows that the hazards are approximately proportional, since the difference between the two lines is approximately constant. The plot of the log hazard ratio against time shows approximately straight lines, illustrating that the hazard ratio remains constant with time.
As for immune system treatment, the hazards are not proportional, since the difference between the two lines is not constant. Also, the plot of the log hazard ratio is not constant over time. Thus, the proportional hazards assumption for immune system treatment is not met.
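The percentage changes in mortality risk quoted for surgery and immune system treatment follow directly from the hazard ratios in Table 6. The short sketch below shows that arithmetic, together with the usual reading that a 95% CI excluding 1 corresponds to significance at the 5% level; both function names are hypothetical.

```python
def percent_change_in_hazard(hazard_ratio):
    """Percentage change in hazard relative to the reference group."""
    return (hazard_ratio - 1.0) * 100.0


def ci_excludes_one(lower, upper):
    """True when the confidence interval does not contain HR = 1."""
    return lower > 1.0 or upper < 1.0


# Hazard ratio and 95% CI for three rows of Table 6
table6 = {
    "Gender": (0.81, 0.56, 1.16),
    "Surgery": (1.92, 1.34, 2.73),
    "Immune system": (0.56, 0.31, 0.97),
}

results = {}
for name, (hr, lower, upper) in table6.items():
    results[name] = (percent_change_in_hazard(hr), ci_excludes_one(lower, upper))
    change, significant = results[name]
    print(f"{name}: {change:+.0f}% change in hazard, CI excludes 1: {significant}")
```

A hazard ratio above 1 gives a positive change (higher risk) and one below 1 a negative change (lower risk), which is how 1.92 becomes an increase of 92% and 0.56 a reduction of 44%.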
3.11. The outcome of the most common treatment regime: Surgery, Chemotherapy and Radiotherapy alone and in combination
The data were sorted by how often each of the three main treatment regimes was used, alone and in combination, and are given in Table 7. This shows which treatment regime(s) were preferred for the 590 cases of Lung cancer in Erbil.
Table 7. The repeated usage of the 3 treatment regimens and combinations for the 590 Lung cancer cases from 2013 to 2017.
Treatment | Male Dead | Male Alive | Female Dead | Female Alive | Total | Percent
Surgery plus | 22 | 156 | 39 | 218 | 435 | 73.7
Chemo plus | 36 | 189 | 51 | 256 | 532 | 90.16
Radio plus | 12 | 60 | 18 | 116 | 206 | 34.9
Total treatment | 70 | 405 | 108 | 590 | 1173 |
Percentage | 5.97 | 34.52 | 9.2 | 50.3 | |
Table 7 shows that a total of 1173 treatments were given to the 590 patients, an average of about 2 treatments per patient. The table also shows that the treatment of choice is Chemotherapy, alone or in combination with other regimes: it was used in 90.16% of the 590 cases, followed by Surgery alone and in combination (73.70%); the least used is Radiotherapy alone and in combination (34.90%).
The results also indicate that, among patients treated with Surgery alone or in combination, 35.80% were surviving males and 49.43% were surviving females; for Chemotherapy alone or in combination the figures were 35.53% males and 48.12% females, and for Radiotherapy alone or in combination 29.13% males and 56.31% females. The total percentage of patients kept alive is therefore about the same for the three regimes (about 85%).
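The usage and survival percentages can be recomputed from the counts in Table 7. The sketch below assumes that the usage percentage is the row total divided by the 590 patients and that the survival shares are the numbers alive divided by the row total; the dictionary layout and key names are hypothetical. Under these assumptions the Chemotherapy and Radiotherapy figures quoted in the text are reproduced, while the Surgery shares come out slightly different from the quoted ones.

```python
PATIENTS = 590

# Table 7 counts: (male dead, male alive, female dead, female alive)
treatments = {
    "Surgery plus": (22, 156, 39, 218),
    "Chemo plus": (36, 189, 51, 256),
    "Radio plus": (12, 60, 18, 116),
}

summary = {}
for name, (m_dead, m_alive, f_dead, f_alive) in treatments.items():
    total = m_dead + m_alive + f_dead + f_alive
    summary[name] = {
        "total": total,
        # Share of the 590 patients who received this regime
        "pct_of_patients": 100 * total / PATIENTS,
        # Surviving males and females as a share of those treated
        "pct_alive_male": 100 * m_alive / total,
        "pct_alive_female": 100 * f_alive / total,
    }

total_treatments = sum(row["total"] for row in summary.values())
treatments_per_patient = total_treatments / PATIENTS

for name, row in summary.items():
    print(f"{name}: used in {row['pct_of_patients']:.2f}% of patients, "
          f"alive {row['pct_alive_male']:.2f}% male + "
          f"{row['pct_alive_female']:.2f}% female")
print(f"Treatments per patient: {treatments_per_patient:.2f}")
```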
3.12. Relative Risk (RR) Ratio of Lung Cancer for Females versus Males for 10 years age groups (10 to 70 years and over):
The Relative Risk (RR) ratio measures the strength of association. In this analysis, the calculation of RR is slightly modified to express the probability of occurrence of Lung cancer in females relative to that in males. The RR ratio shown in Table 8 indicates which gender is at higher risk of lung cancer, and in which age group.
Table 8. Relative Risk (RR) ratio of female Lung cancer incidence versus male Lung Cancer
Age Groups | Female | Male | Relative Risks F / M
10 years | 14 | 13 | 0.81
20 | 13 | 12 | 0.82
30 | 20 | 16 | 0.95
40 | 48 | 22 | 1.64
50 | 74 | 28 | 1.99
60 | 69 | 46 | 1.3
70+ | 98 | 117 | 1.14
RR Total | 336 | 254 | 1.32
It is clear that Males are at higher risk than Females of having Lung cancer up to the age of 40. The trend then changes direction, and Females show a higher risk than Males from the age of 40 onwards; the risk in Females is about double that in Males in the 50-60 age group.
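The exact formula behind the modified RR is not given in the text. One reading that reproduces the tabulated ratios for the 10 to 50 age groups to within rounding is the share of female cases falling in an age group divided by the corresponding share of male cases. The sketch below shows that reading as an assumption, not as the authors' stated method, and is limited to those five groups; `share_ratio` is a hypothetical name.

```python
FEMALE_TOTAL = 336
MALE_TOTAL = 254

# Cases per age group from Table 8 (10 to 50 groups only)
female_cases = {"10": 14, "20": 13, "30": 20, "40": 48, "50": 74}
male_cases = {"10": 13, "20": 12, "30": 16, "40": 22, "50": 28}


def share_ratio(female, male):
    """Female share of female cases divided by male share of male cases."""
    return (female / FEMALE_TOTAL) / (male / MALE_TOTAL)


ratios = {
    group: share_ratio(female_cases[group], male_cases[group])
    for group in female_cases
}

for group, ratio in ratios.items():
    side = "females" if ratio > 1 else "males"
    print(f"Age group {group}: F / M = {ratio:.2f} (higher share in {side})")
```

On this reading a ratio below 1 means that the age group accounts for a larger share of male than of female cases, which matches the change of direction at the 40-year group described above.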