Optimal risk classification with statistical evidence in endometrial cancer

doi:10.21203/rs.3.rs-29247/v1

Research article

This work is licensed under a CC BY 4.0 License

Version 1

Abstract

Background

It is often clinically useful to classify tumor markers into risk groups. This study aimed to investigate whether starting with a statistically sound method would find cut points that are more reasonable than the conventional ones.

Methods

We used data from 442 patients with endometrial cancer. The optimal number of cutoffs was chosen by the Akaike information criterion, and statistical algorithms were adapted to find their best locations. The code is provided as a package.

Results

Myometrium invasion was an independent risk factor for lymph node metastasis when stratified into three groups by cut points at 0.41 and 0.89. Tumor size was an independent risk factor for overall survival when stratified into two groups by a cut point at 4.11 cm. Both choices predicted better than the conventional ones and were clinically relevant.

Conclusion

A statistically sound algorithm should be used to stratify patients into risk groups.

Health Economics & Outcomes Research

Keywords: risk classification, Akaike, survival, logistic regression, endometrial cancer

Background

In clinical practice, it is often helpful to classify continuous biomarker values into distinct risk groups to facilitate evaluation of a biological, physiological, or pathological state. Such classification has been increasingly applied in disease diagnosis, monitoring of disease progression, assessment of prognosis, and development of pharmaceutical agents. Although multiple statistical methods are available to determine a cut point, they are not widely used in biomedical studies, which may have led to classifications that do not correspond to the most significant split of risk levels. Here we used data on endometrial cancer to investigate whether a statistically sound method would find cut points that were more reasonable than the conventional ones.

Endometrial cancer, i.e., a malignant tumor of the corpus uteri, is the third most common female cancer worldwide [1]. A study has shown that incidence rates have increased over time, in consecutive generations, and in several countries [2]. The estimated cumulative risk of endometrial cancer is 0.96%, and the 5-year overall survival depends on the stage and ranges from 74% to 96% [3, 4]. The 2009 International Federation of Gynecology and Obstetrics (FIGO) classification is commonly used for staging, and it includes assessment of myometrial invasion and lymph node metastasis [5]. Aside from the factors in the FIGO classification, other factors have been associated with prognosis, such as histological grade and type, tumor size, and lymphovascular space involvement [6].

In clinical practice, it is often desirable to reduce continuous variables, such as the depth of myometrial invasion and tumor size, to categorical ones so that prognostic factors can be used under a clear guideline. Considerable effort has gone into classifying continuous factors into risk groups. For example, stage IA is distinguished from stage IB by whether the fraction of myometrial invasion exceeds 50% [5], and a tumor larger than 2 centimeters (cm) has been associated with lymph node metastasis [7, 8].

However, in most previous studies, patients were first grouped by a pre-specified cutoff value, say, a tumor diameter of 2 cm, and the risks of the two groups were then compared and tested. If a significant difference was found, the pre-determined cutoff value was accepted as the classification criterion. Such an approach might have missed cut points that give the most significant differentiation. Therefore, the aim of this study was to identify, based on statistical evidence, the cut points (both their number and their locations) of tumor size and myometrial invasion fraction that best separate patients with endometrial cancer by risk of lymph node metastasis and by overall survival. We also explored the predictive power of the identified cut points relative to the conventional ones, as well as their clinical relevance, and we provide code for researchers to use in future studies.

Methods

A retrospective chart review was performed at Kaohsiung Veterans’ General Hospital (KSVGH), a public teaching hospital in southern Taiwan. The acquisition of patient data was approved by the Institutional Review Board (IRB) of KSVGH and was in accordance with the Declaration of Helsinki. The requirement for patients’ written informed consent was waived by the IRB. A total of 447 patients with endometrial cancer were identified from 1997 to 2010, and their demographic and clinical data were recorded. Five patients were excluded at the initial stage of the study because too little information was available.

All statistical analyses were carried out in R, a language for statistical computing and graphics [9]. Univariate and multivariate analyses were based on the Cox proportional hazards model for survival data (overall survival as the response) and on the logistic regression model for the binary outcome (lymph node metastasis as the response). The optimal number of cut points was chosen by the Akaike information criterion (AIC): the number with the smallest AIC value was taken as the best choice. For survival analysis, the algorithms for calculating AIC values, determining the optimal number of cut points, and identifying their locations were those of Chang et al. [10], an improved version of an earlier R program [11] that finds the correct number of cut points with very high accuracy. The same algorithms were applied to the binary response, i.e., lymph node metastasis, with modifications for the binomial setting; the code is provided in Supplementary File 1, with an example in Supplementary File 2. Predictive power was evaluated by the concordance statistic (C-index) in the survival analysis and by sensitivity, specificity, and area under the curve (AUC) in the logistic analysis. Calculations of AIC, C-index, specificity, sensitivity, AUC, and accuracy were all based on 5-fold cross-validation: the data were randomly divided into 5 groups, four serving as the training set and one as the validation set, and the procedure was repeated 50 times.
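The resampling scheme described above can be sketched as follows. This is a minimal illustration in Python rather than the R code of the supplementary files, and it assumes that the 50 repetitions are fresh random partitions of the 442 patients; the function name `repeated_kfold` and the seed are hypothetical choices made here for illustration.

```python
import random

def repeated_kfold(n_samples, n_folds=5, n_repeats=50, seed=0):
    """Yield (train, valid) index lists for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        # Deal the shuffled indices into n_folds groups of near-equal size.
        folds = [order[i::n_folds] for i in range(n_folds)]
        for k in range(n_folds):
            valid = sorted(folds[k])
            train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
            yield train, valid

# 442 patients, 5 folds, 50 repeats (assumed to be independent reshuffles).
splits = list(repeated_kfold(442))
print(len(splits))  # 250 = 50 repeats x 5 folds
```

Each quantity of interest (AIC, C-index, AUC, and so on) would then be computed on every validation fold and summarized over all splits.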

Results

Patient information

Four hundred and forty-two patients with endometrial cancer were included in the analysis. Their baseline demographic and clinical characteristics are shown in Table 1. The median age was 54 years; most patients were at an early stage (65.2%) and had a type I (endometrioid) tumor (78.5%); fewer than 10% had lymph node metastasis; 50 patients experienced relapse, and 52 died.

Table 1

Baseline Characteristics of the Patients
Characteristics (n = 442)	Median (range)
Age, years	54 (22–88)
Body mass index	25.3 (15.1–62.5)
Tumor size, cm	3.9 (0.5–24.5)
Overall survival, days	1,228 (6–7,309)
FIGO stage	number (%)
IA	227 (51.4)
IB	61 (13.8)
II	27 (6.1)
III	51 (11.5)
IV	27 (6.1)
unknown	49 (11.1)
Histologic grade
well differentiated	222 (50.2)
moderately differentiated	102 (23.1)
poorly/undifferentiated	79 (17.9)
unknown	39 (8.8)
Histologic type
type I (endometrioid)	334 (78.5)
type II (serous, clear cell)	13 (3.0)
mixed	16 (3.6)
others¹	28 (6.3)
unknown	38 (8.6)
Lymph node status
non-metastatic	348 (78.7)
metastatic	34 (7.7)
unknown	66 (13.6)
Myometrium invasion
no invasion	143 (32.4)
invasion depth < 50%	94 (21.4)
invasion depth ≥ 50%	97 (22.0)
unknown	108 (24.2)
Recurrence	50 (11.3)
Death	52 (11.8)
¹ Mucinous, leiomyosarcoma, atypical hyperplasia, endometrial stromal sarcoma, and carcinosarcoma.

Optimal cutoff choices for myometrium invasion fraction and tumor size

Taking the most significant split as the criterion for the optimal cut point(s), we assessed the fraction of myometrium invasion depth by calculating the AIC values, and the corresponding cutoff locations, when one, two, or three cuts were made, with death and lymph node metastasis as the outcomes. As shown in Table 2, when death (overall survival time) was the outcome, a single cut point at 0.90 was optimal because it gave the smallest AIC value; when lymph node metastasis was the outcome, two cut points at 0.41 and 0.89 were optimal for the same reason. These choices of cut points can also be visualized in Fig. 1.

Table 2

Optimal number and locations of cutoff points of myometrium invasion fraction
	Overall survival			Lymph node metastasis
Number of cuts	AIC	Locations	Hazard ratios¹	AIC	Locations	Odds ratios¹
1	172.93*	0.90	1.51	98.81	0.89	9.39
2	174.85	0.28, 0.91	1.38, 1.39	97.11*	0.41, 0.89	2.68, 18.37
3	176.51	0.28, 0.53, 0.56	1.49, 1.42, 1.49	97.54	0.20, 0.42, 0.90	1.66, 4.25, 24.52
¹ With two or three cutoff points, the first hazard/odds ratio compares group two with group one, the second compares group three with group one, and the third compares group four with group one.
* The smallest AIC value indicates the best number of cutoff points.
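The selection rule behind Table 2 can be illustrated for a binary outcome with a simplified sketch. It is not the algorithm of Chang et al. [10] or the supplementary R code: it assumes a logistic model whose only predictor is the risk group, so that the fitted probability in each group equals the observed event proportion and AIC = 2k − 2 log L with k equal to the number of groups. It also assumes an exhaustive search over observed values, with a value equal to a cut point assigned to the lower group. The function names and the toy data are invented for illustration.

```python
import math
from itertools import combinations

def binomial_loglik(events, total):
    """Log-likelihood of one group at its observed proportion (0*log 0 taken as 0)."""
    ll = 0.0
    if events > 0:
        ll += events * math.log(events / total)
    if events < total:
        ll += (total - events) * math.log(1 - events / total)
    return ll

def aic_for_cuts(x, y, cuts):
    """AIC of a group-only logistic model; group index = number of cuts below x."""
    groups = {}
    for xi, yi in zip(x, y):
        g = sum(xi > c for c in cuts)
        n, e = groups.get(g, (0, 0))
        groups[g] = (n + 1, e + yi)
    loglik = sum(binomial_loglik(e, n) for n, e in groups.values())
    return 2 * (len(cuts) + 1) - 2 * loglik

def search_cut_points(x, y, max_cuts=3):
    """For each number of cuts, return (smallest AIC, cut locations)."""
    candidates = sorted(set(x))[:-1]  # cutting at the maximum would empty the top group
    results = {}
    for k in range(1, max_cuts + 1):
        if len(candidates) < k:
            break
        best = min(combinations(candidates, k), key=lambda cuts: aic_for_cuts(x, y, cuts))
        results[k] = (aic_for_cuts(x, y, best), best)
    return results

# Toy data (invented): fractions 0.1..1.0 with 20 patients each and
# event rates of 5%, 25%, and 75% in three bands.
x, y = [], []
for i in range(1, 11):
    n_events = 1 if i <= 4 else 5 if i <= 8 else 15
    x += [i / 10] * 20
    y += [1] * n_events + [0] * (20 - n_events)

results = search_cut_points(x, y)
best_k = min(results, key=lambda k: results[k][0])
print(best_k, results[best_k][1])  # 2 (0.4, 0.8)
```

In the toy data a third cut adds a parameter without improving the likelihood, so the AIC penalty makes two cuts the preferred choice, which mirrors the reasoning used to read Table 2.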

Similar analyses were performed to find the optimal cut points of tumor size in relation to the risk of death and the risk of lymph node metastasis. It was best to make one cut at 4.11 cm with overall survival as the outcome and one cut at 4.90 cm with lymph node metastasis as the outcome (Table 3). The choices are visualized in Fig. 2.

Table 3

Optimal number and locations of cutoff points of tumor size
	Overall survival			Lymph node metastasis
Number of cuts	AIC	Locations, cm	Hazard ratios¹	AIC	Locations, cm	Odds ratios¹
1	135.32*	4.11	2.45	102.60*	4.90	2.94
2	135.61	4.07, 6.53	3.11, 1.28	103.06	2.00, 4.91	1.99, 3.84
3	136.22	2.80, 4.17, 6.44	1.81, 3.60, 1.35	103.98	2.23, 4.98, 5.37	1.36, 2.33, 4.89
¹ With two or three cutoff points, the first hazard/odds ratio compares group two with group one, the second compares group three with group one, and the third compares group four with group one.
* The smallest AIC value indicates the best number of cutoff points.
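The convention in the table footnotes, where every ratio is taken relative to group one, can be sketched for unadjusted odds ratios as below. The counts are invented for illustration and are not the study data, and the function name `odds_ratios_vs_first` is hypothetical; the ratios reported in Tables 2 and 3 come from fitted models and need not match such a direct calculation.

```python
def odds_ratios_vs_first(groups):
    """groups: (events, non_events) per risk group, lowest group first.

    Returns the odds ratio of each later group relative to group one.
    """
    ref_events, ref_non_events = groups[0]
    ref_odds = ref_events / ref_non_events
    return [(events / non_events) / ref_odds for events, non_events in groups[1:]]

# Invented counts for three groups defined by two cut points.
toy_groups = [(4, 76), (20, 60), (30, 10)]
ratios = odds_ratios_vs_first(toy_groups)
print([round(r, 2) for r in ratios])  # [6.33, 57.0]
```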

Independent risk factors for overall survival and lymph node metastasis

We then ran a univariate analysis on all available variables with respect to both overall survival and lymph node metastasis. The variables were age (continuous), body mass index (BMI; continuous), comorbid heart disease and diabetes, FIGO stage (IA and IB as early vs. II and above as late), histological grade, histological subtype, lymphovascular space involvement (LVSI), margin involvement, peritoneum involvement, and the tumor markers CA125 (threshold 35 U/ml) and CEA (threshold 5 ng/ml), together with myometrium invasion fraction and tumor size, each categorized by the cutoff choices identified above for the respective outcome. The variables that were significant (p ≤ 0.05) were then entered into a multivariate analysis (Table 4). For overall survival, the independently significant factors were FIGO stage (p = 0.05), histological subtype (type II vs. type I: p < 0.01; mixed vs. type I: p < 0.01; others vs. type I: p = 0.34), and tumor size (p = 0.03). For lymph node metastasis, they were FIGO stage (p < 0.01) and myometrium invasion (p < 0.01). In other words, myometrium invasion was an independent risk factor for lymph node metastasis, and tumor size was an independent risk factor for overall survival.

Table 4

Independent risk factors relative to overall survival and lymph nodal metastasis
	Variable	p value
Overall survival	Tumor size (cutoff at 4.11 cm)	0.03
	FIGO stage	0.05
	Histological type
	type I	reference
	type II	< 0.01
	mixed	< 0.01
	others	0.34
Lymph nodal metastasis	Fraction of myometrium invasion depth
	<0.41	reference
	[0.41, 0.89]	0.02
	>0.89	< 0.01
	FIGO stage	< 0.01

Comparison of predictivity with traditional cutoff values

We then analyzed how the cut points identified in this study compared with the traditional ones, 0.5 for myometrium invasion fraction and 2 cm for tumor size. Since myometrium invasion was an independent risk factor for lymph node metastasis but not for overall survival, we compared the cut points of this variable only in relation to lymph node metastasis. Similarly, the comparison for tumor size was made in relation to overall survival.

We compared the two choices of cut points, 0.5 vs. 0.41 and 0.89, for their power to predict lymph node metastasis in a multivariate model that included myometrium invasion fraction and FIGO stage. The comparison covered AUC, sensitivity, specificity and their sum, and accuracy, based on the Youden criterion and 5-fold cross-validation. As shown in Table 5, every one of these values was better with cutoffs at 0.41 and 0.89 than with a single cutoff at 0.5.

Table 5

Performance of myometrium invasion depth in prediction of lymph node metastasis¹
Choice of cutoffs	AUC²	sensitivity	specificity	sensitivity + specificity	accuracy
At 0.5	0.8771	0.6836	0.8105	1.4941	0.8005
At 0.41 and 0.89	0.9198	0.8068	0.8249	1.6318	0.8288
¹ FIGO stage was also included as a confounding factor.
² AUC, area under the curve.
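The measures reported in Table 5 can be computed from predicted probabilities as in the sketch below. It assumes that the Youden criterion is used to pick the probability threshold that maximizes sensitivity + specificity, which is one common reading of the description above; the function name `roc_summary` and the scores and labels are invented for illustration.

```python
def roc_summary(scores, labels):
    """AUC plus sensitivity, specificity, and accuracy at the Youden-optimal threshold."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    # AUC as the probability that a positive case outscores a negative one (ties count half).
    auc = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg) / (len(pos) * len(neg))
    best = None
    for t in sorted(set(scores)):
        tp = sum(p >= t for p in pos)  # predicted positive when score >= threshold
        tn = sum(n < t for n in neg)
        sens, spec = tp / len(pos), tn / len(neg)
        if best is None or sens + spec > best["sensitivity"] + best["specificity"]:
            best = {
                "threshold": t,
                "sensitivity": sens,
                "specificity": spec,
                "accuracy": (tp + tn) / len(scores),
            }
    best["auc"] = auc
    return best

# Invented predicted probabilities and true nodal status (1 = metastasis).
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
summary = roc_summary(scores, labels)
print(summary)
```

Under cross-validation, the scores would come from a model fitted on the training folds and the summary would be computed on the held-out fold.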

The comparison between 2 cm and 4.11 cm for tumor size was based on the C-index, which measures the predictive power of the tested variables. The comparison was again made in a multivariate setting, with FIGO stage and histological type included as covariates, and was validated with 5-fold cross-validation. The C-index with 4.11 cm as the cut point was 0.663, higher than the 0.623 obtained with 2 cm.
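For readers unfamiliar with the C-index, a minimal sketch of Harrell's concordance statistic for right-censored data is given below. It uses a single binary risk score (tumor size above a cut point, with values equal to the cut assumed to fall in the lower group) rather than the multivariate Cox model of the study, and the follow-up times, event indicators, and sizes are invented for illustration.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs in which the higher-risk subject fails first."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i has an observed event
            # strictly before subject j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied risk scores count half
    return concordant / comparable

# Invented data: follow-up in days, event indicator (1 = death), tumor size in cm.
times = [100, 200, 300, 400, 500, 600]
events = [1, 1, 0, 1, 0, 0]
sizes = [6.0, 3.0, 5.0, 4.5, 2.0, 3.5]

risk = [1 if s > 4.11 else 0 for s in sizes]  # dichotomize at the 4.11 cm cut point
c_index = concordance_index(times, events, risk)
print(round(c_index, 3))  # 0.636
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is the scale on which 0.663 and 0.623 are compared above.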

Discussion

Biomarkers have been studied intensively for their potential to help diagnose diseases and monitor their progression, assess patients’ prognosis, and develop new drugs. In clinical practice, a biomarker implicated in a disease is often classified into two or more groups, and the classification may serve as a clinical guideline for treatment decisions. For example, CA125 is an important biomarker in ovarian cancer, with a cut point at 35 U/ml used to monitor patients’ response to chemotherapy agents and to detect disease relapse early [12]. In endometrial cancer, whether the depth of myometrial invasion exceeds 50% has been established as one of the criteria to differentiate between stage IA and IB tumors [5].

As classification of biomarkers becomes more useful, it has also become increasingly important to use proper methods for stratification. Although multiple statistical methods are already available, many biomedical studies have not benefited from them. One probable reason is the difficulty of applying statistical algorithms directly to biological data. This study may therefore serve as an illustration of how to use proper statistical methods to find optimal cut points; we also provide a ready-to-use package (supplementary files) that biomedical researchers can apply to their own data.

In this study, we analyzed two clinical factors that have both been implicated in risk assessment of endometrial cancer: depth of myometrial invasion and tumor size. Previous studies have found that a patient is at low risk for lymph node involvement when the depth of myometrial invasion is less than 50% and the tumor diameter is smaller than 2 cm [7, 8, 13, 14]. However, to our knowledge, few of them started by identifying the best split(s); most only compared the results of several tentative groupings. This study differed in that it aimed to find the optimal cut points first. Our results confirmed the role of myometrial invasion in predicting nodal status; tumor size, however, was not an independent risk factor for nodal involvement, although it was one for overall survival.

The search for optimal cut points was based first on the AIC value, with the smallest AIC indicating the best number of cut points. By this criterion, the best choice for myometrial invasion in relation to lymph node involvement was three risk levels separated by two cut points at 0.41 and 0.89. The lower cut point, 0.41, is quite close to the conventional threshold of 0.5, and patients in the 0.41–0.89 group indeed had a significantly higher risk of lymph node metastasis than those with an invasion depth below 0.41 (OR = 2.68). However, patients with even deeper invasion (> 0.89) had a much higher risk (OR = 18.37). Stratifying patients by 0.41 and 0.89 also predicted better than stratifying by 0.5 alone. We therefore propose that gynecologists also take nearly complete infiltration of the myometrium into account when assessing patients.

This study did not support a role for tumor size in predicting nodal status. There is some prior evidence to this effect: it has been suggested that grade 1 tumors with less than 50% myometrial invasion are at low risk for lymph node metastasis regardless of tumor size [8]. Nevertheless, tumor size was found in this study to be an independent risk factor for overall survival, together with FIGO stage and histological subtype. The optimal threshold in this case is around 4 cm, which also predicted survival better than 2 cm.

On a side note, when applying this method in clinical studies, one may need to balance the AIC criterion against clinical concerns. A large number of cut points may complicate clinical practice, which is why only one to three cut points were considered in this study. When the smallest AIC value corresponds to a large number of cut points, one may instead choose the number with the next smallest AIC, which may be more practical.
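This trade-off can be expressed as a small selection rule: take the number of cut points with the smallest AIC among those that are clinically acceptable. The sketch below applies it to the lymph node metastasis AIC values for myometrium invasion from Table 2; the function name `choose_n_cuts` and the `max_cuts` parameter are hypothetical names introduced here.

```python
def choose_n_cuts(aic_by_cuts, max_cuts=None):
    """Return the number of cuts with the smallest AIC, optionally capped at max_cuts."""
    allowed = {k: aic for k, aic in aic_by_cuts.items() if max_cuts is None or k <= max_cuts}
    return min(allowed, key=allowed.get)

# AIC values for myometrium invasion vs. lymph node metastasis (Table 2).
aic_lymph_node = {1: 98.81, 2: 97.11, 3: 97.54}

print(choose_n_cuts(aic_lymph_node))              # 2
print(choose_n_cuts(aic_lymph_node, max_cuts=1))  # 1
```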

This study has limitations. It was retrospective, so the limitations inherent in a retrospective design could not be avoided. It also comprised a limited number of patients from a single institution. Because of these limitations, the conclusions cannot be readily generalized. Future studies will aim at a larger sample from multiple institutions, possibly with a prospective design.

Conclusions

We recommend starting with a statistically sound algorithm to stratify patients into risk groups based on biomarker values, and weighing both clinical considerations and statistical criteria when choosing the optimal cut points. In endometrial cancer, in addition to the conventional criterion of 50% myometrial invasion, nearly complete invasion should also be included in risk evaluation, since such patients may have a much higher risk of lymph node metastasis. Tumor size is not an independently significant factor for nodal involvement, but it is one for overall survival, for which the best cut point is approximately 4 cm.

List of Abbreviations

AIC, Akaike information criterion

AUC, area under the curve

C-index, concordance statistic

cm, centimeter

FIGO, International Federation of Gynecology and Obstetrics

KSVGH, Kaohsiung Veterans’ General Hospital

Declarations

Data availability

Data for this study are available upon request.

Ethics approval:

The study was approved by the Institutional Review Board (IRB) of Kaohsiung Veterans’ General Hospital (KSVGH) and was in accordance with the Declaration of Helsinki. The requirement for patients’ written informed consent was waived by the IRB.

Consent for publication:

All authors approve the manuscript and agree to publish the study.

Conflict of Interest:

The authors declare no conflict of interest.

Acknowledgement

We thank Mandy Chiang and Coco Chiang for their help with data collection.

Funding:

The study was supported in part by a grant from the Ministry of Science and Technology of ROC (106-2118-M-110-002) and two grants from KSVGH (VGHKS 107–143 and VGHNSU 107-001). The IRB waived the requirement for written informed consent from the patients for this study.

Authors' contributions:

Study formulation and design: CC, AC, and JC; data collection: AC, LH, PH, and YK; statistical analysis: CC and YC; overall analysis: CC, AJ, and JC; figure preparation: YC; table preparation: YC and JC; writing: JC; editing and checking: CC, AJ, and JC; manuscript approval: all.

References

1. Bray F, et al. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2018;68(6):394–424.
2. Lortet-Tieulent J, et al. International patterns and trends in endometrial cancer incidence, 1978–2013. J Natl Cancer Inst. 2017.
3. Weiderpass E, et al. Trends in corpus uteri cancer mortality in member states of the European Union. Eur J Cancer. 2014;50(9):1675–84.
4. Creasman WT, et al. Carcinoma of the corpus uteri. FIGO 26th Annual Report on the Results of Treatment in Gynecological Cancer. Int J Gynaecol Obstet. 2006;95 Suppl 1:S105–43.
5. Pecorelli S. Revised FIGO staging for carcinoma of the vulva, cervix, and endometrium. Int J Gynaecol Obstet. 2009;105(2):103–4.
6. Morice P, et al. Endometrial cancer. Lancet. 2016;387(10023):1094–108.
7. Schink JC, et al. Tumor size in endometrial cancer: a prognostic factor for lymph node metastasis. Obstet Gynecol. 1987;70(2):216–9.
8. Vargas R, et al. Tumor size, depth of invasion, and histologic grade as prognostic factors of lymph node involvement in endometrial cancer: a SEER analysis. Gynecol Oncol. 2014;133(2):216–20.
9. R Development Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2010.
10. Chang C, Hsieh MK, Chiang AJ, Tsai Y, Liu C, Chen J. Methods for estimating the optimal number and location of cut points in multivariate survival analysis: a statistical solution to the controversial effect of BMI. Comput Statistics. 2019;34(4):26.
11. Chang C, et al. Determining the optimal number and location of cutoff points with application to data of cervical cancer. PLoS One. 2017;12(4):e0176231.
12. Bast RC Jr. CA 125 and the detection of recurrent ovarian cancer: a reasonably accurate biomarker for a difficult disease. Cancer. 2010;116(12):2850–3.
13. Mariani A, et al. Low-risk corpus cancer: is lymphadenectomy or radiotherapy necessary? Am J Obstet Gynecol. 2000;182(6):1506–19.
14. Schink JC, et al. Tumor size in endometrial cancer. Cancer. 1991;67(11):2791–4.
