Assembling High-Quality Lymph Node Clinical Target Volumes for Cervical Cancer Radiotherapy Using a Deep Learning-Based Approach

Background: To explore an approach for accurately assembling high-quality lymph node clinical target volumes (CTVs) on CT images in cervical cancer radiotherapy with an encoder-decoder 3D network. Methods: CT images from 216 cases treated in our center from 2017 to 2020 were included. The 216 patients were divided into two cohorts of 152 and 64 cases, respectively. In the first cohort of 152 cases, the para-aortic lymph node, common iliac, external iliac, internal iliac, obturator, presacral and groin nodal regions were delineated manually as sub-CTVs. The 152 cases were then randomly divided into training (n=96), validation (n=36) and test (n=20) groups. Each structure was individually trained and optimized through a deep learning model. An additional 64 cases covering 6 different clinical conditions were used to verify the feasibility of CTV generation based on our model. The Dice similarity coefficient (DSC) and Hausdorff distance (HD) were both used for quantitative evaluation. Results: Comparing auto-segmentation results to ground truth, the mean DSC/HD values were 0.838/7.7 mm, 0.853/4.7 mm, 0.855/4.7 mm, 0.844/4.7 mm, 0.784/5.2 mm, 0.826/4.8 mm and 0.874/4.8 mm for CTV_PAN, CTV_common iliac, CTV_internal iliac, CTV_external iliac, CTV_obturator, CTV_presacral and CTV_groin, respectively. The DSC/HD values for the 6 different clinical situations were 0.877/4.4 mm, 0.879/4.6 mm, 0.881/4.2 mm, 0.882/4.3 mm, 0.872/6.0 mm and 0.875/4.9 mm, respectively. Conclusions: We developed a deep learning-based approach for segmenting lymph node sub-regions automatically and assembling CTVs from these sub-regions according to clinical needs in cervical cancer radiotherapy. This work can


Background
Cervical cancer is the fourth most common cancer in women worldwide and continues to be a major threat to public health [1]. Radiation therapy is the primary clinical treatment for patients with locally advanced or advanced cervical cancer and for those who cannot tolerate surgery, as well as the adjuvant treatment after radical hysterectomy [2,3]. Intensity modulated radiotherapy (IMRT) and volumetric modulated arc therapy (VMAT) can deliver a conformal prescription dose distribution to targets while protecting adjacent normal tissues [4]. For IMRT and VMAT in cervical cancer, accurate delineation of clinical target volumes (CTVs) and organs at risk (OARs) is essential for guaranteeing tumor control and minimizing radiation toxicity to normal tissues [5]. However, manual delineation of targets and OARs on medical images is a labor-intensive and repetitive task for radiotherapy physicians [6]. Moreover, because radiation oncologists differ in experience and preference, inter- and intra-observer variability is inevitable in delineation tasks [7,8]. For external irradiation of cervical cancer, several consensus guidelines have been published to establish delineation standards and reduce variability [9][10][11][12][13], but disagreements remain, especially for the pelvic lymph node CTV, because its boundaries are unclear and its shape varies between patients.
In recent years, automatic segmentation of OARs and CTVs has been confirmed as a common approach to improving oncologists' work efficiency, consistency and standardization [14]. To date, two kinds of methods, atlas-based and deep learning (DL) based, have been used for such tasks. Atlas-based segmentation of OARs and target volumes with clinically acceptable results has been investigated for various sites over the past two decades [5,[15][16][17]. The principle of the atlas-based method is to match new test images to reference images (also called atlas images) that carry regions of interest (ROIs) [18]. Its segmentation performance is influenced by the accuracy of deformable image registration and by the choice of matched atlases. Limitations and inaccuracies in the segmentation results can be caused in particular by differences in patient size, deformable organs or anatomical variation.
On the other hand, DL with neural networks has shown outstanding performance in medical image analysis, including auto-segmentation. Among numerous networks, the convolutional neural network (CNN) is a standard segmentation network, modified to minimize memory usage [19]. The feasibility of CNN-based auto-segmentation of OARs and CTVs for cervical cancer has been evaluated in several studies [19][20][21][22]. In these studies, researchers developed algorithms to auto-delineate CTVs defined by each department's own naming rules or clinical habits, but the algorithms lacked the ability to auto-delineate individual lymph node regions, which could be combined to form CTVs for different diseases in the pelvic site. For auto-segmentation of the CTV in nasopharyngeal carcinoma (NPC), Cardenas CE's group achieved DL-based auto-segmentation of individual lymph node levels with high clinical acceptability [23]. For cervical cancer patients, the CTV definition involves several parts: primary tumor (if present), upper vagina, parametrial/paravaginal tissue, uterosacral ligaments and pelvic lymph node regions [3]. The range of lymph node regions irradiated depends on the stage and metastasis of the tumor. Following the CTV delineation consensus guidelines for pelvic lymph nodes, we propose an auto-segmentation approach similar in idea to that of Cardenas's group. As far as we know, this is the first attempt to obtain the CTV for cervical cancer automatically by assembling multiple lymph node sub-regions, each of which is auto-delineated individually by a DL-based model. The sub-regions to include can be selected by the radiation oncologist, who decides the CTV range according to the patient's clinical condition. It is expected that both the flexibility and the efficiency of CTV auto-delineation in the radiotherapy workflow can be improved in this way.

Patient data and data preparation
A total of 216 patients with pathologically proven postoperative cervical cancer who received external beam radiotherapy from 2017 to 2020 at our treatment center were included. All patient information was anonymized, and only the CT image data and contour data were used for analysis. All patients were immobilized in the supine position using a thermoplastic mold, and radiotherapy simulation CT images were obtained with a Siemens SOMATOM Definition CT (Siemens Healthcare, Forchheim, Germany) or a Revolution CT (GE Medical Systems, Madison, USA). Each scan was reconstructed with a size of 512×512 pixels and a slice thickness of 3 mm. The lymph node sub-regions in the pelvic area usually include the para-aortic lymph node (PAN), common iliac, external iliac, internal iliac, obturator, presacral and groin nodal regions, which were named CTV_PAN, CTV_common iliac, CTV_external iliac, CTV_internal iliac, CTV_obturator, CTV_presacral and CTV_groin, respectively, in this study. The 216 patients were divided into two cohorts of 152 and 64 cases, respectively. Each lymph node sub-region for the 152 patients was contoured manually by a junior oncologist on RayStation (Version 4.7.5, RaySearch Laboratories AB, Stockholm, Sweden), then modified (only when needed) and approved by a senior oncologist with more than 15 years of experience. The 152 cases were then randomly divided into training (n=96), validation (n=36) and test (n=20) groups. The other cohort of 64 cases was used to verify the feasibility of clinical application.
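The random division of the 152 contoured cases can be sketched as follows. This is a minimal illustration only: the function name `split_cases`, the case identifiers and the fixed seed are hypothetical, and the text does not state how the randomization was actually performed.

```python
import random


def split_cases(case_ids, n_train=96, n_val=36, n_test=20, seed=0):
    """Randomly split case IDs into training, validation and test groups."""
    if n_train + n_val + n_test != len(case_ids):
        raise ValueError("group sizes must sum to the number of cases")
    shuffled = list(case_ids)
    # A seeded generator keeps the split reproducible (the seed is an assumption).
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical identifiers for the 152 cases of the first cohort.
train_ids, val_ids, test_ids = split_cases([f"case_{i:03d}" for i in range(152)])
```

Fixing the seed keeps the three groups identical across the seven sub-region models, so that their test results refer to the same 20 cases.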
We designed an image preprocessing module, consisting of four steps, that was applied before DL training. In the first step, the CT images were normalized. In the second step, the external body structure was segmented from the whole CT image by the Otsu algorithm [24]. In the third step, the center of the external body in each image was shifted to the center of the image. Finally, all images were resized from 512 × 512 × N to 256 × 256 × N, where N is the number of slices.
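The four preprocessing steps can be sketched in NumPy as below. Several details are assumptions made for illustration, since the text does not specify them: the intensity window of -1000 to 1000 HU used for normalization, a single in-plane shift per volume implemented with a wrap-around `np.roll`, and 2 × 2 average pooling as the resizing method. All function names are hypothetical.

```python
import numpy as np


def normalize(vol, lo=-1000.0, hi=1000.0):
    """Clip to an assumed HU window and scale to [0, 1]."""
    vol = np.clip(vol.astype(np.float32), lo, hi)
    return (vol - lo) / (hi - lo)


def otsu_threshold(values, bins=256):
    """Otsu's method: pick the threshold maximizing between-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    hist = hist.astype(np.float64)
    centers = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(hist)                    # voxels at or below each bin
    w1 = w0[-1] - w0                        # voxels above each bin
    s0 = np.cumsum(hist * centers)
    m0 = s0 / np.maximum(w0, 1.0)           # mean of the lower class
    m1 = (s0[-1] - s0) / np.maximum(w1, 1.0)  # mean of the upper class
    between = w0 * w1 * (m0 - m1) ** 2
    return centers[np.argmax(between)]


def center_body(vol, mask):
    """Shift the volume in-plane so the body footprint is centered."""
    rows, cols = np.nonzero(mask.any(axis=2))
    if rows.size == 0:
        return vol
    dr = int(round((vol.shape[0] - 1) / 2.0 - rows.mean()))
    dc = int(round((vol.shape[1] - 1) / 2.0 - cols.mean()))
    # np.roll wraps around; acceptable here only because the border is air.
    return np.roll(vol, (dr, dc), axis=(0, 1))


def downsample2(vol):
    """Halve the in-plane size by 2 x 2 average pooling (H x W x N layout)."""
    h, w, n = vol.shape
    return vol.reshape(h // 2, 2, w // 2, 2, n).mean(axis=(1, 3))


def preprocess(ct):
    """Normalize, segment the body with Otsu, center it, and resize."""
    vol = normalize(ct)
    body = vol > otsu_threshold(vol)
    return downsample2(center_body(vol, body))
```

A production pipeline would more likely use interpolation-based resizing and zero-padded shifting; the pooling and roll used here simply keep the sketch dependency-free.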

Network architectures for DL and training process
Inspired by the advantages of residual networks [25], we designed an encoder-decoder 3D network with residual blocks (Fig 1) for our segmentation task. In the encoder of the residual network and in the downsampling module, convolution layers with an adjusted stride were used. The decoder of this network was designed as the combination of a hybrid channel decoder (HC-decoder) and an enhanced channel decoder (EC-decoder). The purpose of combining the two decoders was to address the large differences in target volume between cases. In the HC-decoder, features were received from the encoder through skip connections. The output features of the upper convolution layer were mixed and are denoted CMi. An attention network was added to the EC-decoder [26]. Each attention block was a mixture of a primary branch and a mask branch. The primary branch was composed of two residual blocks, and its output feature is denoted Ri(x). The mask branch adopted bottom-up and top-down structures, and its output feature is denoted Mi(x). The mask branch was also used as the control gate of the primary branch, and the gated output is denoted AMi. Formula (1) gives the relationship between them. Finally, the information from the HC-decoder (CMi) and the EC-decoder (AMi) was superimposed and fused to form the output of the DL network. To better capture context features and preserve spatial information, the Dense Atrous Convolution (DAC) module and the residual multi-kernel pooling (RMP) module were used to connect the decoder with the encoder [27].
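Formula (1) is not reproduced in this text. As a hedged illustration only, one common way for a mask branch to gate a primary branch is the residual attention form shown below; whether the authors used exactly this expression is an assumption, and the plain product $M_i(x) \cdot R_i(x)$ is the other usual choice.

```latex
% Assumed form of the gating relation, not confirmed by the source:
% M_i(x) in [0, 1] is the mask-branch output, R_i(x) the primary-branch output.
AM_i(x) = \bigl(1 + M_i(x)\bigr) \cdot R_i(x)
```

In this form a mask value of zero leaves the primary-branch feature unchanged rather than suppressing it, which is the usual motivation for adding the identity term.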
A separate model was trained for each lymph node sub-region, and the model with the highest performance was retained. Adaptive moment estimation (Adam) was used as the optimizer. The network was trained with a combination of the cross-entropy loss and the Dice loss. The patch size was 64 × 256 × 256. Training ran for 500 epochs with an initial learning rate of 0.001, which was reduced to one-tenth of its current value whenever the validation loss had not improved for 20 epochs. The network was implemented in the TensorFlow framework (Version 2.1), and the whole training process was run on an NVIDIA GeForce RTX 2080 Ti.
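The loss and the learning-rate rule described above can be sketched framework-independently as follows. This is not the authors' TensorFlow code: the equal weighting of the two loss terms, the binary (single-structure) formulation and the names `combined_loss` and `PlateauSchedule` are assumptions made for illustration.

```python
import numpy as np


def dice_loss(prob, target, eps=1e-6):
    """Soft Dice loss: 1 minus the Dice overlap of probabilities and labels."""
    inter = np.sum(prob * target)
    return float(1.0 - (2.0 * inter + eps) / (np.sum(prob) + np.sum(target) + eps))


def cross_entropy_loss(prob, target, eps=1e-7):
    """Mean binary cross-entropy over all voxels."""
    p = np.clip(prob, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))


def combined_loss(prob, target):
    """Sum of cross-entropy and Dice loss (equal weights are an assumption)."""
    return cross_entropy_loss(prob, target) + dice_loss(prob, target)


class PlateauSchedule:
    """Cut the learning rate to one-tenth after `patience` stale epochs."""

    def __init__(self, lr=1e-3, factor=0.1, patience=20):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best = float("inf")
        self.stale = 0

    def step(self, val_loss):
        """Record one epoch's validation loss and return the current rate."""
        if val_loss < self.best:
            self.best, self.stale = val_loss, 0
        else:
            self.stale += 1
            if self.stale >= self.patience:
                self.lr *= self.factor
                self.stale = 0
        return self.lr
```

In Keras, comparable behavior is available through the `ReduceLROnPlateau` callback with `factor=0.1` and `patience=20`; the text does not state whether that callback was used.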

Evaluation metrics
The segmentation results were evaluated quantitatively with the Dice similarity coefficient (DSC) and the Hausdorff distance (HD). The DSC is defined by formula (2):

$$\mathrm{DSC}(X, Y) = \frac{2\,|X \cap Y|}{|X| + |Y|} \qquad (2)$$

where X and Y are the predicted and ground truth segmentations, respectively. The DSC measures the overlap of the two regions relative to their combined size [28]. Its value ranges between 0 and 1, and a higher DSC indicates greater similarity between the two contours being compared. In addition, to assess subtle differences between the ground truth and auto-segmentation contours in small regions, HD values were calculated between contours. The HD is sensitive to positional differences and is defined as follows:

$$\mathrm{HD}(X, Y) = \max\{h(X, Y),\, h(Y, X)\}, \qquad h(X, Y) = \max_{x \in X} \min_{y \in Y} \lVert x - y \rVert$$

where h(X, Y) is the largest of the distances from each point in X to its nearest point in Y [29]. In this study, the 95% HD was used, that is, the 95th percentile of the distances between the contour boundaries.
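Both metrics can be computed from binary masks as in the sketch below. Two simplifications are assumptions of this illustration rather than the authors' stated procedure: distances are taken between all foreground voxels instead of extracted boundary voxels, and the 95% HD is taken as the larger of the two directed 95th percentiles. The pairwise distance matrix also limits the sketch to small masks.

```python
import numpy as np


def dice(pred, truth):
    """Dice similarity coefficient of two binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(2.0 * np.logical_and(pred, truth).sum() / total)


def directed_distances(a_pts, b_pts):
    """Distance from every point in a_pts to its nearest point in b_pts."""
    diff = a_pts[:, None, :] - b_pts[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)


def hd95(pred, truth, spacing=(1.0, 1.0, 1.0)):
    """95th-percentile Hausdorff distance in physical units (e.g. mm)."""
    p = np.argwhere(pred) * np.asarray(spacing)
    t = np.argwhere(truth) * np.asarray(spacing)
    d_pt = directed_distances(p, t)
    d_tp = directed_distances(t, p)
    return float(max(np.percentile(d_pt, 95), np.percentile(d_tp, 95)))
```

The `spacing` argument converts voxel indices to millimetres along each array axis, so a slice thickness of 3 mm would be entered on whichever axis indexes the slices.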
As shown in the overall flowchart (Fig. 2), after confirming that the auto-segmentation model for each lymph node sub-region achieved good performance, we evaluated the feasibility of assembling overall lymph node CTVs by combining different numbers of sub-regions, using data from the other cohort of 64 patients covering 6 different clinical situations. For each situation, different sub-regions were chosen and combined into a new CTV. The CTV_LN of each patient, which was the clinically irradiated target volume, was treated as the ground truth contour and compared with the combined CTV, named CTV_LNAT.

Results
Table 1 lists the quantitative comparison between auto-segmentation and ground truth for the lymph node sub-regions as mean (± standard deviation) values. The largest HD occurred in CTV_PAN; on the images, this corresponded to a difference in the number of delineated slices. The contours of the lymph node sub-regions are compared on CT images in Fig 3, where the red lines are the manual delineations by the oncologists and the blue lines are the results segmented by the DL network.
Table 1. The similarity comparison of auto-segmented contours and manual contours of lymph node sub-regions.
Usually, a CTV for the pelvic lymph node region contains different numbers of sub-regions under different clinical conditions. In this study, the target volumes (CTV_LNs) of 64 patients in 6 different clinical situations were selected as the ground truth and compared with the CTVs generated by integrating different numbers of sub-regions. Table 2 lists the comparison results. The mean DSC values for the assembled CTVs were generally higher than those for the individual sub-regions, while the HD values were generally lower. This can also be seen in the overall boxplots of DSC for all contours in Fig 4.
Table 2. The similarity comparison of contours from integration of auto-segmented lymph node sub-regions and manual contours of whole lymph node CTVs at 6 different clinical conditions.
Note:
CTV1 consists of CTV_(internal, external, presacral and obturator) and corresponds to the CTV_LN in cases with negative lymph nodes.
CTV2 consists of CTV_(common, internal, external, presacral and obturator) and corresponds to the CTV_LN in cases with high-risk nodal metastasis.
CTV3 consists of CTV_(internal, external, presacral, obturator and groin) and corresponds to the CTV_LN in cases with invasion of the lower 1/3 of the vagina.
CTV4 consists of CTV_(common, internal, external, presacral, obturator and groin) and corresponds to the CTV_LN in cases meeting both conditions 2 and 3.
CTV5 consists of CTV_(PAN, common, internal, external, presacral and obturator) and corresponds to the CTV_LN in cases with common iliac or para-aortic lymph node metastases.
CTV6 consists of CTV_(PAN, common, internal, external, presacral, obturator and groin) and corresponds to the CTV_LN in cases meeting both conditions 3 and 5.
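The six compositions in the note to Table 2 can be expressed as a lookup table, with each CTV assembled as the voxel-wise union of the selected sub-region masks. The use of a plain union, the dictionary keys and the function name `assemble_ctv` are assumptions of this sketch; the text says only that the sub-regions are combined.

```python
import numpy as np

# Sub-region membership of each assembled CTV, taken from the note to Table 2.
CTV_RECIPES = {
    "CTV1": ("internal", "external", "presacral", "obturator"),
    "CTV2": ("common", "internal", "external", "presacral", "obturator"),
    "CTV3": ("internal", "external", "presacral", "obturator", "groin"),
    "CTV4": ("common", "internal", "external", "presacral", "obturator", "groin"),
    "CTV5": ("PAN", "common", "internal", "external", "presacral", "obturator"),
    "CTV6": ("PAN", "common", "internal", "external", "presacral", "obturator",
             "groin"),
}


def assemble_ctv(sub_masks, recipe):
    """Union the auto-segmented sub-region masks named by a recipe."""
    names = CTV_RECIPES[recipe]
    missing = [name for name in names if name not in sub_masks]
    if missing:
        raise KeyError(f"missing sub-region masks: {missing}")
    out = np.zeros_like(sub_masks[names[0]], dtype=bool)
    for name in names:
        out |= sub_masks[name].astype(bool)
    return out
```

With this layout the oncologist's choice reduces to picking a recipe key, or passing a custom tuple of sub-region names if a case falls outside the six listed situations.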
Although most of the automatically generated contours performed well, segmentation was unsuccessful in some cases. Failures usually occurred around the CT slices containing the vascular bifurcation, because in some rare cases the bifurcation was located far from the bony landmark structure. Another failure condition was related to variation among cases in the height at which the common iliac vessels bifurcate into the internal and external iliac vessels. The poorly segmented regions in CTV_PAN were mainly located in the slices at the junction between the PAN structure and the common iliac structure.
A three-dimensional reconstruction of the CTV for one case is shown in Fig 5, where the red lines represent the clinically certified CTV_LN and the blue lines represent the CTV_LNAT.

Discussion
Benefiting from the development of radiation therapy, pelvic IMRT has been shown in retrospective and prospective studies to increase cure rates and reduce toxicity for pelvic malignancies [30][31][32]. The accuracy of target volume delineation is therefore an essential precondition: under-segmentation in high-risk areas may lead to regional recurrence and metastasis, while over-segmentation in low-risk areas makes it difficult for dosimetrists to reduce the dose to OARs. Manual contours are still considered the gold standard for both OARs and CTVs, although manual contouring remains a demanding and time-consuming task for oncologists.
Over the past two decades, growing interest in auto-segmentation for different modalities of medical images has greatly advanced the delineation task. In the field of radiotherapy, atlas-based and DL-based auto-segmentation are the two main approaches. Young AV et al. [33] developed an atlas-based approach for generating nodal volumes for postoperative endometrial cancer, which showed that auto-segmentation was time-saving. However, atlas-based auto-segmentation is often affected by inter-subject variability. The resulting inaccuracy may sometimes cause oncologists to spend more time and effort modifying the structures obtained by atlas-based processing. In recent years, DL-based algorithms have been proposed to improve the performance of auto-segmentation. Rhee DJ et al. [22] used a CNN model that could auto-segment the whole pelvic lymph node CTV and the para-aortic lymph node CTV with DSC results of 0.81 and 0.76, respectively. Liu Z's team used another CNN and obtained a DSC of 0.86 for full delineation of the cervical cancer CTV [20]. Wang Z et al. [21] obtained a result similar to that of Liu Z by training a DL model and concluded that the auto-segmentation model had better efficiency in learning than oncologists with much experience.
Similar to the idea of Cardenas CE's team, who developed a model that can provide a variety of target volumes for the same head and neck cancer patient [23], we proposed a method that can provide different lymph node level coverage with high flexibility. In this study, instead of building a single DL-based automatic segmentation model for the CTV of cervical cancer, we developed a separate model for each lymph node sub-region. The radiation oncologist can choose multiple lymph node sub-regions to assemble a whole nodal CTV according to the clinical condition. The quantitative evaluation of the consistency between lymph node CTVs assembled from sub-regions and clinically used contours showed that both the flexibility and the efficiency of CTV segmentation could be improved in clinical delineation work.
Although our model was based on cases of cervical cancer, it would be feasible to extrapolate the results for some of the structures to other gynecological cancers or even to male pelvic anatomy. For cervical cancer and other gynecological cancers, the lymph node CTV typically includes the internal iliac, external iliac, obturator and presacral lymph node regions. The common iliac region is included for cases with tumors involving the cervix [34], with large tumors, or with suspected or confirmed lymphadenopathy in the lower true pelvis [3]. Some experts believe that the presacral nodal CTV should be included only when tumor extension along the uterosacral ligaments or rectal involvement is proven [34]. The latest guidelines suggest that the presacral CTV should be included for all postoperative cervical cancer patients and for endometrial cancer patients with cervical stromal invasion [11]. When the lower third of the vagina is invaded, the groin nodal CTV should be treated as well. The para-aortic lymph node CTV is mainly the target of extended-field irradiation for cases with pelvic or para-aortic lymph node metastasis, and this has been confirmed to be an effective improvement [35]. Because the degree of disease progression varies, oncologists choose which sub-regions to include in the CTV based on their own clinical experience. Although these conditions are not encountered with equal probability in clinical practice, our research provides optional lymph node CTV regions for a wide variety of possibilities.
CTV_obturator showed the worst auto-segmentation performance, with a mean DSC of only 0.784±0.036. This may be related to the delineation definition of CTV_obturator, which is defined as a connection between CTV_internal iliac and CTV_external iliac that avoids bones, muscles and bladder. Because this region is influenced by the nearby anatomical structures and by different bladder filling volumes, it is necessary to consider whether an extra internal target volume (ITV) should be added to CTV_obturator. In addition, the DSC of an assembled CTV was about 3 to 5 percent higher than that of the individual structures. Some pelvic lymph node regions (CTV_common, CTV_external iliac and CTV_internal iliac) are continuous along the vessels, so the DSC of the CTV is higher when those structures are linked. That is, the DL model cannot always recognize every slice that contains a structure, and when the auto-segmentation result for one structure misses some slices, the linked structures nearby fill the gaps in those slices. Consequently, with the ensemble approach used in this study, the final CTV generated by combining multiple sub-regions performs well when compared with the ground truth contours.
Automating the delineation of pelvic lymph node regions that can be assembled into a CTV has several potential benefits. First, if automatically generated target volume delineations are consistent and accurate, they could lead to the standardization of target volume delineation, which is among the largest sources of uncertainty in radiation treatment planning. Second, consistent, systematic target volume delineation using automated models could increase the quality of clinical data. Third, if an auto-segmentation model is clinically implemented, the radiation oncologist only needs to identify which lymph node regions should be contoured and to review the auto-segmentation result. This would effectively reduce the workload at busy clinics with limited resources.
A few limitations still exist in our study. First, all datasets were collected from a single department. Future multi-center studies covering more datasets are necessary to verify and improve our model. Second, our study focused only on the lymph node CTV in postoperative cervical cancer cases. Our model may also have the potential to be trained on other CTVs, such as the vaginal CTV, primary CTV and parauterine tissues, in order to facilitate fully automated generation of radiotherapy target volumes for cervical cancer in both postoperative and radical therapy cases. We will continue our research and apply some of the lymph node CTVs to male pelvic cancers to further demonstrate the universality of our model.

Conclusion
In conclusion, we developed multiple DL-based models to auto-segment lymph node sub-regions, which can be assembled into CTVs according to clinical conditions in cervical cancer radiotherapy. The good auto-segmentation performance indicates that the approach offers flexibility and versatility in application. This work can be applied to improve the consistency and accuracy of CTV delineation and to increase the efficiency of the cervical cancer radiotherapy workflow.

Figures
Figure 1. The architecture of the encoder-decoder 3D network.
Figure 2. Overall flowchart of the experiment process for generating lymph node CTVs.
Figure 4. The boxplots of DSC value between auto-segmentation contours and ground truth of 7 individual structures and 6 different clinical situations.