Deep Next Generation Sequencing Identifies Somatic Mutational Signature in Egyptian Colorectal Cancer Patients


Background: Colorectal cancer (CRC) incidence is progressively increasing in Egypt. Unfortunately, knowledge of the acquired somatic mutations in Egyptian CRC patients is inadequate, which limits our understanding of disease progression. To the best of our knowledge, our study is the first to sequence a multiple-gene panel to identify the somatic mutation pattern associated with CRC progression in a cohort of Egyptian patients. A custom panel of 72 genes frequently associated with CRC was sequenced using the QIAseq UMI-based targeted DNA panel in 120 fresh tissues classified into inflammatory bowel disease (IBD; n=20), colonic polyp (CP; n=38) and CRC (n=62), as well as 20 biopsies with non-specific colitis that served as a control group. Results: Using Ingenuity Variant Analysis (IVA), we found that the APC, TP53 & ATM genes harbored the most frequent CRC-specific somatic mutations (15, 11 & 6, respectively). We also identified common somatic mutations (predictors) associated with disease progression from colitis to CRC: APC (c.1742delA (65%)), TP53 (c.121delG (58%), c.215C>G (52%)), ATM (c.640delT (16%)), IGF2 (c.677delG (56%)), RET (c.2071G>A (37%)), ACVR2A (c.1310delA (26%)), PIK3CA (c.1173A>G (16%)) & KIT (c.1621A>C (13%)). Furthermore, pathway analysis using Ingenuity Pathway Analysis (IPA) showed that Wnt/β-catenin, ATM signaling, RTK-RAS and TGF-β were the most altered pathways in the CRC group (73%, 72%, 40% & 36%, respectively). Conclusion: In this data set, we shed light on the most frequent somatic mutations and the most altered pathways, which are crucial for understanding colorectal cancer predisposition and developing personalized therapies for Egyptian CRC patients.


Bioinformatics analysis
The data were initially analyzed using the QIAGEN GeneGlobe Data Analysis Center. The bioinformatics analysis started with a quality control (QC) step that examined the reads of each NGS run and trimmed low-quality reads. The unaligned BAM (uBAM) files were aligned to the human reference genome (hg19). A run was excluded if the aligned reads had less than 100X depth of coverage, if the UMI coverage was lower than 400, or if less than 95% of the target region was covered by at least 5% of the mean UMI coverage, as explained by Kupe et al. [8]. The Catalogue of Somatic Mutations in Cancer (COSMIC) was used to identify pathogenic somatic mutations. Variants present in the COSMIC database with MAFs less than 0.5 were retained, whereas variants absent from COSMIC or flagged as SNPs were filtered out. The identified variants were classified as benign or pathogenic according to the ClinVar database [9]. Functional consequences were predicted using SIFT [10], PolyPhen-2 [11], and CADD [12]. QIAGEN Ingenuity Pathway Analysis (IPA; QIAGEN) and the Ingenuity Variant Analysis plugin (IVA; QIAGEN) were used for further filtering, annotation and interpretation of the variants detected in the UMI-based analysis, and for pathway analysis [8]. Data visualization was performed in R (version 3.6). The oncoplot and lollipop plots were generated using maftools [13], and the chord diagram using the circlize package [14].
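The run-level QC thresholds and the COSMIC-based variant filter described above can be expressed as two small predicates. The sketch below is an illustration only, not the GeneGlobe or IVA implementation: the record layouts, field names and constants are hypothetical, and reading "5% of the mean UMI coverage" as a per-base floor relative to the run's mean UMI coverage is an assumption.

```python
from dataclasses import dataclass
from typing import Sequence

# Thresholds taken from the text; the constant names are hypothetical.
MIN_READ_DEPTH = 100        # aligned reads must reach at least 100X depth
MIN_UMI_COVERAGE = 400      # mean UMI coverage of the run
MIN_TARGET_FRACTION = 0.95  # fraction of target bases that must pass the floor below
UMI_FLOOR_FRACTION = 0.05   # assumed per-base floor: 5% of the mean UMI coverage
MAX_MAF = 0.5               # retained variants must have MAF below this value


@dataclass
class RunQC:
    """Hypothetical per-run summary (not a GeneGlobe output format)."""
    read_depth: float
    mean_umi_coverage: float
    per_base_umi: Sequence[float]  # UMI coverage at each targeted base


@dataclass
class Variant:
    """Hypothetical annotated variant record."""
    gene: str
    hgvs_c: str
    in_cosmic: bool
    maf: float
    flagged_snp: bool


def run_passes_qc(run: RunQC) -> bool:
    """Return False if any of the exclusion criteria is met."""
    if run.read_depth < MIN_READ_DEPTH:
        return False
    if run.mean_umi_coverage < MIN_UMI_COVERAGE:
        return False
    if not run.per_base_umi:
        return False
    floor = UMI_FLOOR_FRACTION * run.mean_umi_coverage
    covered = sum(1 for c in run.per_base_umi if c >= floor)
    return covered / len(run.per_base_umi) >= MIN_TARGET_FRACTION


def keep_variant(v: Variant) -> bool:
    """Keep COSMIC-listed variants with MAF < 0.5 that are not flagged as SNPs."""
    return v.in_cosmic and not v.flagged_snp and v.maf < MAX_MAF
```

In such a sketch, variants from runs failing `run_passes_qc` would be discarded first, and `keep_variant` applied to the rest before pathogenicity annotation.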

Statistical analysis
The clinicopathological features of the assessed patients were analyzed using the SPSS software package (version 22). Continuous variables were expressed as mean ± SD and range, while categorical variables were expressed as percentages. Comparisons between groups were performed using the χ2 test or Fisher's exact test, as appropriate, for categorical variables, and the Mann-Whitney test or Student's t-test, as appropriate, for continuous variables. A P-value ≤ 0.05 was considered significant.
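For 2×2 categorical comparisons, the choice between the χ2 test and Fisher's exact test is often made from the expected cell counts. The sketch below assumes the common "any expected count below 5" rule, which the text does not state; it uses only the Python standard library, omits Yates' continuity correction, handles only 2×2 tables, and is not SPSS's implementation. The function names are hypothetical, and the continuous-variable tests are not shown.

```python
import math
from typing import List, Tuple

Table = List[List[int]]  # [[a, b], [c, d]]


def expected_counts(table: Table) -> List[List[float]]:
    """Expected cell counts of a 2x2 table under independence of rows and columns."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    return [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]


def chi_square_2x2(table: Table) -> Tuple[float, float]:
    """Pearson chi-square statistic (no continuity correction) and its p-value (df = 1)."""
    exp = expected_counts(table)
    stat = sum((table[i][j] - exp[i][j]) ** 2 / exp[i][j] for i in range(2) for j in range(2))
    # For one degree of freedom, P(X >= stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def fisher_exact_2x2(table: Table) -> float:
    """Two-sided Fisher's exact p-value: sum of tables no more likely than the observed one."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    probs = [prob(x) for x in range(max(0, c1 - r2), min(r1, c1) + 1)]
    # A small relative tolerance keeps tables with tied probabilities in the sum.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def compare_categorical(table: Table, alpha: float = 0.05) -> Tuple[str, float, bool]:
    """Fisher's exact test if any expected count is below 5, otherwise chi-square."""
    if min(min(row) for row in expected_counts(table)) < 5:
        test, p = "fisher", fisher_exact_2x2(table)
    else:
        test, p = "chi2", chi_square_2x2(table)[1]
    return test, p, p <= alpha
```

The significance flag uses the same P ≤ 0.05 threshold stated above.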

Results
Clinical data of the studied participants
The patients were classified according to age, gender, histological type, grade, recurrence and metastasis (Table 1).
Distribution of non-synonymous variants in the studied groups
The mean variant depth of coverage ranged from 500 to 1000X in all studied groups (Figure S1 a). The heatmap shows the pattern of non-synonymous mutations in each group (Figure S1 b).
The frequency of the detected somatic mutations
The CRC group had the highest somatic mutation burden compared with the CP, IBD and control groups (168 mutations vs. 124, 88 and 42, respectively). The changes from reference to alternative alleles, including transitions, transversions, insertions and deletions, in each group are illustrated in Fig. 1a.
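The allele-change categories in Fig. 1a follow a simple rule: a single-base change between two purines (A, G) or two pyrimidines (C, T) is a transition, any other single-base change is a transversion, and a length change is an insertion or deletion. A minimal sketch, assuming VCF-style REF/ALT alleles in which indels carry an anchor base; the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
BASES = PURINES | PYRIMIDINES


def classify_change(ref: str, alt: str) -> str:
    """Classify a REF>ALT change (assumes VCF-style alleles with an anchor base for indels)."""
    ref, alt = ref.upper(), alt.upper()
    if not ref or not alt or not set(ref + alt) <= BASES:
        raise ValueError(f"unsupported alleles: {ref!r}>{alt!r}")
    if len(ref) == 1 and len(alt) == 1:
        if ref == alt:
            raise ValueError("REF and ALT are identical")
        pair = {ref, alt}
        # Purine<->purine or pyrimidine<->pyrimidine is a transition.
        return "transition" if pair <= PURINES or pair <= PYRIMIDINES else "transversion"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "complex"  # equal-length multi-base substitution


def spectrum(variants: Iterable[Tuple[str, str]]) -> Counter:
    """Count change classes for the (ref, alt) pairs of one study group."""
    return Counter(classify_change(r, a) for r, a in variants)
```

Calling `spectrum` once per group (control, IBD, CP, CRC) would give the counts behind a bar chart of this kind.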
TP53, APC & ATM harbored the most frequently detected somatic mutations in the CRC group (100, 65 & 52 mutations, respectively), whereas TP53, TCERG1 & ATM harbored the most frequently detected somatic mutations in the CP group (50, 47 & 46 mutations, respectively). In the IBD and CP groups, TP53, ATM & TCERG1 harbored the most frequently detected somatic mutations (35, 31 & 21 mutations in the IBD group and 20, 11 & 12 mutations in the CP group, respectively). The control group harbored the fewest somatic mutations of all groups (Fig. 1b).
*Somatic mutations found in tissues other than the large intestine. **Somatic mutations previously found in Egyptian CRC patients.
The frequency of the variants detected in the studied groups is illustrated in the chord diagram and the heatmap (Figs. 2a and 2b). The diagnostic accuracy of the variants detected in all studied groups was assessed with a random forest model (Fig. 3). The variants detected in the CRC group had the highest accuracy in discriminating the CRC group from the other groups, and the CRC model had the lowest error rate compared with the models of the CP, IBD and control groups (0.174 vs. 0.7, 1.0 & 1.0, respectively).
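The per-group error rates quoted above (for example 0.174 for CRC) read like per-class error rates of a random forest classifier, i.e. the fraction of a group's samples that the model assigns to another group. Assuming that interpretation (the text does not say how the rates were obtained, e.g. from out-of-bag predictions), the sketch below derives per-class error from true and predicted labels. It does not train a forest, the function name is hypothetical, and the labels in the example are made up, not study data.

```python
from collections import Counter
from typing import Dict, Sequence


def class_error_rates(true_labels: Sequence[str], predicted: Sequence[str]) -> Dict[str, float]:
    """Per-class error = misclassified samples of a class / all samples of that class."""
    if len(true_labels) != len(predicted):
        raise ValueError("label sequences must have the same length")
    totals = Counter(true_labels)
    wrong = Counter(t for t, p in zip(true_labels, predicted) if t != p)
    return {cls: wrong[cls] / n for cls, n in totals.items()}


# Hypothetical labels for six samples (not study data).
truth = ["CRC", "CRC", "CRC", "CRC", "CP", "CP"]
pred = ["CRC", "CRC", "CRC", "CP", "CRC", "CRC"]
print(class_error_rates(truth, pred))  # {'CRC': 0.25, 'CP': 1.0}
```

Under this reading, an error rate of 1.0 means no sample of that group was assigned to it by the model.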

Pathogenic somatic mutations identified in the CRC group
The somatic mutational burden per sample in the CRC group is displayed in Figure S2 a; the median was 7 somatic mutations per sample, with frameshift deletions predominating. The predominant single nucleotide variant (SNV) type was the C > T transition (Figure S2 b). The 12 most highly mutated genes are displayed in the oncoplot (Fig. 4), showing that TP53 and APC were the most frequently mutated genes in the CRC group (73% and 69%, respectively).
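The two summaries above differ in what they count: the median burden counts every mutation per sample, whereas an oncoplot percentage counts each sample at most once per gene. A minimal sketch of both, assuming mutations are available as hypothetical (sample_id, gene) pairs and that unmutated samples are listed separately so they contribute zeros to the median.

```python
from collections import Counter, defaultdict
from statistics import median
from typing import Dict, Iterable, Sequence, Tuple


def burden_and_gene_frequency(
    mutations: Iterable[Tuple[str, str]], samples: Sequence[str]
) -> Tuple[float, Dict[str, float]]:
    """Median mutations per sample, and % of samples with >=1 mutation per gene.

    `mutations` holds (sample_id, gene) pairs, one per somatic mutation;
    `samples` lists every sequenced sample, including those with no mutation.
    """
    mutations = list(mutations)
    if not samples:
        raise ValueError("no samples")
    per_sample = Counter(s for s, _ in mutations)
    burden = median(per_sample.get(s, 0) for s in samples)
    carriers = defaultdict(set)  # gene -> samples carrying at least one mutation
    for sample, gene in mutations:
        carriers[gene].add(sample)
    freq = {g: 100 * len(s) / len(samples) for g, s in carriers.items()}
    return burden, freq
```

A sample with two TP53 mutations therefore adds 2 to its burden but is counted once in the TP53 percentage.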
Schematic representations of the APC & TP53 genes at the genetic and protein levels are shown in Figs. 5 and 6. Exon 16 and exon 4 were the most frequently mutated exons in APC and TP53, respectively. The β-catenin binding and down-regulation sites were the most affected regions of the APC protein, whereas the transactivation and proline-rich domains were the most affected regions of the TP53 protein.
The loss-of-function and gain-of-function cancer driver SNVs detected in the CRC patients, compared with the SNVs catalogued in the COSMIC database, are summarized in Figs. 7 and 8, respectively.
Our data showed that the APC gene had 15 pathogenic somatic mutations, detected in the CRC group only. They were categorized into 9 frameshift deletions (Table 3).
Table notes (Tables 3, 4 and 6): Rank indicates affected exon/total exons of the specified gene. *Somatic mutations found in tissues other than the large intestine. **Somatic mutations previously found in Egyptian CRC patients.

The most commonly altered pathways in the CRC patients
Ingenuity Variant Analysis (IVA) revealed that the following pathways were commonly altered in the CRC group: the Wnt/β-catenin pathway was up-regulated in 73%, the ATM signaling pathway was down-regulated in 72%, the RTK-RAS pathway was up-regulated in 40%, and the TGF-β pathway was down-regulated in 36% of patients. The most commonly altered pathways and their involved genes are illustrated in Fig. 9a, and the proportion of cancer driver genes and the most altered pathways in the CRC patients in Fig. 9b. The proposed Egyptian model of the most commonly altered pathways in Egyptian CRC patients, built using the IPA software, is illustrated in Fig. 10.
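IVA predicts up- or down-regulation from its own knowledge base, which is not reproduced here. The sketch below shows only the simpler quantity behind a "pathway altered in X% of patients" figure: the share of patients with at least one mutated gene in a pathway gene set. The function name is hypothetical, and the gene sets are small illustrative subsets drawn from genes discussed in this paper, not IVA's pathway definitions.

```python
from typing import Dict, Iterable, Mapping


def pathway_alteration_rates(
    mutated_genes_by_sample: Mapping[str, Iterable[str]],
    pathways: Mapping[str, Iterable[str]],
) -> Dict[str, float]:
    """% of samples with at least one mutated gene in each pathway gene set."""
    n = len(mutated_genes_by_sample)
    if n == 0:
        raise ValueError("no samples")
    sample_sets = [set(g) for g in mutated_genes_by_sample.values()]
    rates = {}
    for name, members in pathways.items():
        member_set = set(members)
        hit = sum(1 for genes in sample_sets if genes & member_set)
        rates[name] = 100 * hit / n
    return rates


# Illustrative gene subsets only; real pathway definitions are much larger.
PATHWAYS = {"Wnt/β-catenin": {"APC", "AXIN2"}, "TGF-β": {"SMAD4", "ACVR2A"}}
```

Because a patient counts once per pathway however many member genes are mutated, these percentages need not add up to 100% across pathways.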

Discussion
CRC is one of the leading causes of mortality and morbidity worldwide [1]. To the best of our knowledge, our study is the first to sequence a multiple-gene panel to identify the somatic mutation pattern associated with colon cancer progression in a cohort of Egyptian patients, to help understand the colorectal carcinogenesis process.
In the current study, the somatic mutational burden was higher in the CRC patients than in the other groups. TP53, APC and ATM were the most frequently mutated genes in the CRC group. Consistent with the Cancer Genome Atlas Network, TP53 and APC were the two most frequently mutated genes in CRC patients [15], which supports the reliability of our sequencing results.
TP53 is known as the 'guardian of the genome'; its alteration is a hallmark of tumors, and its mutational status is associated with the progression and outcome of sporadic CRC [16]. The TP53 mutation prevalence rate in the Arab population is 52.5%, compared with 47.5% in a matched Western population [17].
Our study showed that TP53 was the most frequently mutated gene, detected in 73% of the CRC patients, consistent with its role in the transition from adenoma to carcinoma [18].
Eleven of the 13 mutations were pathogenic with identified loss of function and were detected only in the CRC patients. Interestingly, exon 4 was the most affected exon, and the most frequent TP53 mutations (c.121delG (58%) and c.215C > G (52%)) were located in it. Sequencing of TP53 exon 4 could therefore be used for CRC prediction in our population.
Consistent with a recent study by Kassem et al. [19] on Egyptian CRC patients, we found five TP53 somatic mutations (c.1024C > T, c.844C > T, c.743G > A, c.524G > A and c.215C > G) in our CRC group, which might suggest that these variants are specific to Egyptian patients and contribute to colon cancer progression as driver mutations in tumor development.
Of interest, the TP53 drug response variant c.215C > G [20] was detected in more than half of our CRC group. It has also recently been observed in 17% of Egyptian breast cancer patients [21]. This variant might therefore serve as an efficient predictive marker of chemotherapy response in Egyptian cancer patients.
Our pathway analysis revealed that the p53 signaling pathway was inactivated in 72% of the CRC patients, owing to either loss of wild-type TP53 or oncogenic gain of function of mutant TP53. Several new therapies specifically target p53-mutant cells, while others correct p53 mutations directly or restore the integrity of the p53 pathway [22]. The deficient p53 signaling pathway in our CRC group may therefore be an attractive therapeutic target.
Mutation of the APC gene, a multi-functional tumor suppressor, is an early event in the development of CRC and results in activation of the Wnt/β-catenin signaling pathway [23]. Mutant APC, AXIN2 and AMER1 (APC membrane recruitment protein 1) disrupt the formation of the β-catenin destruction complex, leading to stabilization and accumulation of β-catenin, which in turn overactivates Wnt/β-catenin signaling and promotes the proliferation, invasion and metastasis of cancer cells [24,25].
We found that APC was the second most frequently mutated gene, detected in 69% of the CRC patients. Fifteen of the 17 mutations were pathogenic somatic mutations with identified loss of function, detected only in the CRC group. Interestingly, exon 16 was the most affected exon, harboring 14 of the 15 mutations detected in the CRC group. Moreover, the most frequently detected APC mutation associated with disease progression (c.3754delT (65%)) was also in exon 16. Sequencing of exon 16 could therefore be used as a genetic test for CRC diagnosis. Of interest, most of our identified somatic mutations were located in the β-catenin binding and down-regulation sites.
According to the COSMIC database, we detected two APC variants (c.4588G > T (2%) and c.5288delA (2%)) that were previously reported in pancreatic cancer and hepatocellular carcinoma, respectively. Interestingly, this is the first study to report these variants in CRC and their involvement in the colorectal carcinogenesis process. However, further studies are needed to validate our findings.
We also detected three AXIN2 mutations in 27% of the CRC patients. Two of them (c.2347G > T & c.1975C > T) were found in 2% and 5% of the patients, respectively, and in the CRC group only. Imielinski et al. and George et al. previously reported these two mutations in association with small cell lung cancer [26,27]. This study is therefore the first to report their association with CRC.
In this study, according to the IPA software, the Wnt/β-catenin pathway was upregulated in 73% of the CRC patients, suggesting that it plays a major role in sporadic colorectal carcinogenesis and is therefore an attractive target for therapeutic intervention [28].
Somatic mutations of the DNA repair gene ATM occur in many tumor types, including colorectal cancer, where loss of ATM protein expression is associated with worse prognosis [29]. Targeted sequencing studies are therefore needed to help monitor prognosis in Egyptian CRC patients.
Our data revealed that ATM was mutated in 44% of the CRC patients. We detected 9 somatic mutations in the ATM gene, 6 of which were detected only in the CRC group. The identified ATM mutations were previously reported to be associated with CRC (Zehir et al., 2017). We also detected two other ATM polymorphisms (c.9007A > G & c.8138G > A), which were reported to be associated with non-Hodgkin lymphoma (NHL) [30].
Moreover, this study showed that the ATM signaling pathway was downregulated in 72% of the CRC patients, highlighting the critical role of defective DNA repair in the colorectal carcinogenesis process [31].
Novel therapies have been developed to selectively target ATM-deficient cancers; such therapies, for example platinum drugs, induce synthetic lethality in cells that lack an efficient repair mechanism [32]. ATM mutational status could therefore help in clinical decision-making for these patients and in the development of specific targeted strategies [33]. It is thus important to conduct targeted sequencing studies on Egyptian CRC patients to evaluate drug efficacy and treatment protocols.
Consistent with two studies that reported the association of SMAD4 mutations with CRC, we detected four somatic mutations (c.692delG, c.1064A > G, c.1081C > T & c.1088G > A) only in the CRC group [34]. SMAD4 acts as an intracellular mediator of TGF-β superfamily signals, and TGF-β/SMAD4 signaling maintains the DNA damage response (DDR) and DNA damage repair [35]. In this study, the TGF-β pathway was downregulated in 36% of the CRC patients. Loss or downregulation of SMAD4 has been suggested to promote malignant progression through acquired resistance to TGF-β superfamily growth inhibition [36]. Moreover, its loss switches TGF-β signaling from a tumor suppressor to a tumor promoter [37]. Our findings suggest that SMAD4 mutations play a prominent role in colorectal carcinogenesis. Isaksson-Mettavainio et al. reported that SMAD4 loss occurs in CRC at frequencies ranging from 9 to 67% [38]. SMAD4 loss has also been associated with worse clinical outcome and resistance to fluoropyrimidine-based chemotherapy [39], implicating it as a prognostic marker in CRC patients [40]. We therefore propose that Egyptian CRC patients carrying SMAD4 mutations may not benefit from fluoropyrimidine-based treatment.
Functional loss of the putative tumor suppressor gene EP300 has been previously observed in gastric, breast, pancreatic and colorectal cancers, and Gayther et al. reported that EP300 loss is highly relevant to colorectal carcinogenesis [41]. Our study found that the EP300 gene harbored 3 somatic mutations.
Of these, one frameshift mutation (c.832delA) was previously associated with CRC [42], while the missense and splice donor mutations (c.1058G > A and c.3671 + 1G > A) were previously detected in breast and gastric cancers, respectively [43,44]. Moreover, Huh et al. reported that p300 overexpression was an indicator of good prognosis in CRC patients [45]. The identified somatic mutations in the EP300 gene might therefore serve as predictors of poor prognosis in Egyptian CRC patients.
One of the most frequently mutated genes in CRC is the tumor suppressor FBXW7. Loss of FBXW7 was reported to promote epithelial-mesenchymal transition (EMT) and metastasis in CRC cells [46]. In the present study, we detected two loss-of-function somatic mutations in the FBXW7 gene (c.2001delG & c.4900A > G), which were previously reported mainly in CRC patients [47,34]. Moreover, a recent study reported an association between FBXW7 mutations and resistance to anti-epidermal growth factor receptor (EGFR) therapy, which is commonly used to manage metastatic CRC [48]. Identifying the mutational status of FBXW7 in Egyptian CRC patients may therefore provide a useful biomarker for selecting the appropriate individualized therapy [49].
Functional loss of the tumor suppressor ARID1A has been previously reported as a frequent event in colorectal carcinogenesis [50]. In agreement with Erfani et al., who reported a relatively high ARID1A mutation rate in CRC ranging from 10% to 40% [51], we found an ARID1A mutation rate of around 29% in our CRC group. Consistent with a previous study showing that ARID1A mutations were more likely to be frameshift or nonsense mutations [50], we detected 5 frameshift mutations in ARID1A, four of which were detected only in the CRC group. Our study thus highlights the prominent role of ARID1A in colorectal carcinogenesis.
Consistent with several previous studies [52][53][54], we detected many gain-of-function somatic mutations that have been previously reported to be commonly … Additionally, other reports revealed that these mutations contribute to acquired resistance to anti-EGFR therapy [55,56]. Genetic testing of these genes therefore provides useful information for the clinical management of CRC patients.
Regarding the common somatic mutations detected in all studied groups, nine of the twenty-four somatic mutations were the most frequent: c.1310delA in ACVR2A, c.640delT in APC, c.5557G > A in ATM, c.677delG in IGF2, c.1621A > C in KIT, c.1173A > G in PIK3CA, c.2071G > A in RET, and c.121delG and c.215C > G in TP53. The frequency of these mutations increased progressively from colitis to CRC, so they could be used as predictors of disease progression. Most of these somatic mutations were previously observed mainly in CRC [47], except for c.640delT in ATM, c.1621A > C in KIT, c.2071G > A in RET and c.121delG in TP53, which were observed in breast [57], soft tissue [58] and head and neck cancers [59], respectively. This is the first study to report their association with colorectal carcinogenesis in Egyptian CRC patients.
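The statement that a mutation's frequency increased from colitis to CRC can be checked in two ways: a plain monotonicity check on the group frequencies, and, as one option the text does not mention, a Cochran-Armitage test for trend in proportions. The sketch below assumes the groups are ordered control (non-specific colitis), IBD, CP, CRC with equally spaced scores 0 to 3; it is an illustration, not the analysis performed in this study, the function names are hypothetical, and any real input counts would have to come from the study tables.

```python
import math
from typing import Optional, Sequence, Tuple


def frequencies_increase(mutated: Sequence[int], totals: Sequence[int]) -> bool:
    """True if the carrier frequency never decreases across the ordered groups."""
    freqs = [r / n for r, n in zip(mutated, totals)]
    return all(a <= b for a, b in zip(freqs, freqs[1:]))


def cochran_armitage(
    mutated: Sequence[int],
    totals: Sequence[int],
    scores: Optional[Sequence[float]] = None,
) -> Tuple[float, float]:
    """Cochran-Armitage trend test; returns (Z, two-sided normal p-value)."""
    k = len(totals)
    if k < 2 or len(mutated) != k:
        raise ValueError("need matching counts for at least two groups")
    t = list(range(k)) if scores is None else list(scores)
    n_total, r_total = sum(totals), sum(mutated)
    p_bar = r_total / n_total  # pooled carrier proportion
    s_tn = sum(ti * ni for ti, ni in zip(t, totals))
    numerator = sum(ti * ri for ti, ri in zip(t, mutated)) - p_bar * s_tn
    variance = p_bar * (1 - p_bar) * (
        sum(ti * ti * ni for ti, ni in zip(t, totals)) - s_tn ** 2 / n_total
    )
    if variance <= 0:
        raise ValueError("trend undefined (no variation in carriers or scores)")
    z = numerator / math.sqrt(variance)
    return z, math.erfc(abs(z) / math.sqrt(2))
```

A monotonic rise alone does not establish a significant trend with groups of 20 to 62 patients, which is why a formal test such as this one may be worth considering.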

Conclusion
Our data suggest that the genetic makeup of Egyptian CRC patients differs from that of other populations. The identified somatic mutations are also crucial for understanding cancer predisposition and developing personalized therapies for Egyptian colorectal cancer patients.

Recommendations
We recommend: 1) following up patients' response to treatment based on our genetic analysis, 2) microsatellite instability sequencing analysis at different CRC stages, and 3) post-translational research at the protein level for our targeted genes.

Declarations
Ethics approval and consent to participate
The study protocol was approved by the Institutional Review Board (IRB number: IRB00004025) of the National Cancer Institute (NCI), Cairo University, Egypt.
The study was conducted in accordance with the ICH-GCP guidelines (approval number: 201617011.3). Written informed consent was obtained from each patient enrolled in the study.

Figure 1
Bar charts showing a) the changes of the reference alleles to the alternative ones in each studied group, b) the highly mutated genes with classification in each studied group.

Figure 3
The random forest plot of the detected non-synonymous variants showing the diagnostic error rate of each model in the studied groups. The CRC model has the lowest error rate (0.2) compared to the other groups.