The circadian clock circuitry deconvolutes colorectal cancer and lung adenocarcinoma heterogeneity in a dynamic time-related framework

Increasing evidence imputes cancer progression and resistance to therapy to intra-tumor molecular heterogeneity set off by cancer cell plasticity. Re-activation of developmental programs strictly linked to epithelial-to-mesenchymal transition and gaining of stem cells properties are crucial in this setting. Many biological processes involved in cancer onset and progression show rhythmic fluctuations driven by the circadian clock circuitry. Novel cancer patient stratification tools taking into account the temporal dimension of these biological processes are definitely needed. Lung cancer and colorectal cancer (CRC) are the leading causes of cancer death worldwide. Here, by developing an innovative computational approach we named Phase-Finder, we show that the molecular heterogeneity characterizing the two deadliest cancers, CRC and lung adenocarcinoma (LUAD), rather than a merely stochastic event is the readout of specific cancer molecular states which correlate with time-qualified patterns of gene expression. We performed time-course transcriptome analysis of CRC and LUAD cell lines and upon computing circadian genes expression-based correlation matrices we derived pseudo-time points to infer time-qualified patterns in the transcriptomic analysis of real-world data (RWD) from large cohorts of CRC and LUAD patients. Our temporal classification of CRC and LUAD cohorts was able to effectively render time-specific patterns in cancer phenotype switching determining dynamical distribution of molecular subtypes impacting patient prognosis.


INTRODUCTION
Most of the biological processes crucially implicated in carcinogenesis when altered are characterized by rhythmic fluctuations driven by the circadian clock circuitry and in turn, oncogenic processes openly thwart and disrupt the biological clock.The derangement of the molecular clockworks and the disorganization of the circadian timing system contribute to the onset and development of various types of cancer by altering several cancer related pathways [1,2].Notably, disruption of circadian rhythmicity, which impacted both the transcriptome and metabolome of lung cells, was shown to be at the basis of lung tumorigenesis by means of genetically engineered mouse models with mutation of genes of the core clock network (CCN) challenged with simulated jet lag [3].Likewise, CCN genetic disruption and environmental disturbance of circadian rhythmicity promoted colorectal carcinogenesis in a murine model susceptible to adenoma development [4].
Comprehensive genomic profiling of human cancer is unveiling the vast molecular heterogeneity of solid tumors, which nowadays appears to be the major hurdle for the identification of effective therapeutic strategies for neoplastic disease.What we learned so far is that beyond inter-samples heterogeneity each tumor sample is dominated by high intratumor heterogeneity (ITH), which supports the evolution of aggressive cancer cell subpopulations that drive metastatic spreading and drug resistance [5][6][7].Furthermore, analyses of human cancer transcriptomes unveiled the existence of molecular subtypes, which in part recapitulate such heterogeneity, in that specific tumor subtypes are characterized by peculiar genetic and epigenetic signatures and clinicalpathological characteristics [8,9].However, this rather static interpretation of tumor molecular subtypes described like distinct entities has been recently challenged by single cell -omics studies performed in colorectal cancer (CRC) that described the coexistence of different CRC molecular subtypes (CMS) in the same tumor tissue [10].In addition, molecular profiling of circulating tumor DNA in liquid biopsy recently showed a dynamic pattern of cancer molecular features in terms of fluctuation of anti-EGFR mutations, during longitudinal studies of patients treated with anti-EGFR therapy [11].Such evidence was not limited to CRC but did also apply to other tumor types [12].
Cancer cell plasticity, which is driven by mechanisms involved in epithelial-to-mesenchymal transition (EMT) and cancer cell stemness, can be at the basis of dynamic molecular heterogeneity as recently suggested by independent studies [10,13,14].Interestingly, we recently found that loss of expression of the circadian gene TIMELESS triggers an epigenetic remodeling of CRC cells under the control of ZEB1, a canonical EMT master regulator, which dictates the acquirement of metastatic and stem-like phenotypes [15].Besides, we discovered an aggressive molecular subtype in lung adenocarcinoma characterized by high-cell plasticity and immunoevasion phenotypes [16].Yet, another circadian gene, namely ARNTL2, was recently found to drive metastatic spreading in lung adenocarcinoma (LUAD) [17].Taken together, these evidences suggest that IHT, rather than a merely stochastic event, could be the readout of specific cell plasticity states which correlate with time-qualified patterns of gene expression, ultimately determining the hierarchy of cancer molecular subtypes and patient prognosis.
Here, we tackled this issue ordering in a dynamic fashion singleshot tumor sampling through pseudo-time points resulting from cancer cell lines time-series gene expression profiling to infer time-qualified patterns in the transcriptomic analysis of real-world data (RWD) from large cohorts of CRC and LUAD patients.We show, for the first time, that tumor molecular heterogeneity and subtypes do not necessarily follow a rigid topological organization, but rather is endowed with a dynamic time-related framework managed by the circadian clock circuitry.

TCGA-COAD and TCGA-LUAD pseudo-time classification analysis
CRC and LUAD samples were classified according to pseudo-time points by performing correlation analysis among CCN genes/TIMELESS/ZEB1 expression profiles in SW480 or PC9 cell lines time-series transcriptome analysis (used as time-qualified 'reference') and the CCN genes/TIMELESS/ZEB1 expression profiles in TCGA-COAD or TCGA-LUAD samples.Briefly, Z-score values of the CCN genes/TIMELESS/ZEB1 expression were calculated at each time point of the SW480 and PC9 time-series transcriptome analysis as well as in samples of the TCGA-COAD/TCGA-LUAD cohorts.Next, we calculated Pearson correlation (?) of CCN genes/TIMELESS/ZEB1 z-score values comparing TCGA-COAD/TCGA-LUAD samples with each time point of the SW480 or PC9 time series.Each TCGA sample was then assigned to a specific time point according to the highest positive ?value.Lastly, we summarized the expression values of each gene of the 'pseudo-time point' classified TCGA samples using median gene expression in order to obtain a correlation matrix with one expression value for each gene in a specific time point.

Circadian genes identification analysis
To detect transcripts exhibiting rhythmic pattern of expression with circadian (24 ± 4 h) periodicity, the MetaCycle R package [22] was applied to normalized gene expression data for cell lines and to median normalized gene expression data for TCGA-COAD-time/TCGA-LUAD-time classified patients' samples.The minimum period length was set to 20 h and the maximum period length to 28 h.Transcripts with significant rhythms (p-value < 0.05) were selected for further analyses (Supplementary Table S1).Amplitude fold change (FC) was calculated as previously explained [18]: (AmpFC = (1 + Amprel)/(1 − Amprel)), where Amprel is the relative amplitude reported in MetaCycle output table.

Phase set enrichment analysis
Circadian-related pathways were determined by Phase Set Enrichment Analysis (PSEA) [23] based on the statistically significant circadian transcripts identified by MetaCycle.Nine gene sets were either manually selected from the Molecular Signatures database (MSigDB) gene sets collections [24] or custom-made by literature search.Genes involved in proliferation were annotated by the GO term GO:0008283.Acrophases of all circadian transcripts (rounded to the full hour) were compared to a uniform background distribution and statistically significant circadianrelated gene sets were identified (q-value < 0.25).

Colon Molecular Subtype (CMS) classification
CMSclassifier R package [9] was used to classify each time point of the time-course microarray experiments and each TCGA-COAD sample.For TCGA-COAD patients, log 2 normalized gene expression data were used.

RESULTS
We conducted time-series transcriptome analysis of CRC cell lines (SW480 and SW620), derived from matched primary and metastatic (lymph node) lesions of the same patient [18].Further than the whole transcriptome we focused and investigated the expression profile of circadian genes (N = 15), representative of CCN (Supplementary Fig. S1A).Besides, we analyzed TIMELESS (aka TIM), another circadian gene, and ZEB1, an EMT master regulator which we recently showed to be modulated by TIM [15] (Supplementary Fig. S1A).As previously reported [25], all CCN genes were expressed in both CRC cell lines with peculiar oscillation patterns, but with decreased fluctuation amplitude in SW620 cells (metastatic cell line) compared to SW480 cells (primary cell line) (Supplementary Fig. S1A).Overall, we found 14% (N = 3666, genes) of the SW480 transcriptome with a circadian pattern of expression profile, while 8% (N = 2126, genes) of SW620 transcriptome showed a circadian pattern of expression (Supplementary Table S1).In SW480 cell line, we identified the highest peak of gene expression corresponding to 9 h after synchronization and 52% of transcripts peaking between 6 h and 12 h (Supplementary Fig. S1B, upper panel).In SW620 cell line, we observed a similar trend with the highest peak of gene expression corresponding to 9 h after synchronization and 53% of transcripts peaking between 6 h and 12 h, but with a lower number of circadian transcripts compared to SW480 cell line (Supplementary Fig. S1B, lower panel).Notwithstanding a lower number of circadian transcripts found in SW620 cell line, the expression amplitudes were comparable to SW480 cell line (Supplementary Fig. S1C).Intriguingly, we observed an antiphasic expression pattern of TIM and ZEB1, with TIM expression acro-phase around 15-18 h time point, and ZEB1 expression acro-phase around 3 h time point (Supplementary Fig. S1A) in both CRC cell lines.As we previously showed, loss of TIM expression induces EMT/stemness by upregulating ZEB1 while TIM upregulation correlates with positive regulation of cell proliferation/DNA repair, all relevant mechanisms for CRC progression [15].We thus investigated whether the transcriptional pattern of genes enriched in such mechanisms were also featured by circadian rhythmicity.We applied Phase Set Enrichment Analysis (PSEA) [26] and observed an enrichment of pathways involved in tumor cell plasticity and stemness (e.g., Wnt/Notch/Hippo signaling pathways, and EMT) particularly in the first 12 h of the time series (q < 0.25; Fig. 1A, B; Supplementary Table S2), when ZEB1 expression reached the maximum (i.e., 3 h; Fig. 1A, B).Contrariwise, we scored the enrichment of signaling pathways controlling cell proliferation, such as G2M checkpoint, and DNA damage repair (q < 0.25; Fig. 1A, B; Supplementary Table S2), in the second half of the 24-hour time series, when TIM reached its maximal expression (i.e., 15-18 h; Fig. 1A, B).
We then asked how these findings could actually impact the molecular pathology of CRC.To do this, we developed a methodology, namely Phase-Finder, to render the human CRC transcriptome through a more dynamic scenario by remapping 'static' human CRC transcriptome to pseudo time-series.Briefly, we used Pearson product-moment correlation (?) of z-score values obtained from CCN genes/TIM/ZEB1 expression in time-series microarray analysis of human CRC cell lines to classify RWD of CRC transcriptomes (i.e., TCGA-COAD cohort; N = 436; see methods) according to their highest expression correlation with timequalified points of CRC cell lines (Fig. 1C; see methods).We selected the SW480 cell lines to be used as reference because the Pseudo-Time metastatic SW620 cell line showed a stronger dysregulation of the molecular clockwork, as previously reported [18].
Remarkably, we observed a coherent CCN expression profile in the reclassified CRC samples when compared with CRC cell lines time-course experiment (Fig. 1D).In keeping with this, when we applied PSEA to pseudo time-series of RWD CRC transcriptomes, once more we observed enrichment of WNT signaling pathway which is involved in cancer cell plasticity and stemness acquisition [27] followed (after 12 h) by mechanisms relevant for cell proliferation (q < 0.25; Fig. 1E; Supplementary Table S2).Again, the expression of ZEB1 peaked at 3 h while TIM expression peaked at 15 h, coherently with the type of enriched mechanism (Fig. 1D).
We then asked if the intrinsic molecular subtypes (CMS 1-4) which accurately describe inter-tumor heterogeneity and ITH in CRC might follow a circadian pattern of distribution.Interestingly, the CMS4, corresponding to the mesenchymal subtype (EMT high, TGFβ high, matrix remodeling) was significantly enriched at 3 h pseudo-time point along with high-ZEB1 and low-TIM expression (Fig. 1F).Contrariwise, the CMS1 subtype (MSI-high, BRAF mutation, proliferation) was enriched at 9 h pseudo-time point (Fig. 1F), along with initial increase of TIM expression (Fig. 1E) followed by the CMS3 (i.e., metabolic subtype) which was enriched at 18 h pseudo-time point (Fig. 1F).Of note, CMS1 distribution appeared to be in anti-phase with CMS4 distribution thus recapitulating TIM/ZEB1 anti-regulation (Fig. 1F).Other genetic characteristics of CRC appeared to follow a circadian patten such as microsatellite status (Fig. 1F, lower panel) and tumor mutational burden (TMB) (Supplementary Fig. S2A), with a trend that recapitulated the acro-phases found in time-series transcriptome analysis of colon cancer cell lines (Supplementary Fig. S1C).
We then investigated whether the molecular features of LUAD, another well-characterized tumor type both in terms of molecular subtyping [8,16,28] and cell plasticity [13], follow a circadian pattern as well with the aim to further validate in silico our results.As also recently showed by our group, LUAD can be classified in two main molecular subtypes (C1 and nonC1) that are characterized by distinct epigenetic and genetic characteristics, cancer cell phenotypes, and cell plasticity states [16].First, we interrogated GEO database and found a time-series transcriptome profiling of PC9 LUAD cell line that we used as reference to remap static RWD of LUAD transcriptomes to pseudo time-series [19].Then, we analyzed the TCGA-LUAD dataset (TCGA-LUAD cohort; N = 515; see methods) and reclassified RWD LUAD transcriptomes to realize a proxy time-series similarly to what we performed for CRC samples, in order to recapitulate a 24-h time-course.Again, we observed a CCN genes expression profile of the pseudo timeseries of LUAD samples coherent with the profile of PC9 cells timeseries (Fig. 2A).Importantly, when we applied PSEA we found similar enrichment of signaling pathways as in CRC analysis, with cancer cell plasticity followed by cell proliferation/DNA damage repair mechanisms, once again correlating with TIM/ZEB1 axis regulation (q < 0.25; Fig. 2B; Supplementary Table S2).Finally, we compared the distribution of C1 (high-cell plasticity status, more aggressive) and nonC1 (low-cell plasticity state, less aggressive) molecular subtypes and observed high correlation with ARNTL2 expression profile (? = 0.84; Fig. 2C).ARNTL2 is a paralog of ARNTL and forms a heterodimer with CLOCK which then binds E-box elements and induces target genes transcription.Interestingly, ARNTL2 was recently described to drive LUAD metastatic progression by favoring the acquisition of stem cell traits in LUAD cells [17].In keeping with this, prognosis of LUAD patients segregated between 3-12 h pseudo-time interval, where the expression of ARNTL2 is higher (Fig. 2D), significantly associated to worst outcome (Fig. 2E).Surprisingly, when we used the RWD CRC transcriptomes, the prognosis of patients segregated between 3-12 h pseudo-time interval was significantly associated to better outcome (Fig. 2F) regardless the expression pattern of ARNTL2 gene (Fig. 2D).As a matter of fact, we previously showed that lowexpression of TIMELESS unleashes ZEB1 expression, which drives the acquirement of metastatic phenotypes in CRC [15].
Taken together, these findings reveal how the functioning of the circadian clock circuitry impinges on cancer-relevant pathways, which correlate with tumor progression, suggesting also potential therapeutic options targeting CCN genes expression oscillation.

DISCUSSION
Recent evidences challenged our actual knowledge that tumor molecular heterogeneity is a stochastic event.In addition, recent studies suggested that ITH depends, at least in part, on the activation of mechanisms inducing cell plasticity states that influence the molecular architecture of cancer tissue and subtypes composition.Based on previous findings which link the circadian clock circuitry with the modulation of mechanisms involved in cancer progression and tumor cell plasticity [15,29,30], here we showed that molecular heterogeneity of the two most deadly cancer types (i.e., CRC and LUAD) can be re-shaped as a result of time-ordering in accordance with inherent oscillations of biological processes driven by the molecular clockwork.
Fig. 1 Phase Set Enrichment Analysis (PSEA) and Phase-Finder analysis of CRC transcriptomes.A, B TIMELESS and ZEB1 expression profile in time-course analysis of SW480 and SW620 cell lines, together with results of the Phase Set Enrichment Analysis (PSEA).Each circle represents the z-scores values of TIMLESS and ZEB1 expression and are as per the legend.The inner the circle the lower the z-score.Color codes are as per the legend.The type size of enriched pathways by PSEA is proportional to the "vector-average magnitude" which is an indicator of the cohesiveness of the genes of the pathway at the time band i.e. the greater the uniformity of the times (phase) of the genes of the pathway, the greater the value of the vector-average magnitude.C Schematic representation of the approach we developed to remap static human cancer transcriptome (TCGA) to pseudo time-series with time-qualified points of cancer cell lines used as reference.The CCN expression pattern of TCGA samples and cell lines is also shown as an example (i.e., using an arbitrary scale) by means of heatmaps in the various pseudo-time points.D Correlation heatmap of expression profile of core-clock genes (CCN) in SW480 cells time-course and TCGA-COAD samples.Color codes are as per the legend and represent the Pearson product-moment correlation (?) of z-score values.E TIMELESS and ZEB1 expression profile in the pseudo-time analysis of TCGA-COAD samples, together with results of Phase Set Enrichment Analysis (PSEA).Each circle represents the z-scores values of TIMELESS and ZEB1 expression and are as per the legend.The inner the circle the lower the z-score.Color codes are as per the legend.The type size of enriched pathways by PSEA is proportional to the "vector-average magnitude".F Upper panel, distribution of Colon Molecular Subtypes (CMS) in the pseudo-time analysis of TCGA-COAD samples.Lower panel, distribution of Microsatellite Status in the pseudo-time analysis of TCGA-COAD samples.Und.undetermined.P-values were computed by post hoc test following chi-square; **P ≤ 0.01; ***P ≤ 0.001.
A current limit in the comprehension of molecular oncology comes from a rather static perspective of cancer molecular profiles due to individual molecular snapshots of tumor samples.Indeed, such molecular profiles do not consider the inherent timecentered variability of almost all biological processes.To circumvent such limitations, we developed a computational approach, namely Phase-Finder, which integrates the 'dynamic' dimension extrapolated from time-course analysis of cancer cell lines transcriptomes with 'static' transcriptomes of hundreds of human tumor samples.By Phase Set Enrichment Analysis (PSEA) we found distinct and suggestive time-phasing of cancer-relevant pathways, hinting time-related fluctuation of activity of genetic and epigenetic programs directing phenotypic changes and driving cell plasticity in cancer.Our findings suggested a certain temporal plasticity of cancer molecular subtypes and druggable targets in RWD of both CRC and LUAD, which definitely impact patient prognosis and could be functional to the adaptation to anti-cancer treatments.
A limit in our approach could be the low number of cell lines used for each tumor type to remap 'static' human cancer transcriptome to pseudo time-series.As a matter of fact, no additional circadian time-course series of CRC and LUAD cell lines were found in the current scientific literature.However, when we performed CCN genes expression profile analysis of the Cancer Cell Line Encyclopedia (CCLE; see methods) including CRC (N = 56) and LUAD cells lines (N = 72), we found that remapped CCN gene expression patterns of CCLE to pseudo-time series highly mirrored the CCN genes pattern found in SW480 and PC9 cell lines (Supplementary Fig. S4).Therefore, the transcriptional hierarchy of CCN genes, as observed in SW480 and PC9 cells, is conserved through these two large sets of CRC and LUAD cell lines, thus supporting reliability of our data.
The main purpose of personalized medicine as well as conventional and chrono-modulated chemotherapy in oncology practice is to include time-oriented in addition to task-oriented procedures in clinical guidelines.The International Agency for Research on Cancer (IARC) classified night shift work-related circadian disruption as a plausible (group 2 A) carcinogen for humans in 2007 [31].The characterization of time-related genomic/genetic features of the neoplastic tissue collected in a single biopsy attainable with the modeling methodology proposed in our study, may represent an effective approach to characterize the temporal structure of the tumor biology which is mandatory for the application of chronotherapy and chronomedicine in clinical settings.Importantly, on the basis of this timequalified characterization, more reliable cancer patient stratification could be implemented as well as tailored cancer therapy could be scheduled and chronomodulated by means of periodic analysis of rhythmic features, including circulating tumor cells and DNA (ctDNA) as a minimally invasive biomarker for tumor molecular profiling [32].
In conclusion, we believe that our findings will pave the way toward reliable methodologies to assess the functioning of the circadian clock circuitry for a better ITH interpretation and identification of effective therapeutic strategies.
Fig. 2 Phase Set Enrichment Analysis (PSEA) and Phase-Finder analysis of LUAD transcriptomes.A Correlation heatmap of expression profile of core-clock genes (CCN) in time-course analysis of PC9 cells and pseudo-time analysis of TCGA-LUAD samples.Color codes are as per the legend and represent the Pearson product-moment correlation (?) of z-score values.B TIMELESS and ZEB1 expression profile in the pseudo-time analysis of TCGA-LUAD samples, together with results of Phase Set Enrichment Analysis (PSEA).Each circle represents the z-score value of TIMLESS and ZEB1 expression and are as per the legend.The inner the circle the lower the z-score.Color codes are as per the legend.The type size of enriched pathways by PSEA is proportional to the "vector-average magnitude".C Distribution of Lung Cancer Molecular Subtypes (C1 and nonC1) in the pseudo-time analysis of TCGA-LUAD samples, together with the expression profile of ARNTL2 gene (secondary Y-axes; FC fold change).P-values were computed by post hoc test following chi-square; *P ≤ 0.05, **P ≤ 0.01, and ****P ≤ 0.0001.Lower panel, bubble plot showing the Pearson correlation scores (?) of CCN genes with C1 subtype distribution; color codes and size of bubbles are as per the legend.D Bubble plot analysis of ARNTL2, TIMELESS and ZEB1 expression profile in the pseudo-time series in the TCGA-LUAD dataset (upper panel) or in the TCGA-COAD (CRC) dataset (lower panel).Color codes are as per the legend.Size of bubbles are directly proportional to Log 2 Ratio of expression as per the legend.Correlation plots of the indicated genes are also shown (underneath the two panels).Dashed line indicates the two pseudo-time interval periods used to stratify TCGA samples as in E. Survival analysis of TCGA-LUAD patients (E) and TCGA-COAD (CRC) patients (F) stratified by the indicated pseudo-time interval periods.Color codes are as per the legend.P-values were computed by log-rank test.