Machine Learning and In-Vivo Expression Analyses Reveal Circadian Clock Features Predictive of Anxiety in Humans


 Mood disorders, including anxiety, are associated with disruptions in circadian rhythms and are linked to polymorphisms in circadian clock genes. Molecular mechanisms underlying these connections may be direct—via transcriptional activity of clock genes on downstream mood pathways in the brain, or indirect—via clock gene influences on the phase and amplitude of circadian rhythms which, in turn, modulate physiological processes influencing mood. Employing machine learning combined with statistical approaches, we explored clock genotype combinations that predict risk for anxiety symptoms in a deeply phenotyped population. We identified multiple novel circadian genotypes predictive of anxiety, with the PER3B-AG/CRY1-CG genotype being the strongest predictor of anxiety risk in males. Molecular chronotyping, using clock gene expression oscillations, revealed that advanced circadian phase and robust circadian amplitudes are associated with high levels of anxiety symptoms. Further analyses revealed that individuals with advanced phases and pronounced circadian misalignment were at higher risk for severe anxiety symptoms. Our results support both direct and indirect influences of clock gene variants on mood: while sex-specific clock genotype combinations predictive of anxiety symptoms suggest direct effects on mood pathways, the mediation of PER3B effects on anxiety via diurnal preference measures and the association of circadian phase with anxiety symptoms provide evidence for indirect effects of the molecular clockwork on mood. Unraveling the complex molecular mechanisms underlying the links between circadian physiology and mood is essential to identifying the core clock genes to target in future functional studies, thereby advancing the development of non-invasive treatments for anxiety-related disorders.


Introduction
Mood disorders, including depression and anxiety, are becoming more prevalent globally, affecting nearly one-fth of the adult population (Steel et al., 2014). These disorders negatively impact productivity, social relationships, and overall quality of life in individuals. The search for genetic and environmental factors contributing to the epidemic of mental health has uncovered numerous links between circadian rhythm disruptions and mood disorders including major depressive disorder (MDD), schizophrenia, bipolar disorder (BD), and anxiety (Roybal et Ketchesin et al., 2020). Clinical studies have also demonstrated that circadian rhythms and circadian clock genes can modulate mood and psychiatric disorders but few of these studies have explicitly focused on anxiety (Karatsoreos, 2014;Landgraf et al., 2014).
Circadian rhythms regulate a sleep-wake cycle that is reset every 24 hours based on exposure to natural or arti cial light-dark cycles. The molecular clock driving these rhythms is created by feedback loops in core clock genes and their associated transcription factors that control physiological cycles in the body via the regulation of over a third of all transcribed genes (Evans et al., 2012a;Walker et al., 2020). The core feedback loop includes the transcription factor CLOCK, which positively regulates the transcription of genes in the Period (PER1, PER2, and PER3) and Cryptochrome (CRY1, and CRY2) complexes. The CLOCK protein forms a heterodimer with BMAL1, which activates additional core clock genes. PER/CRY heterodimers, in turn, inhibit the transcriptional activity of the BMAL1-CLOCK complex. This cycle of transcription activation and repression is vital for the 24-hour circadian cycle. Mutations in these core clock genes may affect mood via direct transcriptional activity of downstream physiological pathways or indirectly through disruptions in the phase and strength of circadian rhythms (Albrecht, 2017). Individual core clock genes may, in fact, be involved in both direct and indirect mechanisms modulating the relationship between mood and circadian rhythm.
Evidence supporting indirect impacts of the core circadian oscillator on mood pathways is derived from studies examining the association of diurnal preference, differences in timing of activity levels, or chronotype, differences in sleep-wake timing, with mood (Nguyen et  . These studies provide strong evidence that diurnal variation in circadian rhythms plays a role in modulating the physiology of mood disorders. However, attempts to determine the molecular mechanisms underlying these indirect circadian in uences on mood and anxiety, in particular, are in a nascent stage. Core circadian genes also function as transcriptional regulators and neurotransmitters and can directly in uence well-known mood pathways, including serotonin, dopamine, and glucocorticoid pathways (McClung, 2013). Recent studies demonstrate that components of the circadian clock can directly modulate mood disorders. In mice, knock down of the NPAS2, period, or cryptochrome genes results in altered anxiety levels (Evans et al., 2012;Savalli et al., 2015;Ozburn et al., 2017a). In humans, genomeand phenotype-wide association studies (GWAS and PheWAS) have not yielded strong evidence of associations between core clock genes and mood until Ho and colleagues (2018) identi ed a clockrelated gene associated with seasonal affective disorder. However, population-level candidate gene studies have identi ed multiple links between clock genes and depressive disorders, including anxiety Mathematical models have also predicted links between circadian clock disruptions and anxiety (Liberman et al., 2018). Most interestingly, a recent clinical study demonstrated that melatonin treatment used to correct circadian misalignment in anxious patients helped to mitigate anxiety (Satyanarayanan et al., 2020). The lack of clarity in large GWAS and PheWAS studies with complex phenotypes like mood disorders suggests that directed candidate gene studies using deep phenotyping are needed to identify the in uence of speci c clock genes and/or synergistic clock gene interactions on mood pathways.
In the current study, we performed a deep phenotypic analysis of anxiety (State-Trait Anxiety Inventory (STAI)) (Spielberger et al., 1983), diurnal preference (Morningness-Eveningness Questionnaire (MEQ) (Horne and Ostberg, 1976), and molecular chronotype (via in-vivo gene expression analyses based on clock gene expression phase and amplitude). We explore the synergistic effects of multiple genotypes on phenotypes using an array of machine learning algorithms, including feature selection and association rule learning, as well as statistical approaches. Unlike PheWAS studies, machine learning techniques are not constrained by and can be robust to the smaller sample sizes typical of deeply phenotyped datasets. In addition, feature selection can be used to incorporate both clinical and genotypic features to reduce the data's dimensionality and identify the most predictive disease risk factors. Here, we employ deepphenotyping and machine learning to identify direct and indirect mechanisms of circadian in uences on anxiety.

Experimental Data Collection
Study participants were recruited from Colgate University and the surrounding community in Hamilton, NY, USA (n=982; males=318, females=664, ages 17-79; median=19). Participants were predominantly Caucasians of European descent. All participants gave written informed consent, and all procedures followed the principles of the Declaration of Helsinki. The Institutional Review Board at Colgate University authorized all consent forms and procedures (#FR-F13-07, #ER-F14-12, #F15-13, and #ER-F16-19).

Self-report Surveys
Participants completed computer-based surveys, which included the trait version of the Spielberger's State-Trait Anxiety Scale (STAI), Beck Depression Inventory (BDI-II), and the short form of the Patient-Reported Outcomes Measurement Information System (PROMIS™) Sleep Disturbance. The STAI was used to indicate the anxiety scores of individuals ranging from 20-80, with scores of 20-37 indicating "no or low anxiety," 38-44 indicating "moderate anxiety," and 45-80 indicating "high anxiety." The Horne-Östberg Morningness-Eveningness Questionnaire (MEQ) survey was administered to measure diurnal preference.
Genotyping DNA was extracted from ten to twenty hair follicles from each participant. The hair samples were digested with Proteinase K at 56°C for 24-hours and puri ed using the Qiagen DNAeasy Micro Kit. Genotyping for single nucleotide polymorphisms (SNPs) was performed using a TaqMan SNP Genotyping assay (Applied Biosystems, Foster City, CA) on an ABI 3700HT real-time qPCR instrument. Participants were identi ed as homozygous or heterozygous for the major and minor alleles (Suppl. Table  1).
A fragment length analysis of the PER3 VNTR length polymorphism repeat region was conducted using PCR uorescent primers on GeneScan software with an ABI 3100 sequencer. The forward primer uorescently labeled with 6-FAM was used with the following PCR primers: forward, 5'-CAAAATTTTA analysis. RNA was extracted and puri ed from hair follicles using the RNeasy Micro puri cation kit according to the protocol provided by Qiagen. The puri ed RNA was converted to cDNA using rt-PCR (TaqMan Gold rt-PCR, ABI). Nanodrop was used to quantify the cDNA. Expression levels of clock genes PER3 and NR1D2 were measured using quantitative PCR on an ABI 7900HT instrument (Applied Biosystems). GUSB and 18S were used as control genes, and each analysis was performed in a replicate of three. Relative mRNA levels were determined using the standard curve method as described in the ABI User Bulletin #2, and then converted into z-scores per individual. A standard curve was created based on the average data points from all the subjects (Ingram et al., 2016). The trained curve was tted to the four data points of each subject using the parameter estimation method, Stochastic Ranking Evolutionary Strategy (SRES) (Liberman et al., 2017). For phase shift estimation, the training curve was obtained from known intermediate types (n=20; individuals not included in this study). The phase difference between the curve obtained from each subject's four RNA data points and the training curve gave the phase shift.
Amplitudes and phases were then converted into z-scores for statistical analysis.

Feature Generation and Selection Genotypic and Clinical Features
We used seven genotypic features: CLOCK3111 (rs1801260), CRY1 (rs228716), CRY2 (rs10838524), PER2 (rs10838524), PER3A (rs228697), PER3B (rs17031614), and PER3 VNTR (rs57875989), and four behavioral/clinical features: diurnal preference scores, age (≤22 or >22), gender, and socioeconomic status (poor, lower-middle-class, upper-middle-class, a uent). For genotypic features, individuals can be homozygous dominant, homozygous recessive, or heterozygous. To reduce multicollinearity, we performed one-hot encoding for non-binary features, removing the most frequent variant (as the baseline condition). We created 2-way combinations using the seven genotypic features and their respective variants, giving us 7C2 * 9 more features. We removed the most frequent class for each group of 9 combinations and treated it as a reference category. After pre-processing, the data set for analysis contained 174 total features.

Feature Selection
We used four feature selection methods (each method ten times with ten-fold cross-validation) to nd epistatic combinations predictive of mood disorders. InfoGain (IG) and ReliefF ( . We evaluated the performance of our classi ers using accuracy scores and the area under the receiver operating characteristic (AUROC) curves. Classi ers employ a variety of hyperparameters that must be tailored for each dataset. As a result, we used preliminary testing to determine an appropriate range of hyperparameters for each classi er, followed by grid searching to determine the optimal combination of hyperparameters for maximizing accuracy.

Cross-validation
We employed strati ed tenfold cross-validation to determine each model's generalizability. We divided the data set into ten folds(subsets) for each combination of feature selection method and classi er, maintaining a consistent distribution of our outcome class for each fold. We performed the k-nearest neighbors' imputation to ll in missing values for each fold (Fukunaga, 1981). We repeated the crossvalidation procedure ten times to ensure robust results, each time using a different random number generator seed.

SMOTE
When a dataset is unbalanced, the feature selection and classi cation models frequently overestimate the likelihood of the majority outcome. As a result, the model may be inaccurate. We employed Synthetic Minority Oversampling (SMOTE) technique to increase model accuracy by balancing our unbalanced dataset. SMOTE accomplishes this by identifying the k-nearest neighbors (we chose k = 5 based on empirical evidence) and randomly generating new data along the line connecting two neighbors of the same class (Chawla et al., 2002). To ensure our dataset was balanced, we used SMOTE to oversample the number of cases for the less frequent outcome. We performed our analysis with and without SMOTE to determine whether it improved them and then reported the balanced dataset results.

Statistical Analyses
Logistic Regression All statistical analyses were performed using R (R Core Team, 2020). We performed logistic regression analysis on each feature individually, keeping one-hot encoded features grouped together for the regression. After this univariate analysis, we performed multivariate logistic regression on all the features. Due to the high dimensionality of the dataset, we observed over tting of the initial model. As a result, we performed multivariate logistic regressions using Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC), employing a sequential replacement method to identify subsets of features with low multicollinearity and strong association with the target variable (Bozdogan, 1987). We conducted these analyses using the RcmdrMisc library in R (RcmdrMisc). Along with AIC and BIC, we used the results of machine learning based feature selection algorithms to identify robust features for subsequent multivariate analysis as described above. Due to unplanned pairwise comparisons between features, pvalues from the regression analysis were adjusted using the Benjamini-Hochberg method (Benjamini & Hochberg, 1995).

Mediation Analyses
We used mediation analysis to determine whether MEQ scores were statistically signi cant mediators between genotypic and clinical factors and mood disorders. The mediation's signi cance was found using the R package mediate, which employs a nonparametric bootstrapping method to compute a con dence interval for the mediatory effects (Imai et al., 2010).

Fisher's Exact Tests
Fisher's exact tests were used to identify features with a strong association with human anxiety individually. We created heatmaps illustrating patterns of association between molecular chronotype and human anxiety using different phase, amplitude, and STAI cutoffs.

Analysis of Variance
To identify sex-speci c differences in the average STAI scores for different two-way gene combinations, we used the car library in R to conduct a Type-3 Sum of Squares two-way ANOVA (Fox & Weisberg, 2019). Tukey's follow-up tests on signi cant factors were performed using the emmeans library. The normality of data was assessed by visual inspection and Shapiro-Wilk's test in R.

Association Rule Learning Analyses
We performed association analysis using the arules package (Hahsler et al., 2011) The probability that an association occurs in the dataset is called its support. The lift of a rule is de ned as the ratio of the observed support to that expected if the left-hand side and right-hand side of the relationship were independent. Since we had eleven variables, it was computationally intractable to nd rules of length up to eleven with suitable support and lift. We limited ourselves to rules of size at most six, with at least 90% con dence. For each sex, we found rules that code for both categories of the target variable and then sorted them by their respective lift values. We visualized these relationships in sex-speci c network plots, using the igraph library (Csardi & Nepusz, 2006).

Gene Networks Using Mutual Information
We employed the Algorithm for the Reconstruction of Gene Regulatory Networks (ARACNE) to nd the direct and indirect interactions between genes, clinical features, MEQ, and mood disorders (Margolin et al., 2006). ARACNE constructs a network of relationships between nodes using a distance metric such as mutual information and correlation; for each triplet of edges, drops the edge with the lowest value. We used the minet package in R to create the network and then plotted it using Rgraphviz (Meyer et al., 2008). We used bootstrapping to determine the frequency (i.e., con dence level) with which each link in the network appears.

Results
Synergistic, two-way genotype combinations are predictive of human anxiety RF, SVM, and XGB classi ers predicted anxiety symptoms with an accuracy of 61%-76% using all or a subset of genotypic and clinical factors chosen by feature selection methods ( Figure 1). The XGB method achieves the highest accuracy (76%) when all features are used. However, if sixty features selected using the JMI method are used, a similar accuracy level (75%) can also be obtained using XGB and RF classi ers. These accuracy levels are 25%-26% more accurate than random chance in our balanced dataset, which has a baseline accuracy of 50%.
Multivariate logistic regression analysis of the top features revealed that two-genotype combinations predicted a more robust risk of anxiety symptoms relative to single gene variants (Table 1). In the overall dataset, the combination of PER3B-AG and CRY1-CG was most strongly associated with the risk of having anxiety symptoms (  The combination of CLOCK3111-TC and CRY2-AG also signi cantly increased the odds of anxiety symptoms ( Additionally, the combination of PER3B-AG and CRY2-AA was protective against anxiety ( The association rule learning results identi ed additional multi-way genotype and clinical feature combinations that were strong predictors of human anxiety (Suppl. Table 2); risk genotypes differed for males and females ( Figure 3). For females, the most frequently appearing SNP variants in the top predictors of anxiety were CLOCK3111-TC and PER3B-GG, with age and MEQ as important clinical factors (Figure 3a). For males, the most frequently occurring variants were PER2-GG, PER3B-GG, CRY2-GG, and CLOCK3111-TC, with age, but not MEQ, signi cantly predicting the risk of anxiety (Figure 3b).
Genotypic associations with anxiety symptoms can be direct or mediated by diurnal preference Our network analysis using ARACNE on the interactions between genotypes, clinical features, and anxiety symptoms show that only the PER3B SNP variant (rs17031614) shares mutual information with anxiety symptoms via diurnal preference scores, which acts as a mediator (Suppl. Figure 1). All other variants (PER2, PER3 VNTR, PER3A, CLOCK3111, CRY1, and CRY2 SNPs) were directly associated with anxiety symptoms following bootstrap analysis. The most robust links to anxiety symptoms are diurnal preference scores and depressive symptoms; the latter is likely due to the well-known co-morbidity of anxiety and depressive symptoms.
To further investigate the effects of MEQ on anxiety symptoms, we performed a mediation analysis of the top features. We found that the association between being a college-aged student and anxiety scores was mediated by diurnal preference scores. We also found that PER2-GG was strongly associated with an increase in anxiety scores (coe cient = 2.17, t 418 = 2.09, p = 0.038) and the combination of PER3B-AG and CRY2-AG was weakly associated with anxiety scores (coe cient = 4.44, t 418 = 1.85, p = 0.065); interestingly, these effects were completely mediated by diurnal preference score.
Circadian phase, amplitude, and misalignment are associated with anxiety Using Fisher's exact tests to test for signi cant associations across circadian phase and anxiety symptom scores, we found that advanced circadian phase values measured using a PER3 or NR1D2 gene markers (>2.3 and > 1.7 standard deviations above the mean, respectively) were strongly associated with high anxiety scores. Similarly, high circadian amplitudes (>1.7 and >2.2 standard deviations above the mean, respectively) were also strongly associated with high anxiety scores. The detailed patterns of association are shown via heatmaps (Figure 4).
Using in-vivo gene expression data to measure the degree of mismatch in phase and amplitude with selfreported chronotype, we estimated the risk of anxiety symptoms with circadian misalignment. We found that individuals with advanced circadian phase and evening preferences (low MEQ scores) are six times more likely to report anxiety (PER3: OR = 5.88 (1.22 -28.40), p = 0.027; Nr1d2: OR = 6.50 (0.77 -58.48), p = 0.085).

Discussion
A growing body of evidence indicates that alterations in circadian rhythms and clock gene mutations in uence mood disorders, including anxiety. The search for molecular mechanisms underlying these connections has focused on GWAS and PheWAS studies. Still, these efforts are limited by the di culty of predicting complex disorders with weak phenotyping and the inability to detect synergistic effects among genotypes. Further, identifying genotypes signi cantly associated with anxiety provides insu cient information to discern whether gene variants in uence symptoms directly or indirectly, impeding the utility of this information for therapeutic interventions. This study uses a novel approach based on machine learning and statistical analysis to explore associations between circadian genes and anxiety symptoms in a deeply-phenotyped population sample. We report three ndings: 1) synergistic interactions between variants in PER3B, CLOCK3111, and cryptochrome genes, CRY1 and CRY2, show robust associations with anxiety symptoms, 2) clock variants predictive of anxiety symptoms tend to have sex-speci c effects, and 3) molecular chronotype (circadian phase) and circadian misalignmentparticularly individuals with advanced phases and evening-type sleep-wake cycles-are strong predictors of anxiety symptoms. Our results suggest that circadian clock gene variants have both direct (sexspeci c) and indirect (clock-mediated) effects on anxiety symptoms.

Circadian genotypes most predictive of anxiety
Our results con rm previous associations of anxiety with gender and chronotype and reveal novel associations of anxiety symptoms with circadian genotypes. We found that the combination of PER3B-AG and CRY1-CG was most strongly associated with the risk of having anxiety symptoms. The PER3 gene encodes the period circadian protein homolog 3 protein in humans and is a paralog to the PER1 and PER2 genes. PER3 is not essential for maintaining the circadian rhythm but plays a vital role in sleepwake timing and sleep homeostasis. This gene is upregulated by CLOCK/BMAL heterodimers but then is By what mechanisms do variants in PER3/CRY genotype combinations in uence mood? PER3 is upregulated by CLOCK/BMAL heterodimers but then represses this upregulation in a feedback loop using PER/CRY heterodimers to interact with CLOCK/BMAL. Variants in PER3 and CRY genes may affect the dimerization of the proteins and the speed or success of the binding interaction with the CLOCK/BMAL complex, thus altering the sleep-wake cycle and in uencing the timing of molecular pathways regulating mood. Alternatively, changes in PER3/CRY binding could affect the regulation of downstream pathways that directly in uence mood pathways. This highlights a potentially critical role of CRY protein binding in modulating mood pathways.
Combinations of CLOCK3111/CRY2 variants may act similarly to the PER3B/CRY1 complexes, inhibiting the phosphorylation of the BMAL1/CLOCK dimer. In humans, CLOCK variants have been identi ed as important for both seasonal affective disorder (SAD) and bipolar disorder (Benedetti et al., 2003; Kim et al., 2015). The CLOCK3111 SNP (rs1801260) is also linked to diurnal preference; individuals homozygous for the C allele have a stronger evening preference (Katzenberg et al., 1998;Mishima et al., 2005). In nonhuman models, Roybal et al. (2007) show that ClockΔ19 mice exhibit hyperactivity, reduced anxiety-and depressive-related behavior, exhibiting a strong manic-like phenotype during the daytime. Transfections of mice with the CLOCK3111 variants revealed that the C-allele results in increased mRNA levels and stability (Ozburn et al., 2016). In humans, Lavebratt et al. (2010b) observed a signi cant association between the CRY2 haplotypes with winter depression in Northern European populations. Furthermore, depressed bipolar patients have reduced levels of CRY2. Results from these previous studies indicate that both CLOCK and CRY2 are associated with diurnal preference and can be a risk factor for depression and/or anxiety, suggesting that variants in these genes on mood may be direct or indirect effects. Our results suggest that the CLOCK3111 variant may directly affect anxiety symptoms, particularly in males, when found in combinations with the CRY2-AG variant.

Epistatic Effects
Complex traits with a polygenic basis, such as anxiety, are more likely to develop if the additive effects exceed a critical threshold that disrupts circadian rhythms (Albrecht, 2017;Partonen, 2012). In the current study, individuals who carry multiple circadian SNPs have an increased risk of anxiety symptoms relative to individuals who carry a single gene variant. Using feature selection methods, we can identify many two-genotype combinations that provide more signi cant effects on anxiety symptoms relative to the impacts of single SNP variants. Our association analysis allows for a more expansive exploration of multiple genes combinations. In these analyses, the top twelve rules with the strongest effects on anxiety symptoms included clinical features and two, three, and four-way gene combinations as strong predictors of human anxiety; only one factor represented a single gene (the PER3 VNTR_5,5 genotype; Suppl. Table  I). The most frequently appearing SNP genotype in the association analysis was CLOCK3111-TC. Two of the rules with the highest lift values show the combination of CLOCK3111-TC, CRY2-AG, and PER2-AG as a signi cant predictor of anxiety. The combination of the rst two SNP variants was identi ed as statistically signi cant by feature selection and logistic regression as well, but the additional additive effects of PER2-AG were identi ed by association rule learning. These ndings imply that synergistic effects in the molecular clock are critical for modulating physiological pathways associated with anxiety.
Potential Direct Effects on Anxiety: Sex-speci c circadian genotype effects One clue to which pathways are in uenced by disruptions in the function of PER3B/CRY and CLOCK3111/CRY complexes is the fact that the effects of these variants on anxiety symptoms may be sex-speci c (Shi et al., 2016;Viena et al., 2016). Previously, Shi et al. (2016) identi ed signi cant sexdependent associations between major depressive disorder (MDD) and common variants of the circadian clock genes CLOCK, PER3, and NPAS2. In that study, the association of CLOCK with MDD is also stronger in males, but the association of PER3 and NPAS2 with MDD is more signi cant in females. They propose that these SNPs have a functional effect via output transcriptional pathways that are mediated sexdependently by the circadian system rather than the core clock oscillator. (Shi et al., 2016). One possible mechanism is glucocorticoid regulation (Shi et al., 2016), given that males and females have different cortisol levels (Halbreich et al, 1993). Sex-speci c, glucocorticoid-mediated stress responses may represent a mechanism by which clock genes affect anxiety and other mood disorders (Landgraf et al., 2014.). Other targets of CLOCK-mediated transcription involve neuropeptides and neurotransmitters, as well as their receptors, that may act to modulate mood pathways, including serotonergic pathways.
Interestingly, our network analyses also showed clear differences in key genotypic risk factors for males and females. The top rules for females included strong predictions for anxiety with CLOCK3111-TC, PER3B-GG, age, and MEQ, as well as weaker associations for other genotypes. Average lifts were higher for all of the top rules for males, but co-occurrence was more widely distributed across the genotypes, suggesting stronger associations between multiple gene variants and anxiety. In males, age, PER2-GG, PER3B-GG, and CRY2-GG co-occurred most frequently, but combinations with CLOCK3111TC had the highest average lift. Overall, our results suggest that the sex-speci c anxiety risk conferred by the genotypic combinations involving PER3B, CLOCK3111, and CRY2 genes may be further evidence of direct effects of clock gene binding complexes on downstream mood-related physiological pathways.
Potential Indirect Effects on Anxiety: Mediation by diurnal preference and circadian misalignment Following bootstrap analysis of the ARACNE gene network, PER3B was the only polymorphism in the current study with effects on anxiety that were signi cantly mediated by diurnal preference. In our targeted mediation analyses, associations of anxiety symptoms in PER2 homozygotes for the G-allele were also signi cantly mediated by diurnal preference. This suggests that the Period family of genes may mediate mood via indirect pathways associated with circadian phenotypes and/or circadian misalignment.
Our molecular chronotyping results provide the strongest support for the indirect role of circadian clocks on mood by linking advanced phase, robust rhythms, and circadian misalignment to high levels of anxiety symptoms. Previous studies have shown that advanced PER3 phase is strongly associated with morning types, while delayed phases values are more commonly found in evening types. In the current study, we show that higher PER3 phase values were strongly associated with high human anxiety scores. Similarly, a higher PER3 amplitude was also strongly associated with high anxiety. One potential explanation for these results is that individuals with stronger, more robust circadian amplitudes tend to have more robust sleep/wake patterns. Given that a large proportion of our study population consisted of undergraduates, the altered (i.e., typically delayed) sleep/wake patterns of college life may cause signi cant circadian misalignment, leading to high anxiety. To test whether the pattern of advanced phase and robust rhythms with high anxiety indicated the effects of circadian misalignment on mood, we identi ed individuals in our dataset with advanced circadian phases but who also reported evening-type patterns of sleep-wake behavior. The risk of anxiety symptoms was six times higher for these misaligned individuals, regardless of genotype, indicating that chronic disruptions to endogenous sleep-wake patterns increase anxiety symptoms in humans. A similar result was found for depressive symptoms in the same population (Nyugen et al 2019), suggesting that shifts in circadian circuitry may in uence parallel pathways affecting both depressive and anxious symptoms.

Limitations
This study should be viewed in the context of several limitations. Our machine learning and statistical analysis examined a large number of features for the relatively small sample size of the population. In addition, our population of primarily Caucasians of European descent limits the generalizability of our ndings to diverse populations. We demonstrated that we could accurately predict anxiety using classi cation with a subset of features selected via feature selection. However, we are unable to quantify the classi cation prediction accuracy using only the top robust features since we used the entire data set (due to the small sample size) to attain these features. Future studies should assess the accuracy of our anxiety risk factor predictions using an independent population sample. Finally, our statistical analyses report signi cant differences in anxiety risk associated with particular genotypes and circadian phenotypes; further functional and behavioral studies are needed to understand how therapeutic targets or behavioral interventions might be designed to mitigate anxiety symptoms in humans.

Conclusion
Here, we report both direct and indirect, via mediation by circadian phenotypes, effects of circadian genotypic and clinical features on anxiety symptoms. Using an approach that employs machine learning and statistical analyses to examine associations of circadian clock genes with human anxiety, our results support three conclusions. First, variants in select circadian clock genes have synergistic associations with anxiety symptoms. Second, sex-linked associations between clock gene variants and anxiety symptoms provide evidence of multiple direct pathways for clock genes to in uence mood. Finally, molecular chronotype and circadian misalignment are strong predictors of anxiety symptoms, indicating that indirect effects of clock gene variants, particularly in the PER3 gene, may also play a role in modulating anxiety symptoms. Disentangling the complex in uences of clock genes on anxiety may reveal both clinical targets and non-invasive therapies that can help mitigate the causes and symptoms of anxiety. Figure 1 Heat Map of Prediction Accuracy for Feature Selection and Classi er Methods. Our analyses yielded up to 26% higher prediction accuracy than baseline (50%) on a balanced data set.

Figure 2
Genotype combinations predictive of anxiety symptoms in males. A&B) Average anxiety scores for males with a combination of PER3B-AG and CRY1-CG are higher than average anxiety scores of individuals with other genotype combinations (Gender: F 1, 479 = 0.661, p = 0.417, Genotype: F 1, 479 = 7.174, p = 0.008; Gender x Genotype: F 1, 479 = 4.141, p = 0.042). Anxiety scores are measured using the self-reported State-Trait Anxiety Index (± 1 SE). Tukey's posthoc tests showed that males with AC-CG combination were signi cantly different from males with other genotypes and from females. B&C) Average anxiety scores of males with a combination of CLOCK3111-TC and CRY2-AG also tend to be higher than average anxiety scores of individuals with other genotype combinations, but this is not signi cant at p<0.05. (Gender: F 1, 517 = 1.773; p=0.184 , Genotype:F 1, 517 = 3.417; p=0.065; Gender x Genotype: F 1, 517 = 1.912, p = 0.167).

Figure 3
Association rules networks for anxiety symptoms. A) In females, diurnal preference (MEQ), age, and PER3BGG co-occurred most frequently and had the highest average lift in the analysis. B) In males, age, PER2-GG, and PER3B-GG co-occurred most frequently, but combinations with CLOCK3111TC had the highest average lift.

Figure 4
Heat maps of anxiety scores and circadian phenotypes. A&B) Higher anxiety scores indicative of severe anxiety are strongly associated with positive PER3 phase (advanced circadian phase or morning-types) and high circadian amplitude. C&D) Similar patterns are seen with circadian phenotypes measured using