Identification of six genes associated with COVID-19-related circadian rhythm dysfunction by integrated bioinformatic analysis

Patients with coronavirus disease 2019 (COVID-19) might cause long-term burden of insomnia, while the common pathogenic mechanisms are not elucidated. The gene expression profiles of COVID-19 patients and healthy controls were retrieved from the GEO database, while gene set related with circadian rhythm was obtained from GeneCards database. Seventy-six shared genes were screened and mainly enriched in cell cycle, cell division, and cell proliferation, and 6 hub genes were found out including CCNA2, CCNB1, CDK1, CHEK1, MKI67, and TOP2A, with positive correlation to plasma cells. In the TF-gene regulatory network, NFYA, NFIC, MEF2A, and FOXC1 showed high connectivity with hub genes. This study identified six hub genes and might provide new insights into pathogenic mechanisms and novel clinical management strategies.


Introduction
In the end of 2019, a novel coronavirus named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) caused an outbreak of pneumonia and a pandemic throughout the world (Alhamid et al. 2022). On February 2020, the WHO named the disease coronavirus disease 2019 . Besides pneumonia, extrapulmonary manifestations of COVID-19 are also increasingly reported (Mousavi et al. 2022). In a report of more than 370,000 cases of COVID-19, cough (50%), fever (43%), myalgia (36%), and headache (34%) were the most common presenting symptoms (Stokes et al. 2020). Moreover, patients recovering from COVID-19 may also have psychological symptoms [e.g., anxiety, depression, posttraumatic stress disorder (PTSD)] and cognitive symptoms (e.g., memory loss and inattention). Several studies have shown that nearly half of COVID-19 survivors report decreased quality of survival, 22% have anxiety/depression, and 23% have persistent psychological symptoms at 3 months (Carfì et al. 2020;Wong et al. 2020). Other persistent symptoms include loss of smell, joint pain, headache, dry syndrome, rhinitis, taste disturbances, loss of appetite, dizziness, myalgia, and insomnia. During the COVID-19 outbreak, insomnia prevalence was estimated at 38.9% across 5 studies, which causes serious damage to public mental health (Pappa et al. 2020).
Insomnia is one of the most common complaints, with more than 5 million visits per year in the USA (Ford et al. 2014). Insomnia has a complex association with other physical and mental illnesses, and most patients have one or more risk factors or comorbidities, such as depression, anxiety, chronic obstructive pulmonary disease, asthma, and neurodegenerative diseases. The genetic basis of insomnia is not fully described, and genome-wide association studies have identified many loci that may be associated with insomnia, and also overlap with depression, anxiety, neuroticism, type 2 diabetes, coronary artery disease, and other disorders (Stein et al. 2018;Jansen et al. 2019). As circadian rhythms and sleep are fundamental biological processes integral to human health and their disruption is associated with detrimental physiological consequences, many studies have found that patients with long COVID-19 have an increased probability of developing insomnia, but the mechanism of correlation between COVID-19 and insomnia is still unclear.
In this study, we firstly tried to explain mechanism of circadian rhythm disturbance caused by COVID-19, via studying the genetic signatures and the potential regulatory targets and pathways.

Datasets preparation
GSE217498 and GSE166253 were downloaded from the Gene Expression Omnibus database (GEO, https:// www. ncbi. nlm. nih. gov/ geo/, accessed on 10 Mar. 2023), which contains high-throughput gene expression profiles submitted by research institutions. GSE217498 consisted peripheral blood samples of 203 COVID-19 patients and 71 controls, while GSE166253 consisted of 10 retestingpositive COVID-19 patients, 6 convalescent patients, and 10 healthy controls. Then, we downloaded genes associated with circadian rhythm from Genecards (https:// www. genec ards. org/, accessed on 20 Mar. 2023). The Limma R package was used for normalizing the expression matrix. Then, log2 transformation was performed, and ensemble IDs were converted into gene symbols (the mean value was calculated in the presence of duplicate expression data) using the org.Hs.eg.db package in R software.

Bioinformatic analysis
The expression profiles of GSE217498 were used to construct a co-expression network by using the package of WGCNA in R (https:// cran.r-proje ct. org/ web/ packa ges/ WGCNA/). Genes in module with highest significant correlation with COVID-19 were selected for further analysis. The shared genes between COVID-19 and circadian rhythm were identified as the common gene set (CGS). Gene ontology (GO) annotation analysis was executed via "clusterProfiler" package of R, and P value < 0.05 was the cut-off. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis was executed via Kobas database (http:// kobas. cbi. pku. edu. cn/, accessed on 27 Mar. 2023). The protein-protein interaction (PPI) network of the interested module was constructed using the STRING database (https:// cn. string-db. org/, accessed on 28 Mar 2023) and presented by Cytoscape software. The cytoHub tool was used to screen the hub genes (top 10) via four topological analysis methods, including degree, maximal clique centrality (MCC), edge percolated component (EPC), and maximum neighborhood component (MNC). The intersection of the four algorithms was identified as hub genes, and then, the PPI network was used to analyze the hub genes at the protein level. Functional roles of the hub genes were retrieved through Genecards, and the package of "Corrplot" in R was used to analyze the correlation of hub genes. The expression matrix of hub genes in GSE217498 was subjected to differential expression analysis to compare COVID-19 vs. healthy controls using the DESeq2 package in R software. Furthermore, the expression matrix of GSE166253 was analyzed as validation set. NetworkAnalyst 3.0 databases (https:// www. netwo rkana lyst. ca/, accessed on 30 Mar 2023) was used to predict the TFs and miRNAs of hub genes. The immune enrichment scores of 28 immune cells were evaluated via CIBERSORT (https:// ciber sortx. stanf ord. edu, accessed on 7 Apr 2023). The expression matrix of Pearson correlation coefficients between each immune cell was visualized by "Corrplot" package. The DSigDB database (https:// dgidb. org/, accessed on 30 Mar 2023) was used to investigate drug-gene interaction to identify drugs associated with hub genes.

Results and discussion
We used WGCNA to analyze the expression profiles of GSE217498 to find co-expressed gene modules. A total of 8 modules, including cyan (1,987 genes), green yellow (516 genes), grey (532 genes), grey60 (265 genes), light cyan (8591 genes), magenta (1780 genes), red (596 genes), and tan (500 genes), were identified (Fig. S1B). Genes in gray were not included in any module, and then, we analyzed the interactive relationships underlying the 7 coexpression modules. After docking with clinical character data, it was found that there was a significant correlation between the tan module and COVID-19 (Fig. S1C).
A total of 2320 genes related to circadian rhythm were downloaded from genecards. The 76 shared genes between circadian rhythm-related genes and genes in the tan module were screened as CGS. Functional enrichment analysis demonstrated that these genes were mainly enriched in cell cycle, cell division, and cell proliferation, which is consistent with that cell division in many mammalian tissues is associated with specific times of day (Matsuo et al. 2003). Subsequently, based on four algorithms, including degree, MCC, EPC, and MNC, 6 genes (CCNA2, CCNB1, CDK1, CHEK1, MKI67 and TOP2A) were intersected as hub genes (Fig. 1). Functional roles of the six hub genes were shown in Table 1. In addition, these hub genes exhibited significant positive correlations (Fig. S2A). The heat map showed that all six hub genes were significantly upregulated in the COVID-19 group compared to non-COVID-19 group (Fig. S2B). All six genes showed significant differences between the COVID-19 group and non-COVID-19 group in training set. And differential expression of CCNB1, CDK1, CHEK1, and MKI67 was confirmed in the validation set (Fig. S3).
By using the NetworkAnalyst 3.0 databases, we found that NFYA, NFIC, MEF2A, and FOXC1 had high connectivity in the TF-gene regulatory network, regulating four hub genes simultaneously. Infiltration of 22 immune cell types was Fig. 1 A The PPI network ranked by weighted degree in CGS. Nodes were ordered and sized according to their degree and edges were sized according to their weight. B Venn diagrams showed 6 common hub genes identified by four algorithms of cytoHubba plug-in. C A physically and functionally connected PPI network through STRINGs. Nodes represented proteins, and edges represented pairs of interactions between proteins

Conclusion
Through bioinformatic means, 76 shared genes were identified and mainly enriched in cell cycle, cell division, and cell proliferation, and 6 hub genes were found out including CCNA2, CCNB1, CDK1, CHEK1, MKI67, and TOP2A, with positive correlation to plasma cells. Our findings demonstrated that these six hub genes acted important roles in both COVID-19 and circadian rhythm and might be potential biomarkers for COVID-19-related insomnia.

Limitations and future prospects
Firstly, as there are many factors that can influence sleep and this research only focused on infection status, follow-up studies should strictly classify patients in groups to exclude other interfering factors. Secondly, our study used the gene expression file in the peripheral blood mononuclear cells because collecting brain tissue from patients is invasive and risky and such a study would not be able to be approved by the ethics committee. Additional genetic and experimental studies are required to explain the mechanism and the function of these hub genes, especially MKI67 and TOP2A.