Transcriptomic Characteristics of Suboptimal Health Status: Potential Utility of Novel Biomarkers for Predictive, Preventive, and Personalized Medicine Strategy

The early diagnosis of Suboptimal Health Status (SHS) creates a window of opportunity for the predictive, preventive, and personalized medicine (PPPM) of chronic diseases. Previous studies have observed alterations in several mRNA levels in SHS individuals. As a promising “omics” technology offering comprehension of genome structure and function at the RNA level, transcriptome profiling can provide innovative molecular biomarkers for the predictive diagnosis and targeted prevention of SHS. To explore the potential diagnostic biomarkers, biological functions, and signaling pathways involved in SHS, an RNA sequencing (RNA-Seq)-based analysis was performed on buffy coat samples from 30 participants with SHS and 30 age- and sex-matched healthy controls. The BMPs signaling pathway, the ERK1/2 cascade, and ABC transporters play potential roles in the pathophysiologic mechanism of SHS. These findings indicate the potential utility of SHS-related signaling pathways for targeted prevention and personalized therapy of chronic diseases. We recommend strengthening the studies of signaling pathways in SHS with different omics strategies. Ten genes, including GJA1, TWIST2, KRT1, TUBB3, AMHR2, BMP10, MT3, BMPER, NTM and TMEM98, play hub roles in the underlying mechanisms of SHS. We also suggest that it is crucial to study in depth the functions and activities of these differentially expressed genes in the underlying mechanism of SHS.


Introduction
Predictive, preventive, and personalized medicine (PPPM) is a holistic strategy in healthcare that aims to predict individual predisposition, to provide targeted prevention, and to create personalized treatment [1]. Chronic diseases are usually treated after disease onset, which is a very much delayed approach from the perspective of PPPM [2]. Suboptimal health status (SHS) is an intermediate physical state between ideal health and disease, which fits within the paradigm of PPPM [3]. Several studies have suggested that SHS might precede the occurrence of chronic diseases, including type 2 diabetes mellitus (T2DM) [4,5] and cardiovascular diseases (CVD) [6,7]. As a subclinical stage of chronic disease, the diagnosis of SHS plays a significant role in the targeted prevention and personalized treatment of chronic diseases from the perspective of PPPM [8].
The suboptimal health status questionnaire-25 (SHSQ-25), the most widely used screening tool for SHS, was developed based on the perceived health complaints and physical symptoms affected by SHS [9]. The SHSQ-25 includes 25 items constituting five dimensions: the immune system, the cardiovascular system, the respiratory system, fatigue, and mental status [9], and it has been used in Caucasians [10], Africans [4,11], and Asians [6,12]. To screen for objective diagnostic biomarkers for SHS, several candidates, including cortisol [13], relative telomere length [14], intestinal microbiota [15], and metabolites [16], have been investigated. Although previous studies have indicated that multi-omics biomarkers have the potential to diagnose SHS individuals, the underlying mechanisms of SHS remain only partially understood.
Transcriptomics is a promising "omics" technology involving the identification and quantification of the complete set of transcripts in a specific tissue or cell type. The transcriptome contains the full information about all RNA transcribed by the genome at a particular developmental stage and under a certain physiological or pathological condition [17]. Transcriptome analysis not only provides a better understanding of the human genome at the transcription level, but also gives a comprehensive perspective of genome structure and function [18]. Moreover, it may reveal the key alterations of biological processes that respond to pathogens, diseases, and environmental challenges, thus offering novel molecular biomarkers useful not only for the comprehension of their underlying mechanisms but also for their predictive diagnosis and targeted prevention [19].
Our previous study observed an association between chronic psychosocial stress and SHS, and found that decreased mRNA expression of glucocorticoid receptor α is associated with a high level of SHS [20]. Adrenaline and cortisol can affect glucocorticoid receptor expression and splicing [13,20]. In addition, changes in the transcriptome of circulating immune cells were also observed in patients who suffered from chronic fatigue syndrome, a disease resembling SHS [21]. Biological pathways including mitochondrial function, oxidative stress, and chronic inflammation are involved in the pathophysiologic mechanism of chronic fatigue syndrome [21]. Tomas-Roig et al. found that changes in mRNA expression, such as the mRNA level of the dopamine receptor, are associated with long-term psychosocial stress [22]. The association between chronic psychosocial stress and SHS, together with the alterations of mRNA expression in individuals with chronic psychosocial stress, leads to the hypothesis of this study: that alterations of the transcriptome profile might occur in SHS participants and that changes in gene expression might be involved in the underlying mechanisms of SHS.
This study aimed to describe, for the first time, a comprehensive transcriptomic biosignature for SHS, and to screen for objective diagnostic biomarkers for SHS using RNA-Seq-based transcriptome profiling. In addition, Gene Ontology (GO) annotation, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis, and protein-protein interaction (PPI) network analysis of the potential differentially expressed genes (DEGs) were conducted to gain further understanding of the biological processes involved in SHS and SHS-related chronic diseases, which may be useful for the PPPM of chronic diseases.

Study design and participants
From September 2017 to November 2017, a case-control study was conducted among a Chinese Han population who received an annual health examination at the Student Healthcare Centre of Weifang University. In order to minimize the influence of confounding factors, all the participants were undergraduate students aged 18 to 20 who were living in the university campus apartments. The inclusion and exclusion criteria were described previously in our study [16]. All the participants were required to complete the SHSQ-25 in September 2017. After follow-up for 3 months, we assessed their SHS scores for the second time. Participants with an SHSQ-25 score ≥ 35 in both surveys were selected as cases. Then, age- and sex-matched healthy participants with an SHSQ-25 score < 35 in both surveys were selected as controls. In total, 30 SHS participants and 30 age- and sex-matched controls were included in the current study.
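The two-survey selection rule described above can be sketched in a few lines of Python. This is only an illustration of the stated threshold (SHSQ-25 score ≥ 35 in both surveys for cases, < 35 in both for controls). The `Participant` type, its field names, and the "excluded" label for discordant scores are assumptions made for the example, and age and sex matching is not shown.

```python
from dataclasses import dataclass

SHS_CUTOFF = 35  # SHSQ-25 threshold stated in the text


@dataclass
class Participant:
    """Hypothetical record; field names are illustrative only."""
    pid: str
    score_baseline: int   # SHSQ-25 score at the first survey
    score_followup: int   # SHSQ-25 score at the 3-month follow-up


def classify(p: Participant, cutoff: int = SHS_CUTOFF) -> str:
    """Apply the two-survey rule to one participant."""
    if p.score_baseline >= cutoff and p.score_followup >= cutoff:
        return "case"
    if p.score_baseline < cutoff and p.score_followup < cutoff:
        return "control_candidate"  # still needs age and sex matching
    return "excluded"  # discordant scores (assumed handling)


if __name__ == "__main__":
    for p in [Participant("P01", 40, 36),
              Participant("P02", 20, 18),
              Participant("P03", 37, 30)]:
        print(p.pid, classify(p))
```

Requiring the same classification at both time points is what makes the SHS label persistent rather than a single-visit snapshot.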
This study was approved by the Ethics Committee of the Weifang University, Weifang, China. Written informed consent was obtained from each participant at the beginning of the study. The ethics approval was given in compliance with the Declaration of Helsinki.

Data collection
Demographic characteristics and lifestyle information of participants, including age, sex, ethnicity, smoking, drinking, diet, and sleep duration, were collected by questionnaires. The anthropometric measurements, routine biochemical tests, and clinical characteristics were measured as described in our previous study [16]. Physical activity levels, insomnia, anxiety, and depression were measured using the International Physical Activity Questionnaire [23], the Athens insomnia scale [24], the Hamilton anxiety rating scale [25], and the Hamilton depression rating scale [26], respectively.

Blood sample collection and RNA extraction
Two tubes of blood samples (5 ml) were collected from each participant by venipuncture in the morning after an overnight fast. One tube of blood sample was collected using a vacuum tube containing polymeric gel to acquire serum, which was used for routine biochemical tests. The other blood sample was collected for RNA extraction. Total RNA was extracted from buffy coat samples of 30 SHS individuals and 30 healthy controls using TRIzol reagent (ThermoFisher Scientific, Waltham, USA). The concentration and purity of extracted RNA were measured using Nanodrop 2000 (ThermoFisher Scientific, Waltham, USA). The quality of RNA was assessed using Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, USA). Then, RNA samples were stored at -80°C until RNA-Seq library preparation.

RNA-Seq library preparation and sequencing
Total RNA samples were quantified using Qubit 2.0 Fluorometer (ThermoFisher Scientific, Waltham, USA). Then, 100 ng of RNA from each sample was used for library preparation using NEBNext Ultra™ RNA Library Prep Kit for Illumina (New England Biolabs, Ipswich, USA) according to the manufacturer's protocols. Briefly, the mRNA was purified using magnetic beads, and isolated mRNA was reverse transcribed to double-stranded complementary DNA (cDNA). Then, the cDNA libraries were denatured as single-stranded DNA molecules, captured on Illumina flow cells, amplified as clusters, and finally sequenced using an Illumina HiSeq Xten Sequencer (Illumina, San Diego, USA) for 150 bp paired-end reads.

Statistical analysis
Raw FASTQ files were filtered to remove adaptors, polyN reads, and low-quality bases using FastQC software. Clean reads were mapped and aligned to the human genome using HISAT2 software, and mapped reads were counted using featureCounts software [27]. Normality of the distribution of all variables was tested by the Shapiro-Wilk test. Normally distributed continuous variables were reported as mean ± standard deviation (SD), and non-normally distributed continuous variables were represented as medians and interquartile ranges (IQR). Categorical variables were represented as frequencies and percentages. The differences in categorical variables between the two groups were tested by the Chi-square test or Fisher's exact test. The differences in continuous variables between the two groups were tested by the Student t-test or Mann-Whitney U test. To identify DEGs related to SHS, mRNA expression profile data in read counts were analysed using the "DESeq2" package in R [28]. The Benjamini-Hochberg method was used to adjust the false discovery rate (FDR). In order to identify the biological functions and pathways in which DEGs were enriched, GO and KEGG enrichment analyses were performed using the "clusterProfiler" package in R [29]. To further explore the interaction among the DEGs, a PPI network of DEGs was constructed using the Search Tool for the Retrieval of Interacting Genes (STRING) database [30]. The top 10 genes ranked by degree were selected as hub genes using the cytoHubba application [31], while the Cytoscape software 3.8.1 (National Institute of General Medical Sciences, Bethesda, USA) was used to visualize the PPI network. Multivariate binary logistic regression was used to construct transcriptome diagnosis models for SHS. Receiver operating characteristic (ROC) curves and the area under the curve (AUC) were used to assess the diagnostic performance of the models.
Data analysis was performed using SPSS 25.0 (IBM Corporation, New York, USA) and R 4.0.2 [32]. All reported P values were two-tailed, and P < 0.05 was considered statistically significant.
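For readers unfamiliar with the Benjamini-Hochberg adjustment mentioned above, the following is a minimal Python sketch of the standard step-up procedure. It is not the DESeq2 implementation used in the study, the function name is illustrative, and the four P values in the usage example are invented purely to show the arithmetic.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    For the P value with ascending rank i out of m tests, the adjusted value
    is the minimum of p_(j) * m / j over all ranks j >= i, capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so the running minimum
    # enforces monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    toy_p = [0.01, 0.04, 0.03, 0.005]  # invented values, not study data
    print(benjamini_hochberg(toy_p))
```

A gene would then be called significant at a given FDR level when its adjusted value falls below that level.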

Characteristics of participants
The average ages of the SHS and control groups were 19.03 and 18.87 years, respectively. Higher SHS, insomnia, anxiety, and depression scores were observed in the SHS group (all P < 0.05). Aside from these, there were no statistically significant differences between the SHS and control groups in terms of the other variables (all P > 0.05). The characteristics of participants are summarized in Supplementary Table 1.

Identification of potential transcriptomic biomarkers
A total of 46 DEGs were identified between the SHS and control groups, including 22 upregulated genes and 24 downregulated genes in the SHS group, based on the criteria of P value < 0.05 and fold change > 2 (Fig. 1) (Fig. 2A). Regarding biological process (BP), the upregulated DEGs were enriched in adult heart development, organophosphate ester transport, male gonad development, development of primary male sexual characteristics, male sex differentiation, gonad development, protein heterotetramerization, development of primary sexual characteristics, regulation of cardiac muscle cell proliferation, and cardiac muscle cell proliferation (Fig. 2B). Under molecular function (MF), the upregulated DEGs were enriched in ATPase-coupled transmembrane transporter activity, primary active transmembrane transporter activity, lipid transporter activity, and growth factor activity (Fig. 2C). For the downregulated DEGs, significant enrichment was observed in anchored component of membrane and vacuolar lumen regarding cellular component (CC) (Fig. 2D).
In terms of BP, the downregulated DEGs were enriched in negative regulation of bone morphogenetic proteins (BMPs) signalling pathway, positive regulation of extracellular signal-regulated kinase-1 (ERK1) and ERK2 cascade, negative regulation of ossification, gliogenesis, regulation of cellular response to growth factor stimulus, regulation of ERK1 and ERK2 cascade, regulation of BMP signalling pathway, ERK1 and ERK2 cascade, cellular transition metal ion homeostasis, and regulation of gliogenesis (Fig. 2E). For MF, the downregulated DEGs were enriched in protein serine/threonine kinase activity (Fig. 2F).
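The DEG selection criteria stated in this section (P value < 0.05 and fold change > 2) can be illustrated with a short Python sketch. Two assumptions are made here that the text does not spell out: the fold change criterion is applied symmetrically, so that downregulated genes are those with a fold change below 1/2, and the fold change is expressed as the SHS to control ratio. The function name is hypothetical.

```python
import math


def classify_gene(p_value, fold_change, p_cutoff=0.05, fc_cutoff=2.0):
    """Label one gene as 'up', 'down', or None (not a DEG).

    fold_change is assumed to be the SHS / control expression ratio (> 0).
    The cutoff is applied symmetrically on the log2 scale (an assumption).
    """
    if p_value >= p_cutoff:
        return None
    log2_fc = math.log2(fold_change)
    threshold = math.log2(fc_cutoff)  # 1.0 for a 2-fold cutoff
    if log2_fc > threshold:
        return "up"
    if log2_fc < -threshold:
        return "down"
    return None


if __name__ == "__main__":
    # Invented values for illustration only
    print(classify_gene(0.01, 4.0))   # strong increase in SHS
    print(classify_gene(0.01, 0.25))  # strong decrease in SHS
    print(classify_gene(0.01, 1.5))   # change too small
```

Working on the log2 scale keeps the up and down thresholds symmetric around no change.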

KEGG enrichment analysis
To explore the biological pathways related to SHS, KEGG enrichment analysis of DEGs was performed. As shown in Fig. 3A, the upregulated DEGs were significantly enriched in ATP-binding cassette (ABC) transporters. In addition, KEGG enrichment analysis identified three significant pathways of the downregulated DEGs, including proteoglycans in cancer, Parkinson disease, and neurodegeneration (Fig. 3B).

PPI network analysis and hub gene recognition
In order to provide a comprehensive view of potential functional relationships of DEGs, the 22 upregulated genes and 24 downregulated genes were mapped by the STRING database to establish a PPI network (Fig. 4). Protein pairs with a combined score > 0.15 were selected in the PPI network. The top 10 genes with the highest degree of connectivity were defined as critical hub genes (Fig. 5), including gap junction protein α1 (GJA1), twist family basic helix-loop-helix transcription factor 2 (TWIST2), keratin 1 (KRT1), tubulin β3 (TUBB3), anti-mullerian hormone receptor type 2 (AMHR2), bone morphogenetic protein 10 (BMP10), metallothionein 3 (MT3), bone morphogenetic protein binding endothelial regulator (BMPER), neurotrimin (NTM), and transmembrane protein 98 (TMEM98).
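Degree-based hub selection, as described above, amounts to counting each gene's neighbours in the filtered network and keeping the top-ranked nodes. The Python sketch below illustrates that idea under stated assumptions: it is not the cytoHubba code, the edge list uses placeholder node names and invented scores, and ties are broken alphabetically, which is an arbitrary choice made for reproducibility.

```python
from collections import defaultdict


def top_hubs(edges, min_score=0.15, k=10):
    """Rank nodes by degree after filtering edges on interaction score.

    edges: iterable of (node_a, node_b, score) tuples.
    Returns a list of (node, degree) pairs, highest degree first.
    """
    neighbours = defaultdict(set)
    for a, b, score in edges:
        if score > min_score and a != b:  # keep pairs above the threshold
            neighbours[a].add(b)
            neighbours[b].add(a)
    # Sort by descending degree, then by name to break ties (assumption)
    ranked = sorted(neighbours, key=lambda g: (-len(neighbours[g]), g))
    return [(g, len(neighbours[g])) for g in ranked[:k]]


if __name__ == "__main__":
    # Placeholder network, not study data
    toy_edges = [("A", "B", 0.90), ("A", "C", 0.40), ("A", "D", 0.20),
                 ("B", "C", 0.16), ("C", "D", 0.10)]
    print(top_hubs(toy_edges, k=2))
```

With a permissive score threshold such as 0.15, more low-confidence edges enter the network, so degree rankings should be read with that in mind.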

Discussion
As a reversible stage in advance of chronic disease, SHS offers a new and effective concept for population risk stratification from the perspective of PPPM. In addition, identifying key biological pathways relevant to the progression of SHS towards chronic diseases is considered a novel and viable strategy for predictive diagnosis, targeted prevention, and personalized therapy of chronic disease. With the emergence of RNA-Seq-based technologies, transcriptome profiling plays a significant role in deciphering gene expression at the RNA level and identifying molecular biomarkers. In the present study, 46 genes were differentially expressed between individuals with SHS and individuals with ideal health status. GO annotations and KEGG pathway enrichment analysis also revealed that several biological processes, such as ABC transporters and neurodegeneration, were related to SHS, and 10 hub genes with the highest degree of connectivity were identified. In addition, the AUC of the predictive diagnostic model based on transcriptomic biomarkers was 0.938 (95% CI: 0.882-0.994). These findings suggest that blood transcripts are potentially objective biomarkers for SHS diagnosis. These transcriptomic biomarkers provide better insight into the critical genes associated with SHS and a deeper understanding of its biological processes.
In the present study, a significantly lower level of GCKR mRNA was found in individuals with SHS. Glucokinase regulator protein, encoded by the glucokinase regulator (GCKR) gene, is a hepatocyte-specific regulatory protein that inhibits glucokinase in liver cells [33]. Glucokinase, a hexokinase isozyme, is a key regulator of glucose disposal and storage, and responds to increases in circulating glucose concentration by initiating a signalling cascade that results in insulin secretion from pancreatic islet β cells [34]. Alterations in glucokinase expression and activity are associated with poorly controlled T2DM [35] and nonalcoholic fatty liver disease (NAFLD) [36]. It has been reported that common variants in the GCKR gene are associated with increased blood triglycerides [37,38], lower fasting glucose [38], and NAFLD [39]. The glucokinase regulator protein, which binds to glucokinase and inactivates it, removing it from carbohydrate metabolism, could serve as a new treatment target for T2DM [40]. Significantly lower levels of GCKR mRNA in the SHS individuals in this study indicate that disorders of glucose metabolism might play an important role in the pathophysiology of SHS. Given these findings, GCKR mRNA might be a potential predictive diagnostic biomarker for the progression of SHS towards T2DM, and glucokinase regulator protein could be a potential therapeutic or preventive target for SHS and T2DM.
Functional annotation and pathway enrichment analysis of DEGs provided an intuitive overview of the mechanism of SHS. The significant GO terms were BMPs signalling pathway and regulation of ERK1/2 cascade. BMPs, a group of signalling molecules, are part of the transforming growth factor-β superfamily of proteins. Initially discovered for their ability to induce bone formation [41], BMPs are now known to play important roles in the adult vascular endothelium, promoting angiogenesis and mediating oxidative stress [42]. Due to the critical roles of BMPs in the maintenance of adult tissue homeostasis, dysregulation of the BMPs signalling pathway contributes to various diseases, including cancer, skeletal disorders, and CVD [43]. Our previous studies suggested that SHS might precede the occurrence of CVD [6,7]. In the present study, significantly lower levels of BMPER and hemojuvelin BMP co-receptor (HJV) mRNA, which are involved in the BMPs signalling pathway, were found in individuals with SHS. Our findings indicate that the BMPs signalling pathway may play an important role in the pathophysiology of SHS, and transcripts of BMPER and HJV could be potential diagnostic biomarkers for SHS. The ERK1 and ERK2 cascade is a key signalling pathway that regulates a large variety of cellular processes, including adhesion, migration, differentiation, metabolism, and proliferation [44]. This signalling cascade is dysregulated in a variety of diseases including CVD [45], insulin resistance [46], and inflammation [47]. In the current study, significantly lower levels of metallothionein 3 (MT3) and C-C motif chemokine ligand 3 (CCL3) mRNA in SHS individuals indicate that the ERK1 and ERK2 cascade is associated with the progression of SHS towards chronic diseases, such as CVD and T2DM.
The KEGG enrichment analysis revealed that ABC transporters and neurodegeneration are the biological pathways related to SHS. ABC transporters are a large family of transmembrane proteins. These proteins bind ATP and use the energy to drive the transport of various molecules across cell membranes [48]. In humans, the 48 ABC proteins are divided into seven subfamilies, from A to G, based on the sequence and organization of their ATP-binding domain [48]. The ABCA4 protein transports vitamin A derivatives and performs a crucial role in the visual cycle [49]. In the present study, downregulation of ABCA4 was observed in the individuals with SHS, which suggests that the decreased level of ABCA4 mRNA is associated with the eye-related SHS phenotype, such as eye ache and eye fatigue.
The ABCG8 protein facilitates the transport of sterols in the intestine and liver [50]. Our previous study found that the steroid hormone biosynthesis pathway is disturbed in SHS individuals [16], which suggests that the upregulation of ABCG8 might be associated with the disorder of steroid hormone biosynthesis in SHS individuals. Chen and colleagues observed that trimethylamine-N-oxide, a metabolite produced by gut microbiota, is associated with increased ABCG8 expression [51]. In addition, Zhu et al. showed that the intestinal bacterium Enterococcus faecalis increases the expression of ABCG8 [52]. Our previous study found that alterations of intestinal microbiota occur in SHS individuals [15]. In the present study, the higher level of ABCG8 mRNA might be associated with the diversity of intestinal microbiota in SHS individuals. Significantly lower levels of TUBB3 and calcium/calmodulin dependent protein kinase II β (CAMK2B) mRNA, which are involved in the Parkinson's disease and neurodegeneration pathways, were observed in SHS individuals. These findings indicate that neurodegeneration might be involved in the pathophysiology of SHS.
The PPI network enables the exploration and visualization of functional interactions between the DEGs. As shown in Fig. 5, GJA1, TWIST2, KRT1, TUBB3, AMHR2, BMP10, MT3, BMPER, NTM and TMEM98 were identified and selected as critical hub genes. Gap junction protein α1 (GJA1), also known as connexin 43 protein, is a protein subunit that constitutes gap junction channels [53]. The intercellular channels of gap junctions facilitate the transfer of ions and small molecules from cell to cell, and are thought to modulate several processes, including embryogenesis, differentiation, and electrotonic coupling [54]. GJA1 expression is affected by several pathophysiological conditions, such as hypertension, hypercholesterolemia, and diabetes [55]. In the present study, the higher level of GJA1 mRNA in SHS individuals indicates that GJA1 mRNA could be associated with the progression of the SHS phenotype towards CVD and T2DM. In addition, Squecco et al. reported that the bioactive sphingolipid sphingosine 1-phosphate can enhance GJA1 protein expression [56]. Our previous study found that sphingolipid metabolism is a disturbed metabolic pathway related to SHS, and significantly higher levels of sphinganine 1-phosphate and sphingomyelin were observed in SHS individuals [16]. Given these findings, the upregulated GJA1 mRNA could be affected by the disturbed sphingolipid metabolism in SHS individuals. These critical genes play hub roles in predictive, preventive, and personalized medicine related to SHS, and are worthy of further investigation.
To establish a relatively accurate diagnosis model for individuals with SHS, a logistic regression analysis was performed based on the transcripts of the 10 identified hub genes. ROC curve analysis showed that the predictive diagnosis model based on transcriptomic biomarkers can distinguish individuals with SHS from individuals with ideal health status with a sensitivity of 83.3%, a specificity of 90.0%, and an AUC of 0.938. These findings indicate that the transcriptomic biomarkers have strong predictive ability for SHS diagnosis.
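The performance measures reported above can be made concrete with a small Python sketch. It computes sensitivity and specificity at a chosen probability threshold, and the AUC as the fraction of case-control pairs in which the case receives the higher model score (the rank-based formulation, with ties counted as one half). The scores below are invented for illustration and the function names are hypothetical. This is not the SPSS or R procedure used in the study. As a side note on the arithmetic, with 30 participants per group, 83.3% and 90.0% are consistent with 25 of 30 cases and 27 of 30 controls being classified correctly.

```python
def sensitivity_specificity(labels, scores, threshold):
    """labels: 1 = SHS case, 0 = control; scores: predicted probabilities."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of cases and controls."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Invented scores, not study data
    y = [1, 1, 1, 0, 0, 0]
    s = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
    print(sensitivity_specificity(y, s, threshold=0.5))
    print(auc(y, s))
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes discrimination across all thresholds.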
Blood transcripts are potentially objective biomarkers for the SHS diagnosis. The proposed transcriptomic biomarkers have a promising prospect of clinical application in the prediction and prevention of chronic disease.
To the best of our knowledge, this is the first study to screen transcriptomic biomarkers for SHS using RNA-Seq-based transcriptome profiling. Nevertheless, several limitations of the present study are noteworthy. First, our study is a case-control study with a relatively small sample size, hence the generalisability of these findings could be questioned. However, considering that our study provides original observations on the transcriptomic features of the SHS population, it offers a new idea: that buffy coat transcripts might provide a novel alternative for the predictive diagnosis, targeted prevention, and personalized treatment of chronic diseases. In addition, considering the quantitative accuracy of RNA-Seq technology, an RT-qPCR study is underway in the same cohort to validate the putative transcript biomarkers and selected hub genes identified in this study. Building on the present findings, further studies of larger cohorts from diverse geographical areas and populations with different age ranges are warranted.

Conclusions And Expert Recommendations
The early diagnosis of SHS has the potential to predict chronic diseases at an early stage, and effective intervention on SHS may be a cost-effective way for the targeted prevention and personalized therapy of chronic diseases. For the first time, the present study identified 46 DEGs between the SHS individuals and healthy controls using RNA-Seq-based transcriptome profiling. A total of 23 transcripts were selected as candidate diagnostic biomarkers for SHS. The present study revealed the potential value of transcriptomic biomarkers for the predictive diagnosis of SHS from the perspective of PPPM. The pattern of the differentially expressed genes can be used as biomarkers for patient stratification. We suggest that SHS-related transcriptomic biomarkers and other SHS-related biomarkers at multi-omics levels hold key promise for the practice of PPPM of chronic disease.
The downregulation of GCKR in SHS individuals indicated that glucose metabolism disorder plays a significant role in the pathophysiologic mechanism of SHS. The GCKR protein could serve as a potentially preventive/therapeutic target for the progression from SHS towards T2DM. We recommend that further studies of large prospective cohorts should be conducted to investigate the crucial role of GCKR in the SHS progression, which could be useful for the predictive diagnosis and targeted prevention of T2DM.
The present study demonstrated that the BMPs signalling pathway, the ERK1/2 cascade, and ABC transporters play potential roles in the pathophysiologic mechanism of SHS. These findings indicate the potential utility of SHS-related signalling pathways for targeted prevention and personalized therapy of chronic diseases. We recommend strengthening the studies of signaling pathways in SHS with different omics strategies. Ten genes, including GJA1, TWIST2, KRT1, TUBB3, AMHR2, BMP10, MT3, BMPER, NTM and TMEM98, play hub roles in the underlying mechanisms of SHS. We also suggest that it is crucial to study in depth the functions and activities of these differentially expressed genes in the underlying mechanism of SHS.
For the first time, the present study found an association between neurodegeneration and SHS. These findings indicate that SHS may precede the actual onset of neurodegenerative diseases, such as Alzheimer's disease and Parkinson's disease. The integration of a subjective health measure (SHSQ-25) and objective biomarkers (transcripts of TUBB3 and CAMK2B) may enable clinicians and public health workers to predict an individual's high risk of developing neurodegenerative diseases. The SHSQ-25 can be used as an alternative health screening tool in population-based health surveys, particularly when laboratory-based resources are lacking. Primary healthcare providers must be able to detect and manage SHS to fight delayed diagnosis, untargeted prevention, and ineffective intervention of chronic disease.

Declarations
HW and YW participated in the design of the study. HW, QT, JZ, HL, WC, XZ, XL, LW, MS and YK performed participant enrollment and collected the samples. HW, QT and JXZ performed the transcriptome analysis. HW and QT performed the statistical analysis and drafted the manuscript. YK, WW and YW revised the manuscript.

Funding information
This work was partially supported by the "Hundred-Thousand-Ten Thousand Project (2020A17)" and the