Genome-Wide Lymphocytic mRNA Sequencing to Identify Epithelial-Mesenchymal Transition Mechanism in Silicosis

Lymphocytes are immune cells that play dual roles in the pathogenesis of silicosis. Epithelial-mesenchymal transition (EMT), a vital phenomenon in the pathogenesis of silicosis, is regulated by cytokines, chemokines, and other molecules secreted by lymphocytes; however, the underlying regulatory mechanism is unclear. Here, we investigated the role of lymphocytes in EMT in silicosis. Three patients with silicosis and three healthy controls that underwent pre-job physical examination were recruited; fasting venous blood samples were collected and lymphocytes were separated by Ficoll. High-throughput sequencing technology and bioinformatic analysis were used to identify specic genes and signaling pathways. The results were veried through the detection of related indices of peripheral blood samples. prior dened set of genes shows statistically signicant concordant difference between two biological states. Genes involved in Wnt signaling pathways, Janus kinase/signal transducer and activator of transcription (JAK-STAT) signaling pathways, and cell adhesion molecules were employed to investigate the regulatory role of phosphoinositide 3-kinase (PI3K), integrin beta 1 (ITGB1), and integrin-linked protein kinase (ILK) in these signaling pathways. A GSEA map was employed to assess the enrichment eciency of these models.


Introduction
Silicosis is an occupational lung disease caused by the long-term inhalation of crystalline silica (SiO 2 ) dust. It is characterized with persistent in ammation, broblast proliferation, and excessive collagen deposition, eventually leading to pulmonary interstitial brosis [1] . Epidemiological studies have indicated silicosis as the primary type of pneumoconiosis with the highest morbidity and mortality in China and other developing countries. A total of 23 million workers are exposed to silica dust each year worldwide [2] . In China, about 3182 workers were diagnosed with silicosis from 2002 to 2016, accounting for 12.7% (95% con dence interval [CI]: 10.8%-14.6%) of all workers exposed to dust [3] . Furthermore, Nanchong, a city located in East China, had a silicosis prevalence of 26.71% and suffered from more severe conditions [4] , while Binzhou had a prevalence of 34.02% [5] . These studies suggest that silicosis is more prevalent among silica-exposed populations and has emerged as a serious public health problem threatening the health of relevant practitioners. Therefore, it is necessary to strengthen the understanding about the mechanism underlying silicosis.
Silicotic nodules and diffuse interstitial brosis are speci c morphological changes related to silicosis, wherein brosis is the most signi cant change in pulmonary interstitium [6] . Most studies suggest that brosis is an irreversible process upon initiation of silicosis [7] , serving as a landmark event in silicosis.
Mechanistically, over-proliferation and differentiation of broblasts contribute to pulmonary brosis, a complex biological process [8][9][10][11][12][13] . Epithelial-mesenchymal transition (EMT) was recently shown to play a crucial role in pulmonary brosis [6] . EMT refers to the phenomenon characterized with the loss of polarity of epithelial cells and the disappearance of the tight junction between cells, resulting in their resemblance to mesenchymal cells in terms of morphology and characteristics [14] .
Inhaled silica dust particles get deposited in the lung tissues and may be engulfed by macrophages, resulting in the induction of an immune response, followed by the initiation of apoptosis and the release of in ammatory cytokines. The consequence includes EMT and even brosis of pulmonary tissues. Lymphocytes in human peripheral blood play dominant roles in immune response. Thus, the analysis and interpretation of genetic materials in lymphocytes may re ect the process underlying the development of silicosis.
The rapid development in the eld of molecular biology and computer science allows investigation of the molecular mechanisms underlying silicosis at the genome-wide level [15] . Here, we identify the role of lymphocytes in EMT and investigate the underlying molecular mechanism in silicosis using highthroughput sequencing and bioinformatic tools. We validate the identi ed genes with the physical examination of samples.

Objectives
Three patients with silicosis and three healthy controls that underwent pre-job physical examination at the Xinxiang Occupational Disease Prevention and Control Hospital from September to October 2018 were recruited ( Table 1). The two groups were matched by age, sex, body mass index (BMI), and other baseline characteristics. Inclusion criteria of silicosis were in line with the diagnostic criteria of pneumoconiosis (Diagnostic Criteria of Pneumoconiosis (GBZ 70-2015) as follows: patients diagnosed at Xinxiang Occupational Disease Prevention and Treatment Hospital without active pulmonary tuberculosis or extrapulmonary tuberculosis; without complications other than pulmonary infection, chronic obstructive pulmonary disease, and pulmonary heart disease; and without severe heart, liver, and kidney diseases. Informed consent was provided by all participants. Exclusion criteria were as follows: subjects that failed to meet the inclusion criteria; patients with psychiatric problems; those with severe heart, liver, or kidney diseases; patients with poor compliance that failed to cooperate with the study criteria. The healthy controls were con rmed to have no physical disorders, which may affect the lymphocytes in the peripheral blood.

Isolation of lymphocytes
In brief, 5 mL of fresh blood samples were collected from every subject after 12 h of fasting. All blood samples were treated with the anticoagulant ethylenediaminetetraacetic acid (EDTA), followed by dilution with an equal volume of phosphate-buffered saline (PBS). The samples were centrifuged at 500-1000 ⋅g for 20-30 min at room temperature(20±5℃). The middle layer was transferred to another tube and resuspended in 10 mL PBS. The mixture was centrifuged twice at 250 ⋅g for 10 min and the pellet was resuspended and used for further analysis.
Total RNA extraction and high-throughput sequencing Total RNA was extracted from lymphocytes using the blood total RNA isolation kit according to the manufacture's instruction. RNA concentration was quanti ed using a microplate reader. The extracted RNA was ampli ed using polymerase chain reaction (PCR). Sequencing library was constructed using 150 ng of each recovered PCR product, and the library was analyzed on an Illumina MiSeq platform by Genergy Inc. (Shanghai, China).
Identi cation of differentially expressed genes (DEGs) Gene expression values obtained from high-throughput sequencing were normalized as transcript per million. Boxplot was used to assess the distribution of the detected genes in each sample. The limma test was employed to identify the DEGs between silicosis and control groups. Genes satisfying the following criteria were considered as DEGs: |log 2 (fold change)| ≥ 1, adjusted P-value < 0.05.

Functional annotation of DEGs by Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes
(KEGG) analysis DEGs identi ed were subjected to Cytoscape, an analysis tool suitable for genome-wide integrated discovery and annotation that provides typical batch annotation. KEGG (https://www.kegg.jp/), one of the most popular biological information databases, aims to determine advanced functions in biological systems. At the molecular level, KEGG speci cally integrates a large number of practical program database resources from high-throughput experimental technologies. GO is a widely used ontology based on three aspects, namely, cellular component (CC), molecular function (MF), and biological process (BP). Annotation items with a P-value less than 0.05 were selected for further investigation.

Construction of protein-protein interaction (PPI) network and screening of hub genes
The Interactive Gene Retrieval Search Tool was used to construct the PPI network, a visual analytical platform for comprehensive gene expression pro ling and meta-analysis at the protein level (available online at https://www.networkanalyst.ca/). Molecular Complex Detection (MCODE, version 1.5.1), a plugin of Cytoscape, was employed to investigate the hub genes based on the PPI network. The criteria for MCODE analysis were as follows: degree cutoff value, 2; MCODE scores, > 5; max depth, 100; k-score, 2; node score cutoff, 0.2. Genes with a degree cutoff value ≥ 10 were identi ed as hub genes, which were submitted for GO and clustering analysis using OmicShare (http://www.omicshare.com/tools), an open data analysis platform.
Gene set enrichment analysis (GSEA) GSEA is a computational method that determines whether a prior de ned set of genes shows statistically signi cant concordant difference between two biological states. Genes involved in Wnt signaling pathways, Janus kinase/signal transducer and activator of transcription (JAK-STAT) signaling pathways, and cell adhesion molecules were employed to investigate the regulatory role of phosphoinositide 3kinase (PI3K), integrin beta 1 (ITGB1), and integrin-linked protein kinase (ILK) in these signaling pathways. A GSEA map was employed to assess the enrichment e ciency of these models.
Veri cation of identi cation results 89 cases of silicosis diagnosed at the Occupational Disease Prevention and Treatment Center of a coal industry group from 2013 to 2015 were selected as the case group (all tunneling workers). In addition, 94 workers exposed to dust with normal chest radiograph were selected as the control group (all tunneling workers) from nine coal mines. Enzyme-linked immunosorbent assay (ELISA) was used to detect the expression levels of soluble adhesion molecules (SICAM) in the serum samples of subjects. The veri cation of results was undertaken for all the samples obtained.
Statistical analysis SPSS version 24.0 was used for statistical analysis. Baseline characteristics were presented as mean ± standard deviation (SD) for parametric indicators, and differences between silicosis patients and healthy controls were compared using the t-test. Fisher's exact test was used to investigate differences in different silicosis stages. A P-value less than 0.05 was considered statistically signi cant.

Baseline characteristics
We included six subjects in this study, three of which were diagnosed with silicosis. The average age was 52.0 ± 10.8 years (range, 40 to 61), and the exposure period to dust ranged from 13 to 18 years. Other three subjects without dust exposure history were recruited as healthy controls; their average age was 53.7 ± 9.3 years (range, 43 to 60). There was no difference in age between silicosis and control groups (t = 0.20, P = 0.849). The difference in BMI was not statistically signi cant (BMI, 25.4 ± 0.8 versus 25.2 ± 0.2, t = 0.34, P = 0.753). Thus, the differences in baseline characteristics among two groups were excluded, and the patients could be used for genome-wide pro ling and investigation.

Identi cation of differentially expressed mRNAs (DEmRNAs) in silicosis
We separated lymphocytes from venous blood samples and used for genome-wide mRNA pro ling. A total of 80813 genes were identi ed, of which 53169 mRNAs have been previously reported and 10614 mRNAs were predicted to be new. We used the violin diagram and correlation analysis to assess the quality of pro ling data. As shown in Figure 1, the normalized data of all samples showed high homogeneity in median and quartile range, and no outliers were found. Correlation analysis revealed the high correlation between groups, but the samples from different groups showed a relatively low correlation strength. We performed the limma test between silicosis and control groups and identi ed 1915 DEmRNAs; 987 of 1915 mRNAs were upregulated, while 928 mRNAs were downregulated.

Functional annotation of DEmRNAs
To investigate the potential role of these DEmRNAs in the pathogenesis of silicosis, we performed GO and KEGG analyses and found their involvement in biological processes of embryogenesis, angiogenesis, tissue repair, immune response, cell survival, proliferation, apoptosis, cell invasion, cell migration, EMT, and pulmonary brosis ( Figure 2). KEGG analysis highlighted signaling pathways, including JAK-STAT, Wnt, and focal adhesion, that were mainly involved in silicosis initiation.

PPI network construction and module analysis
To investigate the interaction between DEmRNAs at the protein level, those involved in JAK-STAT, Wnt, and focal adhesion pathways were used to construct a PPI network (Figure 3). Consistent with the KEGG enrichment analysis, PPI network showed three functional groups of JAK-STAT signaling pathway, Wnt signaling pathway, and cell adhesion molecules that were associated with the pathogenesis of silicosis. DEmRNAs with an enrichment score > 0.6 were exported and used to construct modules in Cytoscape.
After network tting analysis, three genes, including PI3K, ITGB1, and ILK, were identi ed as the hub genes that may play a dominate role in the pathogenesis of silicosis.
Identi cation of the signaling pathways related to the hub genes in silicosis The signaling pathways derived from the KEGG enrichment analysis were used to investigate the role the PI3K, ITGB1, and ILK in the initiation of silicosis. As shown in Figure 4, 30 signaling pathways were found to simultaneously involve PI3K, ITGB1, and ILK. Of these pathways, JAK-STAT signaling pathway, Wnt signaling pathway, and cell adhesion molecules were identi ed with the lowest P-value.
Targeted analysis of the signaling pathways involving the hub genes GSEA provides a completely new algorithm for the identi cation of the mRNAs of interest. Here, we employed PI3K, ITGB1, and ILK as well as JAK-STAT signaling pathway, Wnt signaling pathway, and cell adhesion molecules to assess the regulatory relationship. As shown in Figure 5, the enrichment scores of PI3K, ITGB1, and ILK toward JAK-STAT signaling pathway, Wnt signaling pathway, and cell adhesion molecules were all statistically signi cant, indicating that PI3K, ITGB1, and ILK may contribute to the pathogenesis of silicosis through JAK-STAT, Wnt, and cell adhesion molecules.

Veri cation of identi cation results
To verify the identi cation results, our team have selected 89 patients with silicosis as the case group and 94 workers exposed to dust with normal chest X-ray as the reference. The social-demographic characteristics and occupational history of subjects were investigated, and the expression levels of SICAM in the peripheral blood were detected (Table 2).  Table 2 shows that SICAM levels were signi cantly different between the case and control group (U = −7.58, P < 0.0001). However, age, length of service, and smoking index were not balanced between the two groups. We conducted a generalized linear model analysis and found that SICAM levels were still signi cantly different between the two groups (X 2 = 15.837, P < 0.0001) after adjustment for the three confounding factors (Table 3). The receiver operating characteristic (ROC) curve was used to validate the diagnostic value for silicosis using serum SICAM level. The area under the curve of 0.762 and a cutoff value of 1487.91 pg/mL provided the optimum diagnostic performance with a sensitivity and speci city of 94.4% and 55.3%, respectively ( Figure 6).

Discussion
While the incidence of pneumoconiosis is decreasing every year worldwide [16][17][18][19] , the number of pneumoconiosis cases is still high in China [20][21][22][23][24] . Silicosis is the main type of pneumoconiosis, accounting for about 90% of all occupational diseases [25] . Studies by Wang Dan and others show that the data reported on occupational diseases is less than 7% of the actual incidence [26] . Silicosis is thought to emerge as one of the major occupational diseases that seriously affects the health of concerned populations.
The pathogenesis of silicosis is complex. The entry of silica dust into the body results in macrophage damage in the early stage, followed by the triggering of the immune system, formation of antigenantibody immune complexes, and development of pathological changes related to silicosis. Lymphocytes in the human peripheral blood are important cellular components of immune response. The analysis and interpretation of genetic material in lymphocyte nucleus may re ect the genesis and development of silicosis. The occurrence of silicosis is affected by several targets. Therefore, it is essential to explore the mechanism underlying silicosis and identify the key reliable targets at the molecular level.
The recent development in biogenetic exploration technologies has facilitated the understanding of DEGs in several diseases to provide ideas for accurate treatment [24,27,28] . For instance, four molecular markers associated with the development of gliomas have been identi ed through biogenetic approaches [24] . A study on hepatocellular carcinoma employed the microarray technology to search and analyze a biological database and nally identi ed 16 hub genes closely related to the development of hepatocellular carcinoma [27] . Another study analyzed the expression levels of 322 immune genes and revealed the enhancement in the immune characteristics of patients with neuroblastoma; this study nally selected eight immune genes as the molecular markers of neuroblastoma [28] . Biotechnology and its application in disease-related gene prediction may provide new cues and support for the selection of DEGs related to silicosis.
In the present study, we used bioinformatic tools to identify the DEGs and target molecular markers from peripheral blood lymphocytes. GEO2R for the analysis of 80813 whole gene transcriptome data sets may help us identify the DEGs between silicosis and non-silicosis samples. With enrichment analysis, we found that these DEGs were mainly associated with JAK-STAT, Wnt, and adhesion molecule signaling pathways with respect to EMT. Among the most important modules of the PPI network, three central genes (PI3K, ITGB1, and ILK) showed the highest score, suggestive of their important roles in the occurrence or development of EMT in silicosis.
ITGB1 is a member of the integrin family, while integrin is a member of the adhesion molecule group [29] . Therefore, we detected the expression levels of SICAM in the serum samples from 89 silicosis patients and 94 dust-exposed workers, and found that the difference was statistically signi cant. After adjustment for age, length of service, and smoking index, the expression levels of SICAM in the serum were still statistically signi cant between the two groups. The results of ROC curve analysis also suggested that SICAM could be used as an index for the early screening of silicosis. Thus, PI3K, ITGB1, and ILK may play an important role in the occurrence and development of EMT in silicosis.
EMT refers to the loss of polarity of epithelial cells and the disappearance of tight junctions between cells under the effect of certain factors, thereby imparting the morphology and characteristics of interstitial cells [14] . The process of pulmonary brosis may be regarded as EMT in tissue repair. The level of Ecadherin protein and the relative expression of E-cadherin gene in silicosis model group decreased, while the levels of alpha smooth muscle actin and vimentin proteins and the relative expression of their RNAs increased, suggestive of the occurrence of EMT during the development of silicosis [30][31][32] .
PI3K is a very important member of the phospholipid kinase family. It has lipid kinase and protein kinase activities. P13K signaling pathway is an important pathway widely involved in cell proliferation, differentiation, cycle regulation, and apoptosis. Activated PI3K may produce the second messenger phosphatidylinositol (3, 4, 5)-trisphosphate (PIP3) and further phosphorylate protein kinase B (AKT) protein. Activated AKT enters the nucleus and induces or inhibits the downstream proteins for the regulation of cell survival and proliferation, apoptosis, angiogenesis, and pulmonary brosis [33][34][35] . Studies have con rmed that the inhibition of PI3K/AKT signaling pathway may signi cantly reduce hepatic stellate cell (HSC) proliferation, alpha-collagen type III mRNA expression, and collagen type III secretion [36] . Over-activation of P13K/AKT/hypoxia-inducible factor-1a (HIF-1a) pathway may affect the normal repair of alveolar epithelial cells through the regulation of cell proliferation and apoptosis, leading to the production of collagen III and formation of pulmonary brosis [37] . Some studies have shown that HIF-1a may be regulated by Snail and beta-catenin signaling pathways to promote the transformation of alveolar epithelial cells to interstitial cells and participate in the formation of pulmonary brosis [38] . These results suggest that the activation of P13K signaling pathway is associated with the process of pulmonary brosis via EMT.
ITGB1 is a member of the integrin family of proteins that are mainly involved in the regulation of cell adhesion and recognition during embryogenesis, hemostasis, tissue repair, immune response and tumor cell metastasis [29] .
As a member of the cell adhesion molecule family, integrins not only function as adhesion molecules but also are also involved in transmembrane connection. These molecules can connect the extracellular matrix-integrin molecule-cytoskeleton protein and participate in the activation of multiple signal transduction pathways. Integrins play an important role in several physiological and pathological processes. The connection of the extracellular matrix with integrin may result in the formation of focal adhesion (FAP). Many cytoskeletal proteins are expressed in FAP, including bronectin (FN), laminin (LN), collagen, and wavin (VN). In addition, many important signal transduction molecules such as FAK, ILK, and PI3K could be activated by the extracellular matrix and integrin, subsequently leading to the activation of a variety of downstream signal transduction pathways regulating cell proliferation, survival, and differentiation.
Integrin is thought to be closely related to the occurrence of EMT. The abnormal secretion of integrin in response to trauma-induced tissue stimulation may promote EMT and development of brosis [39] . Galliher et al. found that integrin α v β 3 promotes EMT by regulating the expression of transforming growth factor-beta (TGF-β) receptor. In the absence of integrin β 3 , TGF-β may not induce EMT [40] . Thus, integrin is an upstream molecule that promotes EMT.
ILK is a serine/threonine protein kinase that binds to the cytoplasmic domain of integrin β 1 and β 3 subunits and participates in integrin, growth factor, Wnt, and TGF-β/Smad signal transduction pathways. ILK also regulates cell growth, survival, cell cycle, EMT, apoptosis, invasion, migration, cancer, and angiogenesis.
Studies have shown that ILK is closely related to EMT of cells and promotes the occurrence and development of EMT. Serrano et al. found that EMT in mammary epithelial cells results in a signi cant increase in the expression of ILK; this transformation process may be effectively reversed by blocking ILK expression [41] . Mao Zhongyi et al. found that the expression of ILK and TGF-β1 increased with the development of liver brosis and positively correlated with the degree of liver brosis [42] . Li Xiaoqin and others con rmed that ILK was closely related to the occurrence and development of renal interstitial brosis, and that its expression may serve as an important reference index to judge the progression of renal diseases and prognosis [43] . Cui Hongshuai's research shows that the level of ILK increases with the severity of pulmonary brosis in mice, suggestive of the close relationship between ILK and occurrence of pulmonary brosis [44] . ILK may reduce the degradation of β-catenin and its transfer to the nucleus. It binds to T-cell factor/lymphoid enhancer-binding factor (TCF/LEF) in the nucleus and promotes the transcription of target genes, thereby promoting EMT [45] . Thus, the activation of ILK is closely related to EMT.
Our study has some limitations. First, the screening of the three key genes related to the development and progression of silicosis is based on bioinformatic technology, which needs further veri cation with in vitro and in vivo experiments. Collection of samples from silicosis and non-silicosis subjects and establishment of a suitable animal model of silicosis are warranted for the veri cation of the mechanism of action of the three genes in lymphocytes. Second, the data set of the transcription group is based on the lymphocytes from three cases of silicosis and three control subjects. The data were collected thrice and analyzed without mixing to obtain maximum and minimum values; the minimum values were used.
In the second phase, we plan to re-derive the transcription group data set according to the conventional method to verify the similarities and differences between the two detection methods.

Conclusions
Dysregulation of PI3K, ITGB1, and ILK in lymphocytes may contribute to EMT via JAK-STAT and Wnt signaling pathways in silicosis. Availability of data and materials The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Ethics approval and consent to participate all subjects signed an informed consent before being enrolled.      Targeted gene set enrichment analysis of PI3K, ITGB1, and ILK in silicosis.

Figure 6
Screening ROC curve of silicosis by SICAM.