Identification of water pollutant-related genes
After a comprehensive search of the CTD for water pollutants, we identified eight types of water pollutants: perfluorinated chemicals, polybrominated diphenyl ethers, phthalates, nanomaterials, insecticides, microcystins, heavy metals and pharmaceuticals. From these, we selected 51 toxicants closely related to water pollution (Figure 1). After data processing, we obtained the total number of interactions between the water pollutants and their interacting genes. To clarify the biological processes most affected by water pollutants, we selected the genes most commonly affected across all water pollutants (based on the interaction counts curated by CTD) for further analysis, treating genes with more than 100 interactions as highly related to water pollutants. In total, 77 genes had more than 100 interactions. The genes with the highest interaction counts were peroxisome proliferator-activated receptor alpha (PPARA, 2102 interactions), tumor necrosis factor (TNF, 779), interleukin-1 beta (IL1B, 553), caspase-3 (CASP3, 490), interleukin-6 (IL6, 489), catalase (CAT, 486), heme oxygenase 1 (HMOX1, 406), copper-transporting ATPase 1 (ATP7A, 385) and prostaglandin G/H synthase 2 (PTGS2, 333).
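The gene selection step amounts to summing interactions per gene across the selected pollutants and keeping genes above a threshold. The Python sketch below illustrates this under stated assumptions: each row of a CTD chemical-gene export is treated as one interaction, the input is reduced to hypothetical (pollutant, gene) pairs, and the helper names `interaction_counts` and `highly_related_genes` are illustrative, not part of any CTD tool.

```python
from collections import Counter

# Hypothetical (pollutant, gene) rows; a real CTD chemical-gene export has many
# more columns, and treating one row as one interaction is an assumption.
ROWS = [
    ("Perfluorooctanoic acid", "PPARA"),
    ("Perfluorooctane sulfonic acid", "PPARA"),
    ("Cadmium", "HMOX1"),
    ("Dibutyl phthalate", "TNF"),
    ("Cadmium", "TNF"),
]


def interaction_counts(rows, pollutants):
    """Count interactions per gene, restricted to the selected pollutants."""
    return Counter(gene for chem, gene in rows if chem in pollutants)


def highly_related_genes(counts, threshold=100):
    """Return (gene, count) pairs with count > threshold, most-interacting first."""
    return [(gene, n) for gene, n in counts.most_common() if n > threshold]


counts = interaction_counts(ROWS, {chem for chem, _ in ROWS})
print(highly_related_genes(counts, threshold=1))
```

On the real data the default threshold of 100 would apply; the toy rows only produce hits with a much lower threshold.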
GO and KEGG Pathway Enrichment Analysis
The 77 most interactive genes were imported into R for GO and KEGG pathway enrichment analysis, with Homo sapiens selected as the annotation organism (Figure 2). KEGG enrichment analysis showed that the most enriched terms were related to lipids and atherosclerosis. In addition, the most significantly enriched diseases for these 77 genes were Kaposi sarcoma-associated herpesvirus infection, Chagas disease, colorectal cancer, hepatitis B, bladder cancer and human cytomegalovirus infection. Among all enriched KEGG terms, the cancers most strongly associated with water pollutants were colorectal cancer, bladder cancer, pancreatic cancer, prostate cancer and endometrial cancer. Phthalates have been shown to be a key factor in promoting the development of colorectal cancer. Dibutyl phthalate (DBP), a common phthalate, is also toxic to the male reproductive system: several studies have found that exposure to DBP during pregnancy or adolescence damages the testes and reduces sperm count and quality.
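Enrichment tools in R typically score each GO term or KEGG pathway with an over-representation test based on the hypergeometric distribution. The section does not name the package or test used, so the sketch below is an assumption showing that standard test: `universe` is the number of annotated background genes, `pathway_size` the number of genes in the term, `list_size` the number of submitted genes (77 here) and `overlap` the number of submitted genes that fall in the term.

```python
from math import comb


def enrichment_p(overlap, universe, pathway_size, list_size):
    """One-sided over-representation p-value, P(X >= overlap), where
    X ~ Hypergeometric(universe, pathway_size, list_size)."""
    upper = min(pathway_size, list_size)
    total = comb(universe, list_size)
    # math.comb returns 0 when k > n, so impossible configurations add nothing.
    hits = sum(
        comb(pathway_size, i) * comb(universe - pathway_size, list_size - i)
        for i in range(overlap, upper + 1)
    )
    return hits / total


# Toy example: 20 background genes, a 5-gene term, 4 submitted genes, all 4 in the term.
print(enrichment_p(4, 20, 5, 4))
```

Exact integer arithmetic via `math.comb` avoids a SciPy dependency and is accurate for toy sizes; for genome-scale backgrounds a library survival function would be faster.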
GO enrichment analysis of the interactive genes showed that the most enriched biological process (BP) terms were response to metal ion, response to oxidative stress, cellular response to metal ion, cellular response to chemical stress, response to drug and response to cadmium ion. Metal ion dyshomeostasis is a major characteristic of Alzheimer's disease; one study suggested that serum metal ion levels are closely related to Alzheimer's disease and that the clinical severity of Alzheimer's disease correlates with serum metal ion concentrations14. Oxidative stress can induce both apoptosis and cellular senescence. Oxidative damage to macromolecules is consequential because the integrity of DNA/RNA, proteins and lipids is essential in determining health and disease states, and considerable evidence supports a critical role for such damage in the development of a variety of cancers15.
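Because hundreds of GO terms are tested at once, enrichment results are usually ranked by multiple-testing-adjusted p-values. The section does not state the correction or cutoff used; the sketch below assumes the Benjamini-Hochberg procedure, a common default in R enrichment packages, and the term names and p-values in the example are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (q-values) in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def rank_terms(term_pvalues):
    """Sort (term, q-value) pairs from most to least significant."""
    names = list(term_pvalues)
    qvalues = benjamini_hochberg([term_pvalues[name] for name in names])
    return sorted(zip(names, qvalues), key=lambda item: item[1])


# Hypothetical raw p-values for four GO terms.
print(rank_terms({"term_a": 0.01, "term_b": 0.04, "term_c": 0.03, "term_d": 0.002}))
```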
For the cellular component (CC) category, the most enriched terms included membrane raft, membrane microdomain, membrane region, vesicle lumen, secretory granule lumen, transcription regulator complex, cytoplasmic vesicle lumen, platelet alpha granule lumen and nuclear envelope. Membrane rafts are heterogeneous, dynamic domains characterized by close packing of lipids16. The cell membranes of some solid tumors, such as breast and prostate cancer, contain higher levels of cholesterol, allowing larger rafts to form; this may stimulate signaling pathways that promote tumor growth and progression17. Extracellular vesicles are heterogeneous bilayer membrane vesicles released by all human cell types, and in clinical use they have become a potential source of biomarkers for urological cancers18.
Heme binding was the most enriched molecular function (MF) term. Tetrapyrrole binding, steroid hydroxylase activity, oxidoreductase activity, monooxygenase activity, DNA-binding transcription factor binding and antioxidant activity were also significantly enriched.
Construction of the PPI network and the identification of key genes
Next, to identify key genes and further explore the roles of the interactive genes in water pollutant toxicity, we used Cytoscape to construct and analyze a protein-protein interaction (PPI) network (Figure 3). The network contained 106 nodes and 992 edges. Node size reflects degree, and edge size reflects the co-expression value. Genes with a degree greater than 20 were defined as hub genes. The PPI network showed that JUN, AKT1, TP53, IL6, RELA, MAPK1, MAPK3, FOS, TNF and VEGFA may play important roles in diseases induced by water pollutants. These hub genes and the proteins they encode are strongly interconnected with one another and with other genes in the network, suggesting that a change in the transcription or function of one may affect the others and that these genes may jointly influence water pollutant-induced diseases.
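The hub-gene criterion (degree greater than 20) can be reproduced from an edge list exported from the network. The sketch below assumes an undirected network given as hypothetical (gene, gene) pairs, ignores self-loops and duplicate edges, and breaks degree ties alphabetically; Cytoscape's own network statistics may handle these cases differently.

```python
from collections import defaultdict


def degrees(edges):
    """Degree of each node in an undirected graph given as (a, b) pairs."""
    deg = defaultdict(int)
    seen = set()
    for a, b in edges:
        key = frozenset((a, b))
        if a == b or key in seen:
            continue  # skip self-loops and repeated undirected edges
        seen.add(key)
        deg[a] += 1
        deg[b] += 1
    return dict(deg)


def hub_genes(edges, min_degree=20, top=None):
    """Genes with degree > min_degree, highest degree first; optionally the first `top`."""
    deg = degrees(edges)
    hubs = sorted((g for g, d in deg.items() if d > min_degree), key=lambda g: (-deg[g], g))
    return hubs if top is None else hubs[:top]


# Hypothetical toy edges; the real network has 106 nodes and 992 edges.
EDGES = [("JUN", "AKT1"), ("JUN", "TP53"), ("JUN", "IL6"), ("AKT1", "TP53"), ("TP53", "JUN")]
print(hub_genes(EDGES, min_degree=1))
```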
Specific analysis of different types of water pollutants
Beyond the comprehensive analysis of the 77 genes most affected by water pollutants (Figure 4), we next carried out a type-specific analysis according to industrial classification. The pollutants were classified into eight common types: perfluorinated chemicals, polybrominated diphenyl ethers, phthalates, nanomaterials, insecticides, microcystins, heavy metals and pharmaceuticals. Based on the interaction counts of the interacting genes, Venn diagrams were used to identify the co-interaction genes shared within each type of water pollutant. Although we grouped the pollutants by chemical class, the diversity of their constituent chemicals meant that the number of co-interaction genes in each Venn diagram was not expected to be high. We identified 15 co-interaction genes among perfluorinated chemicals, 14 among polybrominated diphenyl ethers, 166 among phthalates and 56 among heavy metals, whereas nanomaterials, insecticides, microcystins and pharmaceuticals shared very few. Thus, some pollutant classes, such as phthalates and heavy metals, affect many genes in common, while the others share few, suggesting that these compounds are not equally toxic and do not act on a large common set of genes.
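The central region of each Venn diagram corresponds to a set intersection: the genes that interact with every chemical in a pollutant class. The sketch below assumes per-chemical gene interaction counts as input and adds a hypothetical `min_count` filter, since the exact count criterion used to draw the Venn diagrams is not stated; the example counts are invented for illustration.

```python
def co_interaction_genes(counts_by_chemical, min_count=1):
    """Genes interacting with every chemical in a class (central Venn region).

    counts_by_chemical maps chemical -> {gene: interaction count}; a gene counts
    for a chemical only if its interaction count is at least min_count.
    """
    gene_sets = [
        {gene for gene, n in counts.items() if n >= min_count}
        for counts in counts_by_chemical.values()
    ]
    return set.intersection(*gene_sets) if gene_sets else set()


# Hypothetical counts for two phthalates.
PHTHALATES = {
    "DBP": {"TNF": 5, "IL6": 2, "CAT": 1},
    "DEHP": {"TNF": 3, "IL6": 1, "PPARA": 4},
}
print(sorted(co_interaction_genes(PHTHALATES)))
```

Raising `min_count` shrinks the shared set, which is one way the chosen count criterion can change how many co-interaction genes a class appears to have.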
Integrative analysis of disease-related genes
Based on the PPI network results, the 10 genes with the highest degree, JUN, AKT1, TP53, IL6, RELA, MAPK1, MAPK3, FOS, TNF and VEGFA, were selected for further analysis (Figure 5). To further clarify the relationship between these genes and diseases, we then downloaded gene-disease interaction data from the CTD. Based on inference scores, we focused on the 10 diseases most strongly associated with these top 10 genes. The correspondence between genes and diseases is shown in the circle plot. The diseases most strongly related to water pollutants were mainly hypertension, inflammation, neoplasm metastasis and neoplasm invasiveness, which may indicate that the effects of water pollutants are concentrated in a few specific disease pathways. Among tumor-related diseases, colonic neoplasms and breast neoplasms were the tumors most strongly related to water pollution, consistent with the results of the PPI network.
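The disease step can be sketched as keeping the highest-scoring diseases for each hub gene and then counting how many hub genes point to each disease, which is the information a gene-disease circle plot displays. This assumes rows of (gene, disease, inference score) taken from a CTD gene-disease export and a per-gene cutoff; whether the original analysis applied the cutoff of 10 per gene or overall is not stated, and all scores below are hypothetical.

```python
from collections import Counter, defaultdict


def top_diseases_per_gene(rows, genes, n=10):
    """rows: (gene, disease, inference_score). Keep the n highest-scoring diseases per gene."""
    by_gene = defaultdict(list)
    for gene, disease, score in rows:
        if gene in genes:
            by_gene[gene].append((score, disease))
    return {
        gene: [disease for _, disease in sorted(pairs, key=lambda p: (-p[0], p[1]))[:n]]
        for gene, pairs in by_gene.items()
    }


def disease_frequency(top_map):
    """How many hub genes list each disease among their top diseases."""
    return Counter(d for diseases in top_map.values() for d in diseases)


ROWS = [
    ("TP53", "Breast Neoplasms", 300.0),
    ("TP53", "Hypertension", 120.0),
    ("TP53", "Colonic Neoplasms", 250.0),
    ("JUN", "Hypertension", 90.0),
    ("JUN", "Inflammation", 80.0),
    ("RELA", "Inflammation", 60.0),
    ("MYC", "Hypertension", 500.0),  # not a hub gene here, so it is ignored
]
top = top_diseases_per_gene(ROWS, {"TP53", "JUN", "RELA"}, n=2)
print(disease_frequency(top).most_common())
```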
Identification of the genes related to colonic neoplasms and breast neoplasms
The preceding analysis showed that colonic neoplasms and breast neoplasms were closely associated with water pollutant-related genes. To determine which genes were most related to these two tumor types, we downloaded the genes highly associated with colonic neoplasms and breast neoplasms from the CTD. Based on inference scores, a total of 32452 genes were associated with colonic neoplasms, among which TP53 (ranked 5th), RELA (ranked 8th) and JUN (ranked 12th) were the highest-ranked hub genes. Likewise, a total of 113267 genes were associated with breast neoplasms, and TP53 and RELA were the hub genes most strongly associated with breast neoplasms. The distribution of TP53, RELA and JUN across human organs is shown in Figure 6.
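The ranks reported above come from ordering all disease-associated genes by inference score. A minimal sketch, assuming a hypothetical mapping from gene to inference score for a single disease and alphabetical tie-breaking (CTD's own tie handling is not described here):

```python
def gene_ranks(scores, query):
    """1-based rank of each query gene when genes are sorted by descending
    inference score (ties broken alphabetically); None if the gene is absent."""
    ordered = sorted(scores, key=lambda g: (-scores[g], g))
    position = {gene: i + 1 for i, gene in enumerate(ordered)}
    return {gene: position.get(gene) for gene in query}


# Hypothetical inference scores for one disease.
SCORES = {"GENE_A": 10.0, "TP53": 9.0, "GENE_B": 9.0, "RELA": 5.0, "JUN": 1.0}
print(gene_ranks(SCORES, ["TP53", "RELA", "JUN"]))
```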