Proteomic Analysis of CRISPR Cas 9 Mediated Mdig Deletion in Triple Negative Breast Cancer Cells

We have identified an environmentally inducible gene, mdig, that predicted the overall survival in breast cancer patients. We showed that mdig regulated breast cancer cell growth, motility and invasion, partially through DNA and histone methylation. However, a comprehensive analysis of the proteomic profile of mdig in triple negative breast cancer cells has been lacking. We applied mass spectrometry to acquire global proteomic and post-translational modification data for the triple negative breast cancer cell line MDA-MB-231 in which mdig had been deleted via CRISPR-Cas9 gene editing. Using label-free bottom-up quantitative proteomics, we compared wild-type control (WT) and mdig knockout (KO) MDA-MB-231 cells and identified the proteins and pathways that are significantly altered with mdig deletion. The Ingenuity Pathway Analysis (IPA) platform was further used to explore the signaling pathway networks incorporating differentially expressed proteins. The differentially expressed proteins fell into protein classes that included chaperones, cytoskeleton, immunity, enzyme modulator, hydrolase, isomerase, ligase, lyase, membrane traffic proteins, nucleic acid binding, oxidoreductase, receptor, signaling molecule, storage proteins, structural proteins, surfactants, transcription factor, carrier proteins, transferase, transmembrane receptor regulatory and transporter proteins. Apart from the change observed for histidine 39 of 60S ribosomal protein L27a, there was no apparent global increase in abundance for histidine-oxidized peptides. This does not support a role of mdig as a general histidine oxidase and reinforces the selective action on the L27a protein. The lysine PTMs di-methylation, tri-methylation and acetylation were also tested for quantitative differences in mdig KO cells compared to WT. Because mdig is a demethylase, increased tri-methylation and decreased di-methylation at mdig target lysine residues could be expected. Our data indicate that di-methyl lysine was more abundant in mdig knockouts compared to wild-type.
In addition, our results were suggestive of global changes in abundance for lysine tri-methylated and acetylated peptides (p = 0.104 and p = 0.67, respectively). These results suggest that mdig has a global impact on lysine acetylation in addition to its specific demethylase activity. their functional towards a better understanding of the development of breast cancers. The heterogeneity of TNBC and the lack of effective therapeutic targets, together with insufficient predictive biomarkers, are among the challenges associated with TNBC therapy. transformation changes in protein abundance. these changes at the protein level provides unique protein signatures that effective diagnosis and prognosis. Our high-throughput proteomics study of the TNBC cells provided a large and rich dataset that has allowed us to stratify systemic differences between MDA-MB-231 cells with and without mdig. The top differentially regulated proteins have been validated at the protein level and have been found to predict disease prognosis both in breast cancer overall and in TNBC. In breast cancer patients, high expression of STMN1, NAMPT, PLAUR and SOD2 predicted poor overall survival (OS), whereas high expression of FLNA, MAGED2, RACK1, HYOU1 and RIN1 predicted better OS. Within the TNBC patient category, high expression of MAGED2 and STMN1 predicted poor OS, whereas RACK1, HYOU1, PLAUR, RIN1 and SOD2 predicted better OS. These proteins may serve as additional biomarkers in the prognosis of TNBC. Mechanistic regulation of these proteins by mdig requires further investigation. Nevertheless, the current data set shows that the TNBC protein repertoire displayed in the mdig KO cells indicates that the signaling pathways and metabolic alterations induced in the MDA-MB-231 cell line recapitulate the physiological changes in vivo as influenced by mdig in mammary cells. This study provides a bioinformatical insight into the TNBC


Background
Breast cancer is the second leading cause of cancer-related deaths in women in the U.S., after lung cancer, and as of 2019 there were more than 3.1 million women with a history of breast cancer. This is an alarming situation, as about 1 in 8 women in the U.S. will develop invasive breast cancer during their lifetimes (1). Breast cancer is a clinically heterogeneous and highly complex disease composed of different biological subtypes. These include human epidermal growth factor receptor 2 (HER2), luminal A, luminal B, claudin-low, and basal-like (2). HER2, progesterone receptor (PR) and estrogen receptor (ER) status, together with proliferation status as measured by Ki-67, remain the standard predictive and prognostic factors for breast cancers (3). Among these subtypes, triple negative breast cancer (TNBC) accounts for 10 to 20% of all breast cancer cases, is highly aggressive and has the worst patient outcome. Lack of a targeted therapy, aggressive metastasis and relapse remain the top factors that make TNBC treatment challenging. Virtually all metastases occur within the first five years after diagnosis, giving TNBC the worst prognosis (4,5).
Several factors pertaining to genetics, epigenetics, environment and lifestyle are involved in the etiology of breast cancer. Mutations in the BRCA1 and BRCA2 genes, age, endogenous and exogenous exposure to hormones, obesity, alcohol consumption and cigarette smoking are some of the known risk factors (6-11). Developing an understanding of gene-environment interaction in breast cancer is a promising avenue of research. By studying the environmentally modulated genes that are implicated in breast cancer, valuable information concerning the development and progression of breast cancers will be obtained. We have recently identified a gene named mdig, whose expression status influences the survival time of breast cancer patients. High expression of mdig predicted poor overall survival. However, for patients who are lymph node positive, mdig expression is a favorable factor for prolonged overall survival (12).
Interestingly, suppression of mdig in breast cancer cells corresponded to enhanced methylation of DNA and histones, suggesting that mdig's demethylase property is a factor in the pathophysiology. Taken together, these data suggest that mdig is likely to promote tumor growth in the early stages of cancer but to act as a tumor suppressor by inhibiting migration and invasion at the later stages (13). After the initial discovery of mdig in the alveolar macrophages of coal miners exposed to mineral dust in occupational settings (14), several studies demonstrated increased expression of mdig in a variety of human cancers, especially cancers of the lung and breast (15). Mdig also has a critical role in cell growth and motility (15), in pulmonary inflammation (16,17) and in immune regulation (18,19). Cellular assays have shown a paradoxical role of mdig in cell proliferation, motility and invasion in lung cancer (20). As an environmentally inducible gene, mdig is induced upon exposure to certain environmental agents such as silica, arsenic, and tobacco smoke (21).
Development of TNBC and its related metastasis is a complex phenomenon that is poorly understood. Moreover, the role of mdig in aggressive breast cancers is still poorly understood. Very little is known about mdig except its influence on breast cancer cell proliferation, migration, invasion and on DNA/histone methylation. Therefore, identifying key proteins modulated by mdig and the biological pathways operating in the development of breast cancers is pivotal. The knowledge gained will help in identifying novel targets of therapeutic interest. The role of mdig in cancer has been studied for several years, but this work presents the first data that document changes in the global proteomic profiles of mdig-depleted breast cancer cells.
For the present study, we adopted a proteomic approach to analyze the triple negative breast cancer cell line MDA-MB-231 in which mdig was knocked out via the CRISPR-Cas9 gene editing technique. Wild-type and knockout MDA-MB-231 clones were processed for high-resolution mass spectrometry and the data were analyzed for differentially expressed proteins. The underlying signaling pathways and prominent post-translational modifications were then evaluated. We have demonstrated significant pathways, protein networks and the differential accumulation of critical proteins in the mdig-affected cells. EIF2 signaling, the unfolded protein response, and upregulation of AKT and ribosomal proteins are interesting findings. We also report some key proteins, such as MAGED2, STMN1, RACK1, HYOU1, PLAUR, RIN1 and SOD2, that are modulated by mdig and might have a role in predicting overall survival in TNBC patients. Altogether, these results provide a strong basis for much-needed future research regarding mdig's implication in malignant breast cancers.

Cell culture
The human MDA-MB-231 cells were purchased from the American Type Culture Collection (Manassas, VA). MDA-MB-231 cells were cultured in DMEM/F-12 medium supplemented with 10% FBS and 1% penicillin-streptomycin (Sigma, St. Louis, MO) and grown in humidified incubators at 37 °C in the presence of 5% CO2.

Construction of the CRISPR-Cas9 vector
To generate the CRISPR-Cas9 plasmid, the mdig CDS sequence was supplied to the CRISPR Design tool (http://crispr.mit.edu/), and a single guide RNA (sgRNA) sequence targeting exon 3 of mdig was selected. The sense and antisense primer sequences are 5'-CACCGAATGTGTACATAACTCCCGC-3' and 5'-AAACGCGGGAGTTATGTACACATTC-3', respectively. Single-stranded sense and antisense primers were annealed to form double-stranded oligos at 95 °C for 5 min, and then cooled down to 25 °C for 5 min. Vector pSpCas9-2A-Blast was digested with the BpiI (BbsI) restriction enzyme (Thermo Fisher Scientific, Ann Arbor, MI). sgRNA pairs and linearized vector were ligated by T4 DNA ligase (Thermo Fisher Scientific) for 10 min at 22 °C. The ligation product was then transformed into the DH5α competent E. coli strain (Thermo Fisher Scientific) according to the manufacturer's protocol.
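The relationship between the two primers can be checked with a short script. The sketch below is illustrative only and is not part of the published protocol: it assumes that the leading CACC and AAAC tetranucleotides are the cloning overhangs for the BbsI-digested vector, and the function names are hypothetical. It confirms that, once those overhangs are removed, the antisense primer is the reverse complement of the sense primer, as required for the two strands to anneal.

```python
# Illustrative check of the sgRNA cloning oligos quoted in the text.
# Assumption: the 5' "CACC" and "AAAC" tetranucleotides are cloning overhangs.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

SENSE = "CACCGAATGTGTACATAACTCCCGC"      # sense primer from the text
ANTISENSE = "AAACGCGGGAGTTATGTACACATTC"  # antisense primer from the text


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def strip_overhang(oligo: str, overhang: str) -> str:
    """Remove the expected 5' overhang, failing loudly if it is absent."""
    if not oligo.startswith(overhang):
        raise ValueError(f"{oligo} does not start with {overhang}")
    return oligo[len(overhang):]


top_insert = strip_overhang(SENSE, "CACC")
bottom_insert = strip_overhang(ANTISENSE, "AAAC")

# The two inserts anneal if one is the reverse complement of the other.
oligos_anneal = reverse_complement(top_insert) == bottom_insert
print("insert:", top_insert, "length:", len(top_insert))
print("oligos anneal:", oligos_anneal)
```

A check of this kind catches transcription errors in primer sequences before oligos are ordered.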

Transfection and colony selection
MDA-MB-231 cells, 2.5 × 10^5/well in a 6-well plate, were transfected using Lipofectamine 2000 (Thermo Fisher Scientific) according to the manufacturer's protocol. Forty-eight hours after transfection, cells were sub-cultured in a 10 cm dish for 24 h, followed by selection with 2 mg/ml of Blasticidin (Thermo Fisher Scientific) for 2 weeks. Cell colonies were collected for screening of mdig expression by western blotting. Colonies without mdig knockout were used as wild-type (WT) cells, whereas colonies with successful mdig knockout were designated as knockout (KO) cells.

Western Blotting
Total cellular proteins were prepared by lysing cells via sonication in 1 × RIPA buffer (Millipore, Billerica, MA) supplemented with phosphatase/protease inhibitor cocktail and 1 mM PMSF. Lysed cells were then centrifuged and the supernatant was collected as the protein fraction, which was quantified using the Micro BCA Protein Assay Reagent Kit (Thermo Scientific, Pittsburgh, PA). Prior to loading onto SDS-PAGE gels, samples were boiled in 4 × NuPage LDS sample buffer (Invitrogen) containing 1 mM dithiothreitol (DTT). Samples were run on SDS-PAGE gels, and separated proteins were then transferred to methanol-wetted PVDF membranes (Invitrogen). Membranes were subsequently blocked in 5% nonfat milk in TBST and probed with the indicated primary antibodies at dilutions of 1:1000 or 1:2500 overnight at 4 °C. The next day, membranes were washed with TBST and incubated with horseradish peroxidase (HRP)-conjugated secondary antibodies at dilutions of 1:2000 or 1:5000 at room temperature for 1 h. Immunoreactive bands were visualized with the SuperSignal™ West Pico Chemiluminescent Substrate detection system (Thermo Scientific, Rockford, IL). Mdig (mouse) antibody was purchased from Invitrogen. uPAR/PLAUR antibody was from Cell Applications Inc; MAGE-D2 was from Santa Cruz; anti-ORP150, anti-RIN1, anti-SOD2, anti-Visfatin and Filamin A were from Abcam. Cathepsin D, RACK1, Stathmin, and Tubulin were from Cell Signaling Technology (Danvers, MA, USA). All presented data are representative of at least three independent experiments.

Experimental Design and Statistical Rationale
To ensure robust detection of differential expression, 5 WT and 12 KO clones were analyzed, each in duplicate. To capture variability due to sample preparation and analysis, each analysis was considered to be independent for statistical analysis. Moderated t-tests with q-value correction for multiple testing were used to identify differentially expressed proteins.
Source (Thermo Scientific), and introduced into a Fusion Orbitrap mass spectrometer (Thermo Scientific). Abundant species were fragmented with collision-induced dissociation (CID).

Mass spectrometry data analysis
For protein quantification and pathway analysis, mass spectrometry raw files were searched against the UniProt human complete database downloaded 2017.07.14 (20 201 entries) using MaxQuant v1.6.2.10 with the default version of the Andromeda search engine. Match between runs was enabled and just one peptide was required for protein quantification. All other parameters were left at their default values, including: trypsin as the protease with at most 1 missed cleavage; methionine oxidation and protein N-terminal acetylation as variable modifications; cysteine carbamidomethylation as a fixed modification; a fragment ion tolerance of 0.5 Da; a precursor tolerance of 20 ppm for the first search and 4.5 ppm for the second; and peptide identifications accepted at a 1% false discovery rate as determined by a reversed database. For PTM analyses the same raw files were searched against the same database using Proteome Discoverer v2.3.502 to take advantage of the Percolator algorithm for sensitive peptide identification. Two independent searches were conducted, one for histidine oxidation and one for lysine di- and tri-methylation plus lysine acetylation. For the histidine oxidation search, both histidine and methionine oxidation were set as variable modifications. For the lysine acylation analysis, lysine di-methylation, lysine tri-methylation, lysine acetylation and methionine oxidation were set as variable modifications. All other aspects of the Proteome Discoverer searches were the same. Sequest HT was the search engine. Trypsin with at most 1 missed cleavage was the protease. Cysteine carbamidomethylation was set as a fixed modification. MS1 mass tolerance was set to 10 ppm and MS2 mass tolerance was set to 0.6 Da. For all analyses, peptide spectrum matches were accepted at a 1% false discovery rate as determined by a reversed database search. PTMRS (23) was used to assess PTM localization confidence. Peptide area under the curve was used to generate quantitative values.
Statistical analysis used R v3.4.3. Protein abundances were normalized to have the same median, and differential abundance between wild-type and knockout samples was determined using a moderated t-test (24) with q-value correction for false discoveries (25). To capture variability due to sample preparation and analysis, each sample was considered to be independent for statistical analysis. PTM abundance changes were assessed using the moderated t-test with q-value correction on peptide-level data. Bulk changes in PTM abundance were assessed using a permutation test as follows. The mean t-statistic for all peptides bearing that PTM was calculated. Then mean t-statistics for 10000 random draws of the same number of peptides from the entire dataset were calculated. A p-value was calculated as the fraction of draws that had a mean t-statistic more extreme than the PTM mean.
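The permutation test for bulk PTM changes can be sketched as follows. This is a minimal illustration in Python rather than the R code used in the study, which is not given in the text. The function name, the fixed random seed, and the two-sided reading of "more extreme" are assumptions, and the toy t-statistics are simulated, not taken from the dataset.

```python
import random


def bulk_ptm_permutation_test(all_t, ptm_indices, n_draws=10000, seed=0):
    """Permutation test for a bulk shift in abundance of PTM-bearing peptides.

    all_t       : list of t-statistics (KO vs WT) for every quantified peptide
    ptm_indices : positions in all_t of the peptides bearing the PTM
    Returns (mean t-statistic of the PTM peptides, permutation p-value).
    """
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    k = len(ptm_indices)
    observed = sum(all_t[i] for i in ptm_indices) / k
    extreme = 0
    for _ in range(n_draws):
        # Same number of peptides, drawn at random from the entire dataset.
        draw = rng.sample(all_t, k)
        # "More extreme" is read here as two-sided (assumption).
        if abs(sum(draw) / k) > abs(observed):
            extreme += 1
    return observed, extreme / n_draws


# Toy usage with simulated t-statistics (not data from this study).
toy_rng = random.Random(1)
toy_t = [toy_rng.gauss(0.0, 1.0) for _ in range(2000)]
toy_ptm = list(range(40))
for i in toy_ptm:
    toy_t[i] += 2.0  # shift the simulated "PTM" peptides upward
mean_t, p_value = bulk_ptm_permutation_test(toy_t, toy_ptm, n_draws=2000)
print(f"mean t = {mean_t:.2f}, p = {p_value:.4f}")
```

Because the reference draws come from the same dataset, the test asks whether the PTM-bearing peptides are shifted relative to peptides in general, rather than relative to a theoretical null distribution.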

Bioinformatics
Sets of proteins obtained from the MS data were processed using The Database for Annotation, Visualization and Integrated Discovery (DAVID) version 2.0 (http://david.abcc.ncifcrf.gov/home.jsp). Further, the Protein ANalysis THrough Evolutionary Relationships (PANTHER) database v 6.1 (www.pantherdb.org) was used for gene ontology (GO) annotation. QIAGEN's Ingenuity Pathway Analysis (IPA®, QIAGEN Redwood City, http://www.ingenuity.com/) software was used to investigate the functional and canonical pathways that were enriched in the differentially expressed proteins. Proteins that responded to mdig knockout (moderated t-test p < 0.005, n = 8) were submitted to IPA. All proteins identified in the study and pathways were considered significantly different with p < 0.05.

Kaplan-Meier survival analysis
A Kaplan-Meier survival database that contains survival information of breast cancer patients and gene expression data obtained by Affymetrix HG-U133 microarrays was used. For each indicated gene, the probe set scored as the best among the available probe sets by the JetSet best probe detection tool (26) was used. Survival curves resulting in p values of < 0.05 between the gene higher (gene high) and gene lower (gene low) groups were considered significantly different.
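For readers unfamiliar with the survival curves behind this comparison, the Kaplan-Meier (product-limit) estimator can be sketched in a few lines. The code below is a generic illustration and not the implementation used by the online database; the function name and the toy follow-up data for the "gene high" and "gene low" groups are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate of the survival function.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t.
        n_at_risk -= leaving
    return curve


# Hypothetical toy data (months of follow-up), not patient data from the study.
high_curve = kaplan_meier([5, 8, 12, 20, 30], [1, 1, 0, 1, 0])
low_curve = kaplan_meier([10, 25, 40, 55, 60], [1, 0, 1, 0, 0])
print("gene high:", high_curve)
print("gene low:", low_curve)
```

Comparing the two curves for significance requires an additional test such as the log-rank test, which is not shown here.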

Generation of mdig knockout cells by CRISPR Cas 9
To create mdig knockout cells, human triple negative breast cancer cells, MDA-MB-231, were transfected with the pSpCas9-2A-Blast vector containing an sgRNA that targets the third exon of the mdig gene. Blasticidin selection was then performed for two consecutive weeks and the colonies obtained were screened for mdig expression by western blot (Fig. 1A). Altogether we obtained 5 WT and 12 KO clones and, after screening them for mdig expression at the protein level, we prepared them for proteomic analysis. Each of the WT and KO clones was cultured and analyzed in duplicate. Two of the 34 samples were removed from further analysis for quality control reasons. In total, 5739 proteins were detected, 5711 were quantified in at least 1 sample, and 3569 were quantified in all samples.
Principal component (Fig. 1B) and cluster analysis (not shown) indicated some within-group heterogeneity. One KO clone in particular, KO#3, appeared to be more similar to WT samples than to other KOs. The gene knockout for that clone was confirmed by western blot and by the mass spectrometry data. The clones KO#10, KO#3 and WT#5 were removed from the dataset and not used in any further analysis. Protein data for protein quantitative analysis (MaxQuant) and peptide data supporting protein quantitative analysis (MaxQuant) are shown in Supplementary Table S1 and Supplementary Table S2, respectively.

Identification of the differentially expressed proteins for their class and gene ontology annotation
LC-MS/MS data were analyzed to determine the fold change (FC) as a normalized ratio for KO compared to WT control cells. This first screening of the raw data identified a set of proteins whose abundances increase or decrease in the MDA-MB-231 mdig KO cells. The analysis consisted of the unique protein IDs, with their fold change, p value and t statistics as a function of KO/WT. Thereafter the differentially expressed proteins were classified based on gene ontology designations such as molecular function, cellular component, and biological process using the PANTHER classification system (Fig. 2). A total of 26 protein classes were identified at the p < 0.05 level. Those categories are: calcium binding, cell adhesion molecules, cell junction proteins, chaperones, cytoskeleton, immunity, enzyme modulator, hydrolase, isomerase, ligase, lyase, membrane traffic proteins, nucleic acid binding, oxidoreductase, receptor, signaling molecule, storage proteins, structural proteins, surfactants, transcription factor, carrier proteins, transferase, transmembrane receptor regulatory, transporter and viral proteins categories.
Among them, proteins in the nucleic acid binding (PC00171) class were the most prevalent, with 340 genes assigned to this category. According to biological process, most of the proteins belonged to the subcategories of biological adhesion, biological regulation, cell proliferation, biogenesis, cellular process, developmental process, immune system process, localization, metabolic process, multicellular organismal process, reproduction and response to stimulus. Among these, cellular process and metabolic process had the largest numbers of genes assigned to them. According to molecular function, the majority of the proteins belonged to binding, catalytic activity, molecular function regulator, molecular transducer activity, structural molecule activity and transporter activity. Binding and catalytic activity were the largest categories in this group. Finally, according to cellular component, most of the proteins were localized to the cell junction, cell, extracellular region, membrane, organelle and protein-containing complex. Among them, the cell, organelle and protein-containing complex categories were the most highly represented (Supplementary Fig. 1). These patterns of protein distribution suggest that mdig significantly affects families of proteins that are essential for important biological and molecular processes such as binding, metabolism, immunity, and catalytic activities implicated in triple negative breast cancer. They also warrant further detailed investigation of the individual genes and proteins related to such biological functions manifested in breast cancer.

Canonical pathway analysis reveals key signaling cascades affected by mdig
We identified the ten proteins with the greatest magnitude of change in abundance in KO over WT MDA-MB-231 cells (Table 1). Once the differentially expressed proteins were identified, the next step was to query the role of those proteins in the pathogenesis of breast cancer. (12) Regulation of eIF4 and p70S6K Signaling (31) and Caveolar-mediated Endocytosis Signaling (14) (Fig. 3). The Regulation of eIF4 and p70S6K Signaling and Unfolded Protein Response pathways are elaborated in Fig. 4, which shows the upregulated and downregulated proteins and their cellular localization.
Among the pathways that are overrepresented in mdig KO cells, EIF2 signaling was the topmost canonical pathway found in our analysis. Interestingly, PI3K and AKT were upregulated with mdig silencing while RAS and eIF4a were downregulated. Previous reports identified the PI3K-Akt pathway as an enhancer of the expression of EMT-related transcription factors such as Snail, Slug, ZEB1 and ZEB2, which promoted EMT and resulted in elevated cancer cell motility (27,28). This suggests an increased motility potential of breast cancer cells upon the loss of mdig protein. Among the Unfolded Protein Response family of proteins, several heat shock proteins such as Hsp70 and Hsp40 were upregulated while TNF receptor associated factor 2 was downregulated in mdig KO cells. Analysis of the protein profiles belonging to the canonical pathway Regulation of eIF4 and p70S6K Signaling revealed a plethora of ribosomal proteins that were upregulated in the KO cells, such as ribosomal protein S16, S8, S9, S26, S2, S15a, S3, S6, S7, S21, S24, S3a, S27a, S10, S17, S20, S23 and S4 X-linked. Since ribosome biogenesis is important for cancers, the upregulation of ribosomal proteins in response to mdig deletion is a striking observation that needs further investigation. The filamin family of proteins, such as filamin A, filamin B, and filamin C, were upregulated in the KO cells. Notably, another interesting protein, flotillin 1, was found to be upregulated (Supplementary Fig. 2). Filamin proteins have been implicated in cancer progression, while increased levels of flotillin 1 promoted cell proliferation, migration, tumorigenicity and lymph metastasis in breast cancer studies (29,30).
These data indicate the important signaling pathways implicated in breast cancer upon mdig knockout. The individual differentially regulated proteins in the top five canonical pathways are attractive targets for further investigation of how mdig is involved in ribosome biogenesis and the metastasis of triple negative breast cancers.

Cellular and molecular function gives insight into the differential biology of breast cancer cells affected by mdig
IPA-based protein network analysis was performed using all identified proteins upon mdig knockout in TNBC cells. We identified 500 molecular and cellular functions associated with mdig deletion. The top five scoring function categories were evaluated for the predicted effect of mdig deletion on the activation status. Processes that are integral to cell growth and tumorigenesis were found, including: Protein Synthesis (177 associated proteins), RNA Damage and Repair (45 associated proteins), RNA Post-Transcriptional Modification (104 associated proteins), Cell Death and Survival (362 associated proteins) and Nucleic Acid Metabolism (74 associated proteins). These processes orchestrate vital molecular functions such as protein expression, decay of mRNA, processing of rRNA, necrosis and metabolism of nucleic acid components or derivatives, respectively (Fig. 5A). Among them, an overall increase in protein synthesis and an overall decrease in RNA post-transcriptional modification and cell death and survival were found in the KO category (Fig. 5B). Individual proteins belonging to these molecular and cellular functions are depicted with their upregulation and downregulation status. Additionally, we found an overall decrease in inflammation with mdig loss (Supplementary Fig. 3A). This is interesting, as our in vivo studies on mdig knockout mice suggested a decreased inflammatory status of the mice upon silica exposure (17), further corroborating the current results.
The diseases and disorders with the most associated proteins belonged to the categories of Cancer, Organismal Injury and Abnormalities, Tumor Morphology, Cardiovascular Disease and Developmental Disorder. The top network identified was associated with cancer. This network consists of 458 proteins in our proteomic data set (Fig. 6A). These results suggest the involvement of mdig in regulating the process of transformation in breast cancer. IPA also predicted the upstream regulatory molecules that are either activated or inhibited on the basis of the observed protein expression changes, allowing us to understand the underlying causal network. In our analysis we found the top 5 upstream regulators to be: MYCN, NFE2L2, MYC and TCR. Moreover, MYCN was activated upon mdig knockout (Fig. 6B). Also, Myc is known as a classical upstream regulator of mdig (31).

Post translational modification and disease-based protein network analysis reveal the catalytic activity of mdig in the oxidation and demethylation process
PTMs can change the dynamics and affinity aspects of protein-protein interactions and often serve as the basis for modulation of signaling pathways implicated in breast cancer. Epigenetically relevant PTMs such as acetylation and methylation contribute to transcription regulation and have well-established roles in cancer.
Mdig catalyzes both histidine oxidation (32) and tri-methyl lysine demethylation (33). Therefore, spectra were searched for histidine oxidation and lysine acylation to quantify their changes in response to mdig knockout (Table 2). Changes in PTM abundance were assessed using the number of peptides that were significant (q < 0.1) and whether the mean t-statistic was different from 0. Mdig catalyzes histidine oxidation at His39 of the 60S ribosomal protein L27a (UniProt accession: P46776) (32). The tryptic peptide containing His39 from the 60S ribosomal protein L27a was detected in both the oxidized and native forms (Fig. 7A, Supplementary Fig. 3B). The peptide sequence (GNAGGLHHHR) has no residues that can be non-enzymatically oxidized, so the oxidized form must be the product of an enzymatic reaction. The abundance of the oxidized form was decreased in mdig KO samples (q = 0.00030, moderated t-test, n = 8). The native, non-oxidized form was detected only in mdig KO samples, demonstrating that the knockout removed a specific enzymatic activity from the cells. In total, 98 peptides with candidate histidine oxidation sites were quantified and 11 were found to be significantly different between KO and WT samples (q < 0.1, moderated t-test, n = 8). The mean t-statistic for histidine-oxidized peptides was near 0 (Table 2), indicating that they were not changed in a uniform direction by mdig knockout. To ensure that methionine oxidation did not interfere with our analysis, we limited the set of histidine-oxidized peptides to those that had no methionine residues or that had confident localization of the oxidation site to histidine by PTMRS (23). The mean t-statistic for that selected group was still approximately 0 (not shown). These data confirm the activity of mdig in catalyzing the oxidation of His39. However, they do not provide evidence that mdig oxidizes His residues other than His39 of 60S ribosomal protein L27a.
In addition to catalyzing His oxidation, mdig catalyzes the demethylation of tri-methylated lysine 9 of histone H3 (33). Hence lysine di- and tri-methylated and acetylated peptides were evaluated for changes in abundance in response to mdig deletion. Di-methylated lysine-containing peptides had an overall increase in abundance in mdig KO samples relative to WT (Table 2). This is supported by the number of di-methyl lysine peptides that were increased in abundance, 28 vs 14 decreased (q < 0.1, moderated t-test, n = 8), and also by the mean t-statistic for lysine di-methylated peptides, which was positive (0.37, p = 0.021, permutation test for difference from 0). The change in abundance was confirmed in a smaller set of 63 very high confidence peptides (Percolator posterior error probability (PEP) < 0.001). Those 63 very high confidence lysine di-methylated peptides had a greater increase in abundance than the larger set (mean t-statistic of 0.83, p = 0.0011), demonstrating that the increase in abundance was not limited to low-quality peptide identifications. Tri-methyl lysine and acetylated lysine also had positive mean t-statistics but did not meet our statistical threshold. These results suggest an important regulatory role of mdig on the 60S ribosomal protein L27a and on the methylation of lysine residues on histone proteins, which are likely to affect the transcription of critical genes implicated in TNBC.
PTM peptides that were differentially abundant between KO and WT are shown in the supplementary data (Table S9).
The next level of regulation is the interaction of signaling networks and regulatory pathways. IPA identified 25 interaction networks built with 35 focus molecules that were affected by mdig knockout. The five most affected gene networks as determined by IPA, and a detailed view of the interactions in the most significant networks, are shown in Fig. 7B. Five WT clones and twelve KO clones were tested by western blot using Tubulin as a loading control (Fig. 8A). We found a consistent pattern of altered protein expression in the TNBC cells, where CTSD, MAGED2, STMN1 and RACK1 were upregulated in the KO cells and PLAUR, HYOU1, SOD2, RIN1 and NAMPT were downregulated in the KO cells. Except for FLNA, the western blot results for all proteins corroborated the IPA findings and hence validated our results. This suggests that these proteins are potential candidate biomarkers for TNBC and are strongly associated with breast cancer. This analysis also demonstrates that mdig regulates the abundance of these proteins, which are implicated in motility, EMT and genomic stability, thereby governing the overall malignant phenotype of aggressive breast cancers.

Evaluation of the identified proteins in predicting disease prognosis for the survival of breast cancer patients
To explore whether the above proteomics findings are clinically relevant for breast cancer patients, the top proteins identified as differentially abundant in mdig KO cells were evaluated for their performance in predicting disease prognosis and overall survival in breast cancer and TNBC. Survival data from 3951 breast cancer patients and 618 TNBC patients were obtained from an online gene profiling database (Kaplan-Meier Plotter, (34)). Abundance of the top identified and validated proteins in the mdig KO cells was evaluated for correlation to patient stratification based on high expression of the proteins under study (Fig. 8B). In breast cancer patients, high expression of STMN1, NAMPT, PLAUR, and SOD2 predicted poor overall survival, whereas FLNA, MAGED2, RACK1, HYOU1, and RIN1 predicted better overall survival. However, in TNBC patients, high expression of MAGED2 and STMN1 predicted poor overall survival, whereas RACK1, HYOU1, PLAUR, RIN1 and SOD2 predicted better overall survival. The differential regulation of such proteins by mdig is an important finding. The involvement of these proteins in the regulation of cell proliferation, motility, invasiveness, cancer metabolism and ER stress makes them ideal candidates that can be exploited in breast cancer for therapeutic efficacy.

Discussion
Triple negative breast cancer is a malignant form of breast cancer with aggressive clinical characteristics.
It has the worst patient prognosis and currently lacks targeted therapy (35,36). Therefore, there is an urgent need to understand the molecular and biological mechanisms governing the malignant behavior of TNBC and its pathogenicity.
We have recently identified a gene named mdig that predicts poor prognosis in breast cancer (12).
It was initially identified as an oncogene in lung cancer (33) and is also expressed in other cancer types, with roles in cell growth and motility (15). In breast cancer, we found that high mdig expression predicts poor overall survival but better survival in patients with lymph node or distal organ metastasis, suggesting that mdig is favorable for metastatic patients (12). In MDA-MB-231 breast cancer cells, silencing mdig with siRNA enhanced DNA and histone methylation as well as cell migration (13), an important attribute of mdig. These studies indicate that mdig is important for tumor growth in early-stage breast cancers, but that at later, advanced stages mdig expression is likely to benefit the patient because it inhibits the migration and invasion of breast cancer cells.
Cancer is indeed a "disease of pathways" (37), and proteins function through complex biological pathways that involve many proteins working together. The pathways and functions that are over-represented in mdig-overexpressing cells remain the most likely cause of cancer (38); hence, determining the critical pathways and proteins that are enriched and altered in healthy versus cancerous cells is essential for understanding the biological and molecular mechanisms that drive carcinogenesis. In such scenarios, state-of-the-art technologies such as proteomics, in conjunction with integrated bioinformatics, are essential for identifying cancer-associated signaling pathways and networks and thereby assisting cancer biomarker discovery. The CRISPR/Cas9 system is an indispensable tool that has been applied to various human cancers and has accelerated cancer research (39). In breast cancer, CRISPR technology has enabled breakthroughs in research on diagnosis, treatment and drug resistance (40).
Our previous studies on TNBC cells in which mdig was silenced with short interfering RNAs yielded important information about the regulatory effects of mdig on cell motility and invasion as well as on DNA and histone methylation (13). Although that model is a transient knockdown of mdig, the data suggest that mdig negatively regulates the migration and invasion potential of breast cancer cells.
The data also show that mdig expression is inversely proportional to the extent of DNA methylation. Still, the mechanisms underlying the influence of mdig on breast cancer cells are poorly understood and have not previously been explored at the system level that proteomics technology allows.
To D is also overexpressed in breast cancer (41)(42)(43) and predicts a poor prognosis (44)(45)(46). It serves as a marker of invasive potential and aggressive behavior in high-grade carcinomas (47) and stimulates cell growth, angiogenesis and metastasis (48)(49)(50). MAGED2 is elevated in primary tumors and upregulated in metastasis (51). It is noteworthy that in the present report we identify MAGED2 as a novel protein that is increased in response to mdig knockout. Stathmin 1 (STMN1) was also upregulated in KO cells. STMN1 is a microtubule-destabilizing protein whose expression is associated with breast cancer proliferation (52,53). In breast cancer patients, high STMN1 correlates with poor prognosis (54,55). Moreover, elevated STMN1 in these patients is linked to high histological grade and low ER and PR expression status (52) and is related to aggressive phenotypes accompanied by cancer stem cell marker expression (56). Interestingly, another protein associated with cell growth, adhesion, invasion and metastasis, RACK1, was upregulated in response to mdig knockout. Both in vitro and in vivo studies have shown that RACK1 promotes the proliferation, invasion and metastasis of breast cancer (57), and it remains one of the independent predictors of poor clinical outcome in breast cancer (58). RACK1 is not only associated with breast cancer malignancy; its overexpression is also implicated in the growth and metastasis of several other cancer types, such as lung cancer, gliomas, colon cancer, prostate cancer, liver cancer, epithelial ovarian cancer and squamous cell carcinoma of the esophagus (59). Finally, the top five proteins found to be upregulated in our analysis included Filamin A (FLNA).
Several studies have reported that overexpression of Filamin A is associated with highly metastatic cancers of the prostate (60), skin (61) and brain (62) and that FLNA is involved in the progression of neoplasia (63).
Western blot validation of the upregulated proteins showed a similar trend of increase in the KO cells, with the exception of Filamin A, which was decreased in the KO cells. This is quite interesting and is relevant to the metastasis of TNBC cells as reported in the current study. FLNA has dual functions and can promote opposite outcomes depending on its subcellular localization. In the cytoplasm, FLNA facilitates cell growth and metastasis; in the nucleus, however, it inhibits cell growth and metastasis (64). These data indicate that cells with an appropriate amount of FLNA are likely to gain some benefit during metastasis and that the abundance of FLNA will influence the metastasis of cancer cells depending on its subcellular localization. In this regard, FLNA inhibition was also found to reduce the metastatic potential of cancer (65), and silencing of FLNA in MDA-MB-231 cells was sufficient to inhibit cellular migration and invasion (66).
The proteins that were downregulated upon mdig knockout belong to families implicated in cancer metabolism and reprogramming, DNA repair pathways, cell motility, tumor suppressor functions and stress-related cellular responses. These downregulated proteins were PLAUR, SOD2, RIN1, HYOU1 and NAMPT.
Increased PLAUR expression has been found in aggressive breast cancers such as TNBC, a subset of HER2+ breast cancer, and tamoxifen-refractory breast cancer (67)(68)(69). PLAUR is known to regulate the ubiquitin-proteasome system during the DNA damage response, and silencing PLAUR impairs the DNA repair process (70). Interestingly, in MDA-MB-231 cells and HeLa cells, PLAUR plays a significant role in regulating the homologous recombination (HR) DNA repair pathway (71). PLAUR is a potential molecular target for breast cancer owing to its accessibility on the surface of cancer cells (72). Strikingly, we also observed a decreased abundance of the enzyme manganese superoxide dismutase (SOD2) in the KO cells. This is a significant finding, since loss of SOD2 represents a phenotype of tumor initiation and is therefore indicative of a tumor suppressor role for SOD2, particularly because of its O2•− scavenging role during tumorigenesis (73). Decreased SOD2 activity and elevated ROS appear to be prerequisites for the metabolic reprogramming of cancer cells (74). Additionally, forced SOD2 overexpression in cancer cells can decrease metastatic potential and undermine the malignant phenotype (75,76). In breast cancer, SOD2 is epigenetically regulated: its expression is repressed primarily through hypoacetylation and hypomethylation of histone proteins, which inhibit the functions of transcription factors (77). Moreover, there is a switch from SOD2 to SOD1 during the transformation process in breast cancers (78), and SOD2 is downregulated in malignant breast cancer cells compared with their normal counterparts (79). Mdig is a histone demethylase and hence is likely to influence the transcriptional status of the SOD2 gene through its demethylation activity. However, this needs further investigation. Mdig KO cells also exhibited a decreased abundance of the RIN1 protein.
Notably, RIN1 downregulation has been associated with invasion and poor overall survival in liver cancer (80), and RIN1 silencing resulted in increased motility of epithelial cells (81).
In breast cancer, RIN1 expression is decreased in neoplastic tissues compared with normal breast tissues (82). Since RIN1 contributes to cell motility, its decreased expression suggests that mdig influences the motility and malignant behavior of TNBC cells by regulating the expression of RIN1 and other proteins that promote metastasis. Another downregulated protein is HYOU1, a novel HSP implicated in the ER stress response that mediates anti-apoptotic signals in certain cancers such as breast cancer (83), bladder cancer (84) and prostate cancer (85). A further downregulated protein in mdig KO cells is the multifunctional enzyme NAMPT. This protein is usually overexpressed in lymphoma (86) and in solid cancers of the prostate, stomach and colon (87)(88)(89). In breast cancer, NAMPT downregulation brought about by miR-206 resulted in decreased survival of breast cancer cells (90), and the expression of NAMPT affects the metastasis and adhesion of breast cancer cells by inhibiting the functions of integrin proteins (91). NAMPT, in conjunction with HER2 and VEGF, also serves as a biomarker for the diagnosis and prognosis of human breast malignancies (92).
Activation of the unfolded protein response (UPR) confers resistance to therapy on breast cancer cells and increases the likelihood of recurrence (98). Mdig loss resulted in enrichment of the UPR pathway, suggesting that misfolded proteins accumulate in the KO cells because of impaired protein folding. Alternatively, this indicates a positive influence of mdig on the development of endoplasmic reticulum (ER) stress and UPR signaling. The increase in heat shock proteins with mdig loss suggests that coping mechanisms are upregulated in the KO cells in response to ER stress.
The increased abundance of ribosomal proteins in the canonical pathway pertaining to the regulation of eIF4 and p70S6K signaling in KO cells is a striking observation. Mdig is involved in ribosomal biogenesis (99) and is implicated in ribosomal RNA transcription (33). Mdig belongs to the family of 2-oxoglutarate (2OG)-dependent oxygenases (2OG-oxygenases) and catalyzes ribosomal protein histidyl hydroxylation, targeting His-39 of Rpl27a within the large (60S) subunit (100). The presence of mdig in the nucleolus is indicative of its critical role in ribosome biogenesis.
Accumulation of several ribosomal proteins in the KO cells, especially S6, is an indication that mdig modulates the expression of ribosomal proteins, a role that might have implications for neoplastic transformation. The upregulation of AKT in the KO cells also reflects the role of mdig in tumor survival, EMT and metastasis, as AKT activation is a hallmark of several cancers (101).
Mdig has two catalytic activities that change PTMs: histidine oxidation and tri-methyl lysine demethylation (32,33). PTM analysis of the mass spectrometry data identified the known target of mdig, His-39 of Rpl27a. The lysine PTMs di-methylation, tri-methylation and acetylation were also tested for quantitative differences in mdig KO cells compared to WT. Because mdig is a demethylase, increased tri-methylation and decreased di-methylation at mdig target lysine residues could be expected. Our data indicate that di-methyl lysine was more abundant in mdig knockouts compared to wild-type. In addition, our results were suggestive of global changes in abundance for lysine tri-methylated and acetylated peptides (p = 0.104 and p = 0.67 respectively). These results suggest that mdig may have a global impact on lysine acetylation in addition to its specific demethylase activity.
Metabolic pathways known to be modulated in cancer were also implicated in our analysis. These include glycolysis, the TCA cycle and the pentose phosphate pathway (PPP) (data not shown).
The heterogeneity of TNBC and the lack of effective therapeutic targets, along with insufficient predictive biomarkers, are reasons for the challenges associated with TNBC therapy. Malignant transformation includes changes in protein abundance. Monitoring these changes at the protein level provides unique protein signatures that might facilitate effective diagnosis and prognosis. The high-throughput proteomics study of the TNBC cells has provided a large and rich dataset that has allowed us to stratify systemic differences between MDA-MB-231 cells with and without mdig. The top differentially regulated proteins have been validated at the protein level and have been found to predict disease prognosis both in breast cancer and in TNBC. Among them, high expression of STMN1, NAMPT, PLAUR and SOD2 predicts poor overall survival (OS) in breast cancer patients, whereas high expression of FLNA, MAGED2, RACK1, HYOU1 and RIN1 predicts better OS. Within the TNBC patient category, high expression of MAGED2 and STMN1 predicted poor OS, whereas elevated RACK1, HYOU1, PLAUR, RIN1 and SOD2 predicted better OS. Hence, these proteins may serve as additional biomarkers in the prognosis of TNBC.
The mechanistic regulation of these proteins by mdig needs further investigation. Nevertheless, the current data set shows that the TNBC protein repertoire displayed in the mdig KO cells indicates that the signaling pathways and metabolic alterations induced in the MDA-MB-231 cell line recapitulate the physiological changes influenced by mdig in mammary cells in vivo. This study provides bioinformatic insight into the TNBC-associated protein profiles in the context of mdig deletion and lays the foundation for identifying additional pathway-specific biomarkers and their functional implications, towards a better understanding of the development of breast cancers.

Conclusions
The current data on proteomic changes, post-translational modification profiles and differentially expressed proteins identified in the mdig-deleted breast cancer cells reveal that mdig regulates the abundance of crucial proteins implicated in EMT, genomic stability and metastasis. Our results on mdig-modulated signaling pathways and hub molecules have provided novel targets that can be utilized for the development of treatment strategies and breast cancer therapies.

Competing interests
The authors declare that they have no competing interests.

Funding
National Institutes of Health grants R01 ES028263, R01 ES028335 and P30 ES020957 supported the design of this study, data collection and analysis, the drafting of the manuscript and the overall needs of this project. National Institutes of Health grants P30 ES020957, P30 CA022453 and S10 OD010700 supported the mass spectrometry experiments and proteomics data analysis.

Authors' contributions
CT and FC conceived and designed the experiments and drafted the manuscript. CT and QZ carried out the CRISPR Cas9 knockout assays. NJC carried out the mass spectrometry assay and assisted CT in conducting the bioinformatics and statistical analysis. LX, YF, ZB, WZ, PW and BA participated in the analysis. PMS supervised the proteomics assay, participated in its design and coordination and helped to review the manuscript. All authors read and approved the final manuscript.