MMP9 Expression in Pan-cancer
The whole process of data analysis is depicted in Fig. 1.We analyzed the expression data of 26 cancer types and found that MMP9 was highly expressed in the vast majority of tumor samples, and there was a statistical difference in most tumors, including GBM、CESC、LUAD、COAD、COADREAD、BRCA、ESCA、STES、KIRP、KIPAN、STAD、PRAD、UCEC、HNSC、KIRC、LUSC、LIHC、READ、PCPG、BLCA、KICH和CHOL (P < 0.05). Although MMP9 is highly expressed in LGG, CESC, and PAAD, due to the small sample size of the control group (Normal), no significances were uncovered. Besides, for THCA, there was no difference in the expression of MMP9 compared with the normal samples (Fig. 2A) .
Pan-cancer Prognostic Analysis of MMP9
To further explore the association between MMP9 and the prognosis of pan-cancer, we performed prognostic analysis on 39 cancer types. The overall survival (OS)( Supplementary Fig. 1A) results show that for GMBLGG, KIPAN, UVM, LGG, ACC, KIRC, LIHC, BLCA, and TGCT, the higher the expression of MMP9, the lower the survival rate of the tumor (P < 0.05); for SKCM, and SKCM-M, the higher the expression of MMP9, the higher the survival rate of the tumor, which means MMP9 is a beneficial factor for this two tumors (P < 0.05); for the other 28 tumors, there was no significances (P > 0.05). We also plot the survival curves of GMBLGG, KIPAN, UVM, LGG, ACC, KIRC, LIHC, BLCA, and TGCT (Fig. 2B, Supplementary Figs. 1B-C). In addition, we analyzed the progression-free survival (PFS) of pan-cancer (Supplementary Fig. 1D), we found that for GMBLGG, KIPAN, KIRC, UVM, LGG, ACC, THCA, GBM, and KICH, the higher the expression of MMP9, the faster the tumor progression (P < 0.05); for DLBC and OV, the higher the expression of MMP9, the slower the tumor progression was, which means the MMP9 was a suppressor gene of tumor development (P < 0.05); for the other 28 tumors, there was no significances (P > 0.05). To sum up, higher MMP9 expression means both worse survival rate and tumor progression in GMBLGG, KIPAN, UVM, LGG, ACC, and LIHC.
Correlation of MMP9 Expression with TME and Immune Infiltration
The tumor microenvironment (TME) is composed of various components such as immune cells, non-immune stromal cells, and extracellular matrix proteins, including innate immune cells, adaptive immune cells, extracellular immune factors, and cell surface molecules. TME has unique internal interactions and important roles in tumor biology and is known as the tumor immune microenvironment (TIME)[20, 21]. To further explore the correlation between MMP9 and tumor immune infiltration, we performed immune analysis on 6 tumors with MMP9 expression. We found that MMP9 expression in GMBLGG, KIPAN, UVM, LGG, ACC, and LIHC was significantly positively correlated with Immune score, ESTIMATE score, and Stromal score (Fig. 2C, Supplementary Fig. 2).
In addition, we also analyzed the correlation of MMP9 expression with immune cells in each tumor (Fig. 3A). We found that macrophages were significantly associated with MMP9 expression, among which M0 macrophages were significantly positively correlated with all 6 tumors; classically activated M1 macrophages were positively correlated in GMBLGG, KIPAN, UVM, LGG, and ACC; alternative activated M2 macrophages were positively correlated in GMBLGG, and LGG. The high expression of macrophages releases more cytokines (such as EGF), which promotes the metastasis and invasion of cancer cells[22, 23], so this our results provided a new explain of high correlation between MMP9 expression and metastasis. Monocytes were significantly negatively correlated with MMP9 expression in 5 tumors except ACC, which means the ability to recognize and kill tumor cells was inhibited[24]. The activated natural killer cells were negatively correlated with MMP9 expression in GMBLGG, KIPAN, KIRC, and ACC, that means their killing power to tumor cells was decreased when tumors express more MMP9. Besides, MMP9 expression was positively correlated with Tregs (T cell regulatory) in GMBLGG, KIPAN, UVM, LGG, and KIRC, which could suppression of the immune system[25].
Correlation of MMP9 Expression with RNA Modification Genes
RNA chemical modifications have important roles in fundamental cellular processes such as cell differentiation, protein production, cell signaling, and maintenance of circadian rhythms[26, 27], and these modifications can play critical roles as tumor suppressors or tumor promoter effect. We found that GBMLGG was positively correlated with most of the genes in m1A modification, and there was a significant statistical difference; The gene ALKBH3 was significantly and positively associated with MMP9expression in four tumors, GBMLGG, KIPAN, ACC and LGG, with statistically significant differences (Fig. 3B). ALKBH3 can promote the proliferation, migration and invasion of cancer cells[28]. In m6A modification (Fig. 3C), MMP9 expression was positively correlated with most genes in GBMLGG, and there was a significant statistical difference; TRMT61A was significantly and positively correlated with MMP9 expression in four tumors, GBMLGG, KIPAN, ACC and LGG, with statistically significant differences (Fig. 3C). A58 in m1tRNA is composed of RNA-binding component TRMT6 and catalytic component TRMT61A, which is very important for maintaining its stability and affecting translation initiation, and has profound effects on various biological processes[29]. In m5C modification (Fig. 3D), MMP9 expression was positively correlated with most genes in GBMLGG, and there was a significant statistical difference; DNMT3B was significantly and positively correlated with MMP9 in four tumors, GBMLGG, KIPAN, ACC and LGG, with statistically significant differences (Fig. 3D). DNMT3B is considered to be involved in de novo DNA methylation in embryonic stem cells and early embryos. It is overexpressed in several human tumors and is an indicator of early tumor recurrence and poor prognosis in hepatocellular carcinoma[30].
Correlation of MMP9 Expression with TMB
We furtherly performed single nucleotide polymorphism (SNP) analysis by dividing patients into two groups: high MMP9 expression group and low MMP9 expression group. In LGG (Supplementary Fig. 3A), the genes IDH1, TP53, and ATRX have high mutation frequencies (> 20%), and EGFR, MYH13, EPPK1, MYO15A, SI, KIAA1109, CDH17, SLCO1B1, SYNE2, CFAP47, SSPO and ZFHX4 also have more mutation rates and mutation types in high MMP9 expression group. In KIRC (Supplementary Fig. 3B), the genes VHL and PBRM1 have high mutation frequencies (> 20%), and the mutation types are mostly missense mutation, frameshift deletion mutation, nonsense mutation, splice site and in-frame insertion; THSD7B, ADGRV1, XPO7, LAMC2 and UBR4 also have more mutation rates and mutation types in the MMP9 high expression group. TP53, CTNNB1 and MUC16 show high mutation frequencies (> 20%) in ACC (Supplementary Fig. 3C), as well as higher mutation rates in high MMP9 expression group; DST, FAT4, ASXL3, CNTNAP5 and NF1 also have more mutation rates and mutation types in high MMP9 expression group. In UVM (Supplementary Fig. 3D), the genes GNAQ, GNA11, BAP1, and SF3B1 have high mutation frequencies (> 20%), while BAP1 is higher in high MMP9 expression group; SF3B1 and EIF1AX in patients with high MMP9 expression show higher mutation rates.
Construction and validation of pharmacophore model
To further screen novel inhibitors of MMP9, we constructed a ligand-based pharmacophore model. We first considered evaluating the major residues, obtained by analyzing the crystal structures (PDB IDs: 2OW0, 2OW1, 4H3X, and 4WZV) to obtain the major residues of the MMP9 receptor (Fig. 4A, 4B, 4C, and 4D), identifying small active molecules and target proteins Physicochemical interaction patterns between them, mapping them to 3D array features (eg, hydrogen bonds, lipophilic contacts, ionic or aromatic interactions).
As shown in Fig. 4A, the crystal structure 2OWO exhibited two hydrophobic interactions binding with residues TYR423, LEU397, LEU418, VAL398 and ZN444. Two hydrogen bond acceptors with the ALA189, GLN402, HOH503, HOH608 and LEU188. In addition, a positively ionized region was also detected. The crystal structure 2OW1(Fig. 4B) exhibited two hydrophobic interactions binding with residues VAL398, LEU418, TYR423, LEU397 and ZN444. Five hydrogen bond acceptors with the LEU188, HOH593, HOH557, ALA189 and GLN402, as well as three hydrogen bond donors were also observed and a positively ionized region. The crystal structure 4H3X (Fig. 4C) exhibited two hydrophobic interactions binding with residues LEU243, TYR248, VAL223 and ZN301. Two hydrogen bond acceptors with the LEU188 and ALA189, as well as three hydrogen bond donors with ALA189, HIS226 and HOH415 were also observed and a positively ionized region. The crystal structure 4WZV (Fig. 4D) exhibited two hydrophobic interactions binding with residues TYR245, MET247, ZN302, VAL223 and TYR248. Four hydrogen bond acceptors with the ALA191, HOH401, LEU188 and ALA189, as well as three with ALA189, HIS230 and GLU227 hydrogen bond donors were also observed and a positively ionized region. As shown in Supplementary Fig. 4A-D, these drugs all exert the most important effect with amino acid residue H401.
Virtual Screening
We performed a prospective virtual screening (VS) of a database of compounds of natural origin and synthetic drugs, in which we used fitted values as pharmacology-based screening criteria. After removing duplicates, we screened 230 small molecules with the same pharmacophore from 1,752,844 small molecules. Then, we built a deep learning model of MMP9 with 3479 small molecules and validated it. the Accuracy, Precision, and AUC of the model we constructed gradually stabilized with the increase of Epoch, and finally stabilized at around 0.9 (Fig. 4E); Recall and F1 also gradually stabilized around 0.9, and Loss and MCC gradually stabilized around 0.45 and 0.7 respectively (Supplementary Fig. 4E-H). After screened by the machine learning model, 49 small molecules (score = 1) from the 230 small molecules were identified.
ADME and toxicity prediction
Pharmacokinetics is an important analytical method for detecting effective compounds in the process of drug discovery, and the analysis of its properties plays an important role in drug design (Supplementary Table 1). Water solubility predictions (defined in water at 25°C) indicated that 33 compounds were soluble in water. For human intestinal absorption, 21 compounds have good absorption levels. There are 40 compounds were highly bound to plasma proteins, and the rest were the opposite. CYP2D6 is one of the important enzymes involved in drug metabolism, and all 49 compounds were predicted to be non-inhibitors of cytochrome P450 2D6 (CYP2D6). For hepatotoxicity, seven compounds were predicted to be nontoxic. CHEMBL82047 and CHEMBL381163 have good water solubility, intestinal absorption and protein binding, and can act as non-inhibitors of CYP2D6 without hepatotoxicity. In addition to this, we conducted a comprehensive investigation of the safety of these small molecules, the results show that two small molecules, CEMBL82047 and CEMBL381163, are non-mutagenic and are predicted to have less Ames mutagenic, rodent carcinogenic, and developmental toxicity potential than other compounds.
Protein molecular docking
To further study the binding properties of small molecules to proteins, we carried out molecular docking experiments (Fig. 5, Fig. 6A-B, Supplementary Fig. 5A-D). As shown in Table 1, CEMBL82047 and CEMBL381163 have higher binding affinity to the protein compared to the drugs JNJ0966 and MMP-9-IN-1. Supplementary Fig. 5E-F shows the π-dependent interactions and hydrogen bonds performed by the structural calculations. The results of structural calculation studies showed that CEMBL82047 formed 4 pairs of hydrogen bonds with the MMP9 residue acceptor, and in addition, the complex itself formed 4 pairs of π-related interactions with the MMP9 residue acceptor. CHEMBL381163 forms 4 pairs of hydrogen bonds and 7 pairs of π-related interactions with the MMP9 residue acceptor (Tables 2 and 3).
Table 1
COCKER potential energy of compounds.
| COCKER potential energy |
CEMBL82047 | -12.164 |
CEMBL381163 | -11.623 |
JNJ0966 | -6.629 |
MMP9IN1 | -8.618 |
Table 2
Hydrogen Bond Interaction Parameters for Each Compound with MMP9 Residues.
Receptor | Compound | Donor Atom | Receptor Atom | Distances (Å) |
2OW1 | CEMBL82047 | LEU188:H | UNK900:O2 | 1.84939 |
ALA189:H | UNK900:O2 | 2.5637 |
GLN402:HE22 | UNK900:O5 | 2.04371 |
UNK900:H29 | ALA189:O | 2.21685 |
CEMBL381163 | LEU188:H | UNK900:O3 | 2.73564 |
GLN402:HE22 | UNK900:O7 | 1.84541 |
HIS411:HD1 | UNK900:O4 | 2.63353 |
UNK900:H22 | ALA189:O | 2.04562 |
JNJ0966 | LEU188:H | UNK900:N3 | 2.7175 |
UNK900:H1 | MET422:O | 2.01943 |
MMP9IN1 | GLN227:HE21 | UNK900:N2 | 2.6066 |
UNK900:H11 | TYR245:O | 1.90625 |
Table 3
π-Related Interaction Parameters for Each Compound with MMP9.
Receptor | Compound | Donor Atom | Receptor Atom | Distances(Å) |
2OW1 | CEMBL82047 | HIS401 | UNK900 | 4.2886 |
UNK900:C15 | LEU187 | 4.82839 |
UNK900 | LEU188 | 5.32252 |
UNK900 | VAL398 | 4.8719 |
CEMBL381163 | UNK900:H4 | HIS401 | 2.77631 |
UNK900:C1 | LEU397 | 4.16409 |
UNK900:C14 | LEU187 | 4.4138 |
PHE110 | UNK900:C15 | 4.14676 |
HIS411 | UNK900 | 5.04727 |
TYR423 | UNK900:C1 | 4.81758 |
UNK900 | LEU188 | 4.80899 |
JNJ0966 | HIS401 | UNK900 | 4.00463 |
UNK900 | TYR423 | 5.5835 |
ALA189 | UNK900:C9 | 3.89591 |
UNK900:C9 | LEU188 | 4.59029 |
UNK900:C9 | VAL398 | 4.42465 |
UNK900 | LEU188 | 4.21123 |
UNK900 | VAL398 | 4.71999 |
UNK900 | LEU397 | 5.13481 |
UNK900 | LEU418 | 5.36355 |
MMP9IN1 | UNK900 | LEU243 | 5.32463 |
Molecular dynamics simulation
Molecular dynamics simulation is a method for simulating the physical motion trajectories and states of atoms and molecules based on Newtonian mechanics. We build a molecular dynamics simulation module to evaluate the stability of small molecule-protein complexes under natural environment conditions. Figure 6C-D shows the potential energy and RMSD plots for each complex. The trajectories of each complex reached equilibrium, and the potential energy and RMSD of complexes CEMBL82047-MMP9 and CEMBL381163-MMP9 reached a steady state over time. That means the complexes can exist stably in the natural environment.