PGK1, a promising biomarker and therapeutic target for patients with acute myeloid leukemia who are susceptible to disease relapse

Background: Glycolysis, a multi-step enzymatic reaction, is considered to be the root of cancer development and progression. The aim of this study is to gure out the glycolytic enzyme, phosphoglycerate kinase 1 (PGK1) whether participate in the progression of acute myeloid leukemia (AML) and its possible mechanisms. Methods: Four datasets (GSE106096, GSE75086, GSE107968 and GSE106748) containing 30 leukemic blast cell samples of AML at diagnosis, 17 leukemic blast cell samples of AML relapse and 3 bone marrow CD34+ cell samples of healthy donors were downloaded from Gene Expression Omnibus (GEO) database and PGK1 was screened out as a potential survival biomarker in AML. Then we did a series of clinical sample verications and gene set enrichment analysis (GSEA) focusing on PGK1. We further knocked down expression of PGK1 in myelogenous leukemia cell lines and explored its potential effects. Results: PGK1 expression was up-regulated among AML at diagnosis versus healthy control, AML relapse versus AML at diagnosis and AML relapse versus healthy control datasets. Through a serial of bioinformatic analyses (differentially expressed genes [DEGs] selection, function and pathway enrichments and protein-protein interaction [PPI] network establishment), PGK1 was identied as the most meaningful gene in AML progression. Furthermore, the generally high expression of PGK1 was conrmed in AML samples comparing with healthy controls in our single center and the high-expression PGK1 was associated with a comparatively low complete remission (CR) rate, a signicantly high 5-year cumulative incidence of relapse (CIR), a poor 5-year event-free survival (EFS) rate, and a poor 5-year overall survival (OS) rate. The GSEA revealed that high-expression PGK1 in AML was associated with many pathways including cytosolic DNA sensing, pentose phosphate, base excision repair and DNA replication. In vitro, the transfected U937 and K562 cells with PGK1 knock-down showed decreased inhibitory in cell score cut-off=0.2,


Background
In spite of accepted standard treatments, about 40-60% of newly diagnosed acute myeloid leukemia (AML) patients and 30-50% of those after allogenetic hematopoietic stem cell transplantation (allo-HSCT) still face the risk of relapse which remains a tough challenge and indicates a poor prognosis (1,2). Due to the intrinsic heterogeneity of AML and acquired drug resistance of tumor cells, current chemotherapy drugs and molecular targeted agents exhibit suboptimal effects (3)(4)(5). The basis of clonal evolution and disease relapse in AML is poorly understood and novel therapeutic targets with wide spectrum are urgently needed.
It is universally acknowledged that cancer cells can switch their metabolic pattern depending on the tumor microenvironment (TME) and preferentially use aerobic glycolysis rather than oxidative phosphorylation (that is the Warburg effect) since it is 10-100 times more rapid of energy generation (6,7). The glycolytic enzymes enable death resistance of shed cancer cells and facilitate tumor recurrence and disease relapse (8,9). Phosphoglycerate kinase 1 (PGK1), as a glycolytic enzyme and a newly identi ed protein kinase, coordinates glycolysis and mitochondrial metabolism (10). PGK1 is one of the only two glycolytic enzymes that can generate adenosine triphosphate (ATP) in the glycolytic pathway (11). In addition, it is demonstrated that PGK1 could be translocated into the mitochondria and act as a protein kinase (10,12). Mitochondrial PGK1 phosphorylates pyruvate dehydrogenase kinase 1 (PDHK1) which could lead to the suppression of pyruvate utilization, reduction of oxidative products, accumulation of lactate and proliferation of tumor cells. Moreover, PGK1 could participate in in ammation, immune response and autophagy with non-metabolic functions and also moonlights as a transcription factor to promote DNA replication (13)(14)(15)(16)(17). Series of studies revealed that PGK1 is overexpressed in a variety of tumors (such as cancers of stomach, colon, breast and liver) (18)(19)(20)(21)(22). The elevated PGK1 expression signi cantly increased the malignancy degree and indicated a poor prognosis (18)(19)(20)23). However, the in uence of PGK1 on AML is rarely evaluated.
In the present study, we rstly performed a bioinformatical analysis and screened out PGK1 as a potential survival biomarker in AML. Then we did a series of clinical sample veri cations and gene set enrichment analysis (GSEA) focusing on PGK1. We further knocked down expression of PGK1 in myelogenous leukemia cell lines and explored its potential effects. We found the high-expression PGK1 was associated with poor prognosis in AML and knocking down PGK1 expression could suppress the myelogenous leukemia cells' proliferation.

Microarray data
Four datasets (GSE106096, GSE75086, GSE107968 and GSE106748) containing 30 leukemic blast cell samples of AML at diagnosis, 17 leukemic blast cell samples of AML relapse and 3 bone marrow CD34+ cell samples of healthy donors were downloaded from Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo) (Illumina GPL10558 platform, Illumina HumanHT-12 V4.0 expression beadchip; Affymetrix GPL16686 platform, Affymetrix Human Gene 2.0 ST Array; Affymetrix GPL570 platform, Affymetrix Human Genome U133 Plus 2.0 Array). Then, the four datasets were normalized by log2 conversion and merged as three expression matrices (AML at diagnosis versus healthy control, AML relapse versus AML at diagnosis and AML relapse versus healthy control). The probes were converted into the corresponding gene symbol based on the annotation information of the platforms. In addition, the probe sets without corresponding gene symbols were removed and genes with more than one probe set were averaged respectively by using R software (version 3.6.2).
Identi cation of differentially expressed genes (DEGs) The study batch effect was adjusted by using Cochran's Q test for the three expression matrices and the meta-analysis (Combining Effect Sizes method) was adopted to determine the DEGs between groups (a p-value<0.05 was considered statistically signi cant). The DEGs screened out by meta-analysis through Lima package of R software (version 3.6.2) were further selected by a criterion of |logFC (fold-change)| >1 and adj. P-value <0.01. Then, the Venn diagram was adopted to identify intersection of DEGs from aforementioned three matrices.

KEGG and GO enrichment analyses of DEGs
The Database for Annotation, Visualization and Integrated Discovery (DAVID; http://david.ncifcrf.gov, version 6.8) was employed as a tool to analyze the biological information of DEGs mentioned above (24).
Kyoto Encyclopedia of Genes and Genomes (KEGG) is a database resource for understanding high-level functions and biological systems from large-scale molecular datasets generated by high-throughput experimental technologies and Gene Ontology (GO) is a major bioinformatics tool to annotate genes and analyze biological process of these genes (25,26). Both of KEGG and GO functions are integrated in DAVID. P<0.05 was considered statistically signi cant.

PPI network construction and module analysis
The Search Tool for the Retrieval of Interacting Genes (STRING; http://string-db.org, version 11.0) was used to predict and establish the protein-protein interaction (PPI) network of DEGs. By analysis module of STRING, a set of combined scores which represented the interaction strength of the proteins were calculated (low=0.15; medium=0.4; high=0.7; highest=0.9 for instance). An interaction with a combined score >0.4 was considered statistically signi cant while nodes without connections were ruled out. Then, the Cytoscape software (version 3.7.2) was employed for visualizing molecular interaction networks and the Molecular Complex Detection (MCODE) plug-in of Cytoscape was adopted for identifying the most signi cant modules in the PPI network based on topology. The criteria for selections was as follows: MCODE scores ≥10, degree cut-off=2, node score cut-off=0.2, Max depth=100 and k-score=2.

Hub genes selection and analysis
After the most signi cant modules were determined by MCODE, the plug-in cytoHubba of Cytoscape was used to rank the genes by EPC method and genes with degree ≥10 were determined as hub genes.

Patients and ethics statement
For clinical veri cation, a total of 50 AML patients and 10 healthy donors were enrolled in this study at Children's Hospital of Soochow University. These patients were newly diagnosed with AML as de ned by the World Health Organization (WHO) criteria (27) from October 2013 to October 2015 and were followed up every month and the follow-up endpoint was April 31, 2020. Patients with promyelocytic leukemia, myelodysplastic syndrome-related AML, treatment-related AML and AML of Down syndrome were not included in this study. This study was approved by the hospital ethics committee of Children's Hospital of Soochow University and written informed consents were obtained from the parents or guardians of all patients and donors.
Quantitative real-time polymerase chain reaction (qRT-PCR) The bone marrow (BM) samples were collected prior to treatment and mononuclear cells (MNCs) were enriched through Ficoll gradient centrifugation immediately and stored at -80℃. Total RNA of MNCs samples were extracted using TRIzol reagent (Invitrogen) and reverse-transcribed into cDNA using the reverse transcription kit (Takara). The qRT-PCR was employed to measure the levels of mRNAs using the comparative Ct method. GAPDH was considered as the normalization control for mRNA. All primers for qRT-PCR were listed in Supplementary Table 1.

Evaluations and follow-ups
Patients were treated by strati ed treatment according to the risk strati cation criteria (28)(29)(30)(31). Response was evaluated on day 26 of each induction chemotherapy course. Complete remission (CR) was de ned as white blood cell (WBC) ≥1.0*10 9 /L, absolute neutrophil count (ANC) ≥0.5*10 9 /L, platelet count (PLT) ≥50*10 9 /L and BM blast cells <5%. The overall survival time (OS) was calculated from the date of diagnosis to the date of death or last follow-up. The events included death, relapse and secondary tumor. The event-free survival time (EFS) was de ned as the survival time without those events since the date of diagnosis. Relapse was de ned as the recurrence of blasts ≥5% in BM after CR status. The relapse-free survival time (RFS) was calculated from the date of diagnosis to the date of rst relapse or last follow-up and the cumulative incidence of relapse (CIS) was calculated.
Gene set enrichment analysis (GSEA) GSEA is a computational method that assesses whether a set of prior de ned genes shows statistically signi cant and concordant differences between two biological states (32). To investigate the role of PGK1 in AML, GSEA was conducted to analyze the enrichment of datasets between high-expression PGK1 (de ned as the mRNA level higher than the median level) and low-expression PGK1 (de ned as the mRNA level lower than the median level) groups. False discovery rate (FDR) <25% and nominal p<0.05 were set as the cut-off criterion.

Cell lines and cell culture
The human acute myelogenous leukemia cell lines (HL-60, NB4, U937 and SHI-1 cell lines) and chronic myelogenous leukemia cell line (K562 cell line) were obtained from Jiangsu Institute of Hematology. All these myelogenous leukemia cell lines were cultured in Roswell Park Memorial Institute (RPMI) 1640 medium and supplemented with 10% fetal bovine serum (FBS) and 1% penicillin-streptomycin. The 293FT cell line derived from the primary embryonal human kidney which is suitable for generating lentiviral constructs was purchased (Invitrogen) and cultured in Dulbecco's modi ed eagle's medium (DMEM) containing 10% FBS and 1% penicillin-streptomycin. All the cell lines were cultured in a 37℃ humidi ed incubator containing 5% CO 2 .

Transfection of PGK1
Small hairpin RNA for knocking down expression of PGK1 (shPGK1) with a lentiviral vector was transfected into the myelogenous leukemia cells for at least 48h. A lentiviral vector without shPGK1 aliased as shNC was used as a negative control. Detailed schedule was shown in Supplementary Fig.1.

Cell viability evaluation
Transfected myelogenous leukemia cells were seeded in the 96-well plate at a concentration of 2*10 4 cells/well and incubated at 37℃ for one to ve days. 10ul of Cell Counting Kit-8 (CCK-8) was added to each well and incubated at 37℃ for 4h. The cell proliferation was measured by the microplate reader at the absorbance at 450nm of optimal density (OD 450nm ).

Cell apoptosis assay
Cellular apoptosis was quanti ed by ow cytometry using Annexin V-FITC (BD Pharmingen) following the manufacturer's protocol.

Western blotting (WB)
Cells were harvested, washed and lysed in radioimmunoprecipitation assay buffer (RIPA) containing 1mM phenylmethylsulfonyl uoride (PMSF) and protease inhibitor according to the manufacturer's protocol (Sigma). The protein concentration was measured with BCA Protein Assay Kit (Pierce). All samples were loaded and electrophoresed on 10-15% sodium salt polyacrylamide gel electrophoresis (SDS-PAGE) gels and transferred onto polyvinylidene uoride (PVDF) membranes (Millipore). After being blocked with 10% BSA for 2 hours, the membranes were incubated with a speci c primary antibody overnight at 4℃, washed with tris-buffered saline and Tween 20 (TBST) and then incubated with a secondary antibody for one hour. Primary antibodies used in this study were as follows: monoclonal anti-PGK1, GAPDH (Abcam), Bax, Bcl-2, Cleaved PRAP, Cleaved Caspase-3 and β-actin (Cell Signaling Technology). For secondary antibodies, horseradish peroxidase (HRP)-conjugated goat anti-mouse or goat anti-rabbit IgG (Cell Signaling Technology) was used. The Pierce Enhanced Chemiluminescence (Thermo Scienti c) was applied for blots and the band density was analyzed by using Image Processing and Analysis in Java (Image J, verson 1.8.0) software.

Half maximal inhibitory concentration (IC50) calculation
The transfected myelogenous leukemia cells with shPGK1 or shNC were treated with chemotherapeutic agents (cytarabine [Ara-C] and daunorubicin [DNR]) for 24h and 48h. Afterwards, the cell viability was examined by CCK-8 assay. The drug concentration and corresponding OD 450nm values were imported and the IC50 was calculated by SPSS software through a nonlinear regression analysis.

Statistic analysis
The descriptive statistics included the median and range for continuous variables as well as the number and percentage of categorical variables. The independent Student's t-test was utilized to compare normal distributional variables while the Mann-Whitney U test was used to assess skewed distributional variables. The categorical variables were analyzed using Chi square or Fisher's Exact Test, as appropriate. The survival functions (CIR, EFS and OS) were described by Kaplan-Meier methods and the log-rank test was used to compare the survival curves. Furthermore, the risk factors of AML were evaluated through Cox regression analysis. SPSS 26.0 software was employed for data processing. Prism 8.0 software was served for results visualization. P<0.05 was considered to be statistically signi cant.

DEGs identi cation
The four gene expression datasets were standardized by log2 conversion and merged as three expression matrices ( Supplementary Fig.2). The DEGs between groups were shown as heatmaps and volcano plots ( Fig.1a and Fig.1b). According to the intersection obtained from the Venn diagram, eventually, 41 common DGEs were identi ed (Fig.1c).

KEGG and GO enrichment analyses of DEGs
Through DAVID website, function and pathway enrichments of DEGs were analyzed. The KEGG pathway results revealed that the DEGs were mainly enriched in glycolysis/gluconeogenesis, carbon metabolism, biosynthesis of amino acids and pentose phosphate pathway (Fig.2a). The biological processes (BP) of DEGs by GO analysis were mainly enriched in immune response, collagen-activated signaling pathway, leukocyte degranulation and myeloid cell activation involved in immune response (Fig.2b). The cellular components (CC) were mainly enriched in tertiary granule, speci c granule, secretory granule and cytoplasmic vesicle lumen (Fig.2c). The molecular functions (MF) were mainly enriched in collagen receptor activity, translation factor activity, RNA binding, IgG binding and immunoglobulin binding (Fig.2d).

PPI network construction and hub genes identi cation
The PPI network of the common DEGs was constructed (Fig.3a) containing 40 nodes and 558 edges except one DEG, which was excluded due to no connection with others. The most signi cant clusters were obtained by using plug-in MCODE of Cytoscape, which contains 32 nodes and 475 edges. These 32 nodes were considered as the hub genes and ranked by the plug-in cytoHubba of Cytoscape (Fig.3b). The depth of color represented the different scores calculated by cytoHubba and the darker of the node, the higher the score was. The names, abbreviations, functions and EPC scores for these 32 hub genes are shown in Table 1.

Expressions of key genes in AML samples
The top eight hub genes (PGK1, PFKL, GPI, ALDOC, TPI1, PFKP, HK2 and ALDOA) were further veri ed by qRT-PCR through AML patients' and healthy donors' samples in our center. PGK1 was much higher expressed in AML samples with a p-value<0.001 (Fig.4a). PFKP also showed signi cant difference between groups with a p-value=0.018 (Fig.4a). While PFKL, GPI, ALDOC, TPI1, HK2 and ALDOA showed no signi cant difference between groups (Fig.4a). Then, the AML patients were divided into two groups (low-expression group and high-expression group) by the median mRNA level of PGK1 and PFKP, respectively. There was no signi cant difference in gender, age at diagnosis, blood cell counts at diagnosis, main types of fusion gene and gene mutation, cytogenetics and French-American-British (FAB) classi cation between high-expression PGK1 and low-expression PGK1 groups (Table 2). With respect to risk strati cation, there were more patients with high risk (n=10) and fewer patients with low risk (n=7) in high-expression PGK1 group comparing patients in low-expression PGK1 group (P=0.019) (  (Fig.4b). While groups between the high-expression and low-expression PFKP showed no statistical signi cance (Fig.4b). In addition, high-expression PGK1 was determined as a risk factor of RFS, EFS and OS through the Cox regression analyses (Supplementary Table 2).

GSEA of PGK1
To better understand the function of PGK1 and signaling pathways activated in AML, we applied GSEA comparing the datasets of low-expression and high-expression PGK1. The results showed the gene sets associated with cytosolic DNA sensing pathway, pentose phosphate pathway, base excision repair and DNA replication pathway were differentially enriched with the phenotype of high-expression PGK1 (Fig.5a) and the gene sets associated with apoptosis, biosynthesis of unsaturated fatty acids, glycosaminoglycan biosynthesis chondroitin sulfate and other glycan degradation were differentially enriched with the phenotype of low-expression PGK1 (Fig.5b).

Transfection of shPGK1
The mRNA and protein level of PGK1 on the ve selected myelogenous leukemia cell lines were detected by qRT-PCR and WB respectively (Supplementary Fig.3). Then, the U937 and K562 cell lines with relatively high expression of PGK1 were chosen for the following transfection experiments. The sequence two (PGK1-sh2) was used for further functional veri cations due to its highest stable transfection e ciency for at least ve to nine days (Fig.6a).

PGK1's effects on cell proliferation and apoptosis
From the CCK-8 assay for more than three days after transfection, decelerated proliferation of shPGK1 treated cells were observed and these demonstrated the effects of PGK1 on the growth and viability of U937 and K562 cells (Fig.6b). To further investigate the cell viability with PGK1 knocked-down, the apoptotic rate was assayed by ow cytometer and apoptosis-indicated proteins were measured by WB. The percentage of apoptotic cells in both two cell lines treated with shPGK1 were signi cantly higher than that of shNC group (12.50±0.54% versus 4.90±0.66%, P<0.001 and 10.62±0.97% versus 4.89±0.79%, P=0.001 for U937 and K562 cells respectively) (Fig.6c). Correspondingly, a clear decrease of Bcl-2 and an increase of Bax were obtained after shPGK1 transfection. The indicators of apoptosis activation, the PARP cleavage and Caspase-3 cleavage were signi cantly increased (Fig.6d).

IC50 calculation
The U937 and K562 cells were treated with different concentrations of chemotherapeutic agents with a combination of shPGK1 or shNC and examined by CCK-8 assay (Fig.6e). A maximal cytogenetic effect on tumor cells was found after the combined therapy of chemotherapeutic agents and shPGK1 ( Fig.6e and Table 3). Surprisingly, the IC50 of DNR on shPGK1 transfected U937 cells showed an approximate 1/7 value of shNC transfected one (1.865μmol/L versus 13.360μmol/L) ( Table 3).

Discussion
AML is characterized with molecular heterogeneity and many patients remain refractory and relapse. So far, little is known about how the clones evolve during disease relapse and progression. It seems that RNA based changes may play a crucial role in the development and progression of AML and also, AML relapse may share a common biological process (33-35).
We identi ed a common biological pattern associated with disease progression covering various AML molecular subtypes through a database mining and bioinformatical analyses in the rst part of the study. A cluster of co-overexpression genes linked to relapse and progression risk in distinct patient subgroups were identi ed. Accordingly, quite a few of the 32 DEGs screened out as hub genes were sorted to be glycolytic enzymes (such as PGK1, PFKL, ALDOC, ENO1 and so on presented in Table 1). In addition, those hub genes were mainly involved in glycolysis/gluconeogenesis, carbon metabolism, biosynthesis of amino acids and pentose phosphate pathway. It could be inferred that metabolic changes especially glycolysis might be associated with disease recurrence and progression in AML. This result was consistent with previous studies, of which, the enhanced glycolysis achieved by multiple tumors was proposed (23, 36-38).
In the present study, PGK1 showed the highest EPC score of 13.198 and was con rmed to be the most signi cantly expressed either in datasets or in clinical sample veri cations. High-expression PGK1 is related to a high incidence of relapse and a poor prognosis suggesting that PGK1 may play an important role in the progression of AML. PGK1 is a glycolytic enzyme which catalyze one of the two ATP producing reactions and also has been identi ed as a moonlighting protein to perform multi-functions involved in DNA replication, immune response, in ammation, autophagy and so on (10, 12-16, 39, 40). Consistently, our GSEA results revealed that high-expression PGK1 in AML were associated with many pathways including cytosolic DNA sensing, pentose phosphate, base excision repair and DNA replication. In addition, the shPGK1 lentivirus vector was constructed and the PGK1 mRNA and protein expression level could be e ciently knocked down in myelogenous leukemia cell lines. The inhibition of cell proliferation and induction of cell apoptosis were observed with shPGK1 compared with negative controls. Basesd on the results above, it could be inferred that PGK1 might in uence the biological progression of AML through multiple functions and pathways.
Moreover, the IC50 was greatly decreased by combination of chemotherapeutic agents and shPGK1 which indicated the role of PGK1 in chemotherapy sensitivity. PGK1 might in uence the inherent and acquired treatment resistance of acute myeloid leukemic cells. The speci c PGK1-invovled mechanism for chemotherapy sensitivity in AML needs further explored.
Several limitations about our study should be considered. Firstly, due to the heterogeneity of AML populations, the DEGs identi ed from the four datasets which were adjusted by study batch effect may not be accurately suitable for individual leukemic blast sample. Secondly, owing to the small size of clinical samples (a total of 50 AML patients and 10 healthy donors), further expanding the sample size to obtain more sturdy results is necessary. Thirdly, we've just explored the initial effects of PGK1 in vitro and the more speci c and in-depth researches are required in the future.
In summary, our study has identi ed the common biological pathways involved in the progression of AML. Several key genes were screened out and further veri ed by clinical samples. PGK1 is one of the most signi cant gene that highly up-regulated in AML and PGK1 may have the potential for predicting the risk of relapse and poor prognosis of AML patients. Inhibition of PGK1 may be utilized in synergy with existing treatments. Further studies focusing on precise mechanism and subgroups bene t most from PGK1-targeted treatment need to be explored.

Conclusions
Our study established a common gene signature involved in AML progression and demonstrated that PGK1 might play an important role in AML progression which could be a novel therapeutic target for AML.

Declarations
Ethics approval and consent to participate This study was approved by the hospital ethics committee of Children's Hospital of Soochow University and written informed consents were obtained from the parents or guardians of all patients and donors.

Consent for publication
Not applicable.

Availability of data and materials
The part of original data for bioinformatical analysis can be found at GEO database (http://www.ncbi.nlm.nih.gov/geo) and the part of the clinical and the experimental data could be obtained from the authors if the editor or reviewer request.

Competing interests
The authors declare that they have no competing interests.