Microarray data
Four datasets (GSE106096, GSE75086, GSE107968 and GSE106748) containing 30 leukemic blast cell samples from AML at diagnosis, 17 leukemic blast cell samples from relapsed AML and 3 bone marrow CD34+ cell samples from healthy donors were downloaded from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo). The datasets were generated on three platforms: Illumina GPL10558 (Illumina HumanHT-12 V4.0 expression beadchip), Affymetrix GPL16686 (Affymetrix Human Gene 2.0 ST Array) and Affymetrix GPL570 (Affymetrix Human Genome U133 Plus 2.0 Array). The four datasets were then normalized by log2 transformation and merged into three expression matrices (AML at diagnosis versus healthy control, AML relapse versus AML at diagnosis and AML relapse versus healthy control). Probes were converted to the corresponding gene symbols based on the annotation information of the platforms. Probe sets without corresponding gene symbols were removed, and for genes with more than one probe set the values were averaged, using R software (version 3.6.2).
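The probe-to-gene step described above can be illustrated with a minimal Python sketch. It is not the R code used in the study: the function name, the input layout (dictionaries keyed by probe ID) and the choice to apply log2 before averaging are assumptions made only for illustration.

```python
import math
from collections import defaultdict


def collapse_probes(probe_values, probe_to_symbol):
    """log2-transform probe intensities and average probes per gene symbol.

    probe_values: dict mapping probe ID -> list of raw intensities (one per sample)
    probe_to_symbol: dict mapping probe ID -> gene symbol ("" or missing = unannotated)
    Both structures are hypothetical stand-ins for a platform annotation table.
    """
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        symbol = probe_to_symbol.get(probe)
        if not symbol:
            continue  # drop probe sets without a gene symbol
        grouped[symbol].append([math.log2(v) for v in values])

    collapsed = {}
    for symbol, rows in grouped.items():
        # average the probe sets of one gene, sample by sample
        collapsed[symbol] = [sum(col) / len(rows) for col in zip(*rows)]
    return collapsed


example = collapse_probes(
    {"p1": [4.0, 16.0], "p2": [16.0, 64.0], "p3": [8.0, 8.0]},
    {"p1": "PGK1", "p2": "PGK1", "p3": ""},
)
```

In this toy input, the two probes annotated to PGK1 are averaged into one row and the unannotated probe is discarded.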
Identification of differentially expressed genes (DEGs)
The study batch effect was adjusted by using Cochran’s Q test for the three expression matrices, and meta-analysis (Combining Effect Sizes method) was adopted to determine the DEGs between groups (p<0.05 was considered statistically significant). The DEGs screened out by meta-analysis through the limma package of R software (version 3.6.2) were further selected using the criteria |log fold-change (logFC)| >1 and adjusted p-value <0.01. A Venn diagram was then used to identify the intersection of the DEGs from the three matrices.
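As a rough illustration of the final selection step only (the thresholds and the three-way intersection, not the meta-analysis itself), the filtering can be sketched in Python. The function name, the result layout and the gene labels G1 to G4 are hypothetical.

```python
def select_degs(results, lfc_cutoff=1.0, adj_p_cutoff=0.01):
    """Return genes with |logFC| > lfc_cutoff and adjusted p-value < adj_p_cutoff.

    results: dict mapping gene -> (logFC, adjusted p-value); a hypothetical
    stand-in for one comparison's result table.
    """
    return {
        gene
        for gene, (lfc, adj_p) in results.items()
        if abs(lfc) > lfc_cutoff and adj_p < adj_p_cutoff
    }


# Toy result tables for the three comparisons (values are invented).
diagnosis_vs_control = {"G1": (2.1, 0.001), "G2": (-1.5, 0.002), "G3": (0.4, 0.0001)}
relapse_vs_diagnosis = {"G1": (1.3, 0.004), "G2": (-1.2, 0.03), "G4": (3.0, 0.001)}
relapse_vs_control = {"G1": (2.8, 0.0005), "G2": (-2.0, 0.001), "G4": (1.1, 0.2)}

# Equivalent to the central region of a three-set Venn diagram.
shared_degs = (
    select_degs(diagnosis_vs_control)
    & select_degs(relapse_vs_diagnosis)
    & select_degs(relapse_vs_control)
)
```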
KEGG and GO enrichment analyses of DEGs
The Database for Annotation, Visualization and Integrated Discovery (DAVID; http://david.ncifcrf.gov, version 6.8) was used to analyze the biological information of the DEGs mentioned above (24). Kyoto Encyclopedia of Genes and Genomes (KEGG) is a database resource for understanding high-level functions and biological systems from large-scale molecular datasets generated by high-throughput experimental technologies, and Gene Ontology (GO) is a major bioinformatics tool for annotating genes and analyzing their biological processes (25, 26). Both KEGG and GO analyses are integrated in DAVID. P<0.05 was considered statistically significant.
PPI network construction and module analysis
The Search Tool for the Retrieval of Interacting Genes (STRING; http://string-db.org, version 11.0) was used to predict and establish the protein-protein interaction (PPI) network of the DEGs. The analysis module of STRING calculates a combined score representing the strength of each interaction (for instance, low=0.15, medium=0.4, high=0.7 and highest=0.9). Interactions with a combined score >0.4 were retained, and nodes without connections were excluded. Cytoscape software (version 3.7.2) was then used to visualize the molecular interaction networks, and the Molecular Complex Detection (MCODE) plug-in of Cytoscape was used to identify the most significant modules in the PPI network based on topology. The selection criteria were as follows: MCODE score ≥10, degree cut-off=2, node score cut-off=0.2, max depth=100 and k-core=2.
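The edge filtering described above (combined score >0.4, isolated nodes removed) can be sketched as follows. This is an illustration under assumptions, not the STRING or Cytoscape implementation: the edge list format, the 0 to 1 score scale and the gene labels are hypothetical.

```python
def filter_ppi(edges, genes, min_score=0.4):
    """Keep interactions above a combined-score threshold and report isolated nodes.

    edges: list of (gene_a, gene_b, combined_score) with scores on a 0-1 scale
    genes: iterable of all DEGs submitted to the network
    """
    kept = [(a, b, s) for a, b, s in edges if s > min_score]
    connected = {node for a, b, _ in kept for node in (a, b)}
    isolated = set(genes) - connected  # nodes without connections are excluded
    return kept, connected, isolated


kept, connected, isolated = filter_ppi(
    [("A", "B", 0.9), ("B", "C", 0.45), ("C", "D", 0.3)],
    genes=["A", "B", "C", "D"],
)
```

Here the C-D interaction falls below the threshold, so D has no remaining connection and is excluded from the network.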
Hub genes selection and analysis
After the most significant modules were determined by MCODE, the cytoHubba plug-in of Cytoscape was used to rank the genes by the EPC method, and genes with degree ≥10 were defined as hub genes.
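The degree threshold used to define hub genes can be illustrated with a short Python sketch. It covers only the degree ≥10 criterion, not the EPC ranking performed by cytoHubba, and the function name and toy network are hypothetical.

```python
from collections import Counter


def hub_genes(edges, min_degree=10):
    """Return genes whose number of interaction partners is at least min_degree.

    edges: list of (gene_a, gene_b) pairs, each undirected interaction listed once
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sorted(gene for gene, d in degree.items() if d >= min_degree)


# Toy star-shaped module: one central node linked to ten partners.
star_edges = [("HUB", f"N{i}") for i in range(10)]
hubs = hub_genes(star_edges)
```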
Patients and ethics statement
For clinical verification, a total of 50 AML patients and 10 healthy donors were enrolled in this study at Children’s Hospital of Soochow University. The patients were newly diagnosed with AML, as defined by the World Health Organization (WHO) criteria (27), from October 2013 to October 2015 and were followed up every month; the follow-up endpoint was April 31, 2020. Patients with promyelocytic leukemia, myelodysplastic syndrome-related AML, treatment-related AML or AML of Down syndrome were not included in this study. This study was approved by the ethics committee of Children’s Hospital of Soochow University, and written informed consent was obtained from the parents or guardians of all patients and donors.
Quantitative real-time polymerase chain reaction (qRT-PCR)
Bone marrow (BM) samples were collected prior to treatment, and mononuclear cells (MNCs) were immediately enriched by Ficoll gradient centrifugation and stored at -80℃. Total RNA was extracted from the MNC samples using TRIzol reagent (Invitrogen) and reverse-transcribed into cDNA using a reverse transcription kit (Takara). qRT-PCR was used to measure mRNA levels by the comparative Ct method, with GAPDH as the normalization control. All primers for qRT-PCR are listed in Supplementary Table 1.
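For readers unfamiliar with the comparative Ct method, a minimal sketch of the usual 2^(-ΔΔCt) calculation is given below. The choice of calibrator (for example, a healthy donor sample) is not stated above and is an assumption here, as are the function name and the Ct values.

```python
def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Relative mRNA level by the comparative Ct (2^-ddCt) method.

    ct_target / ct_reference: Ct of the gene of interest and of the reference
    gene (GAPDH in this study) in the sample.
    ct_target_cal / ct_reference_cal: the same two Ct values in the calibrator
    sample (assumed here; the calibrator is not specified in the text).
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_cal - ct_reference_cal
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Invented Ct values: the target crosses threshold 2 cycles earlier (relative
# to GAPDH) in the sample than in the calibrator, i.e. a 4-fold higher level.
fold_change = relative_expression(22.0, 18.0, 25.0, 19.0)
```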
Evaluations and follow-ups
Patients received stratified treatment according to the risk stratification criteria (28-31). Response was evaluated on day 26 of each induction chemotherapy course. Complete remission (CR) was defined as white blood cell count (WBC) ≥1.0×10⁹/L, absolute neutrophil count (ANC) ≥0.5×10⁹/L, platelet count (PLT) ≥50×10⁹/L and BM blast cells <5%. The overall survival time (OS) was calculated from the date of diagnosis to the date of death or last follow-up. Events included death, relapse and secondary tumor. The event-free survival time (EFS) was defined as the survival time without any of these events from the date of diagnosis. Relapse was defined as the recurrence of blasts ≥5% in BM after CR. The relapse-free survival time (RFS) was calculated from the date of diagnosis to the date of first relapse or last follow-up, and the cumulative incidence of relapse (CIR) was calculated.
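The CR definition above is a conjunction of four thresholds, which can be written as a small check. The function and parameter names are hypothetical and the sketch is for illustration only.

```python
def is_complete_remission(wbc, anc, plt, bm_blast_pct):
    """Apply the CR thresholds stated in the text.

    wbc, anc, plt: counts in units of 10^9/L
    bm_blast_pct: bone marrow blast cells, in percent
    """
    return wbc >= 1.0 and anc >= 0.5 and plt >= 50 and bm_blast_pct < 5


in_remission = is_complete_remission(wbc=3.2, anc=1.1, plt=120, bm_blast_pct=2.0)
not_in_remission = is_complete_remission(wbc=3.2, anc=1.1, plt=120, bm_blast_pct=5.0)
```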
Gene set enrichment analysis (GSEA)
GSEA is a computational method that assesses whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (32). To investigate the role of PGK1 in AML, GSEA was conducted to analyze the enrichment of gene sets between the high-expression PGK1 group (defined as an mRNA level higher than the median) and the low-expression PGK1 group (defined as an mRNA level lower than the median). A false discovery rate (FDR) <25% and nominal p<0.05 were set as the cut-off criteria.
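The median split used to form the two PGK1 groups can be sketched as below. The sample identifiers and expression values are invented, and the handling of samples exactly at the median (left unassigned, following the strict "higher than" and "lower than" wording) is an assumption.

```python
from statistics import median


def split_by_median(expression):
    """Split samples into high and low groups around the median expression.

    expression: dict mapping sample ID -> PGK1 mRNA level
    Samples equal to the median fall into neither group.
    """
    cutoff = median(expression.values())
    high = sorted(s for s, v in expression.items() if v > cutoff)
    low = sorted(s for s, v in expression.items() if v < cutoff)
    return cutoff, high, low


cutoff, high_group, low_group = split_by_median(
    {"s1": 1.0, "s2": 2.0, "s3": 3.0, "s4": 4.0}
)
```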
Cell lines and cell culture
The human acute myelogenous leukemia cell lines HL-60, NB4, U937 and SHI-1 and the chronic myelogenous leukemia cell line K562 were obtained from Jiangsu Institute of Hematology. All myelogenous leukemia cell lines were cultured in Roswell Park Memorial Institute (RPMI) 1640 medium supplemented with 10% fetal bovine serum (FBS) and 1% penicillin-streptomycin. The 293FT cell line, which is derived from primary human embryonic kidney cells and is suitable for generating lentiviral constructs, was purchased from Invitrogen and cultured in Dulbecco’s modified Eagle’s medium (DMEM) containing 10% FBS and 1% penicillin-streptomycin. All cell lines were cultured in a humidified incubator at 37℃ with 5% CO2.
Transfection of PGK1
A lentiviral vector carrying small hairpin RNA for knocking down the expression of PGK1 (shPGK1) was transfected into the myelogenous leukemia cells for at least 48 h. A lentiviral vector without shPGK1, designated shNC, was used as a negative control. The detailed schedule is shown in Supplementary Fig. 1.
Cell viability evaluation
Transfected myelogenous leukemia cells were seeded in 96-well plates at a density of 2×10⁴ cells/well and incubated at 37℃ for one to five days. Then, 10 μl of Cell Counting Kit-8 (CCK-8) reagent was added to each well and incubated at 37℃ for 4 h. Cell proliferation was measured with a microplate reader as the optical density at 450 nm (OD450nm).
Cell apoptosis assay
Cellular apoptosis was quantified by flow cytometry using Annexin V-FITC (BD Pharmingen) following the manufacturer’s protocol.
Western blotting (WB)
Cells were harvested, washed and lysed in radioimmunoprecipitation assay (RIPA) buffer containing 1 mM phenylmethylsulfonyl fluoride (PMSF) and protease inhibitor according to the manufacturer’s protocol (Sigma). The protein concentration was measured with the BCA Protein Assay Kit (Pierce). All samples were separated by 10-15% sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) and transferred onto polyvinylidene fluoride (PVDF) membranes (Millipore). After being blocked with 10% BSA for 2 hours, the membranes were incubated with a specific primary antibody overnight at 4℃, washed with Tris-buffered saline containing Tween 20 (TBST) and then incubated with a secondary antibody for one hour. The primary antibodies used in this study were as follows: monoclonal anti-PGK1 and GAPDH (Abcam); Bax, Bcl-2, cleaved PARP, cleaved Caspase-3 and β-actin (Cell Signaling Technology). Horseradish peroxidase (HRP)-conjugated goat anti-mouse or goat anti-rabbit IgG (Cell Signaling Technology) was used as the secondary antibody. Pierce Enhanced Chemiluminescence (Thermo Scientific) was used to develop the blots, and the band density was analyzed using Image Processing and Analysis in Java (ImageJ, version 1.8.0) software.
Half maximal inhibitory concentration (IC50) calculation
Myelogenous leukemia cells transfected with shPGK1 or shNC were treated with chemotherapeutic agents (cytarabine [Ara-C] and daunorubicin [DNR]) for 24 h and 48 h. Afterwards, cell viability was examined by CCK-8 assay. The drug concentrations and corresponding OD450nm values were imported into SPSS software, and the IC50 was calculated by nonlinear regression analysis.
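To make the idea of an IC50 concrete, the sketch below estimates it by log-linear interpolation between the two tested concentrations that bracket 50% viability. This is a deliberately simpler substitute for the nonlinear regression performed in SPSS, not a reproduction of it; the function name, the input layout and the assumption that OD450nm values have already been converted to viability fractions relative to untreated cells are all hypothetical.

```python
import math


def ic50_interpolated(concentrations, viability):
    """Estimate IC50 by interpolating on a log10 concentration scale.

    concentrations: tested drug concentrations in ascending order (all > 0)
    viability: matching viability fractions relative to untreated cells
    Returns None if 50% viability is not bracketed by the tested range.
    """
    points = list(zip(concentrations, viability))
    for (c_lo, v_lo), (c_hi, v_hi) in zip(points, points[1:]):
        if v_lo >= 0.5 >= v_hi and v_lo != v_hi:
            fraction = (v_lo - 0.5) / (v_lo - v_hi)
            log_ic50 = math.log10(c_lo) + fraction * (
                math.log10(c_hi) - math.log10(c_lo)
            )
            return 10 ** log_ic50
    return None


# Invented dose-response points: viability falls from 100% to 0% over two decades.
estimate = ic50_interpolated([1.0, 100.0], [1.0, 0.0])
```

A fitted dose-response model, as used in the study, uses all data points rather than only the two that bracket the 50% level, so the two approaches will generally give somewhat different values.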
Statistical analysis
The descriptive statistics included the median and range for continuous variables and the number and percentage for categorical variables. The independent Student’s t-test was used to compare normally distributed variables, while the Mann-Whitney U test was used for variables with skewed distributions. Categorical variables were analyzed using the chi-square test or Fisher’s exact test, as appropriate. The survival functions (CIR, EFS and OS) were estimated by the Kaplan-Meier method, and the log-rank test was used to compare the survival curves. Furthermore, the risk factors of AML were evaluated through Cox regression analysis. SPSS 26.0 software was used for data processing, and Prism 8.0 software was used for visualization of the results. P<0.05 was considered statistically significant.
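The Kaplan-Meier estimate mentioned above can be illustrated with a minimal product-limit sketch in Python. It is not the SPSS procedure used in the study; the function name and the toy follow-up data are hypothetical, and the log-rank test and Cox regression are not covered.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times: follow-up time for each patient
    events: 1 if the event occurred at that time, 0 if the patient was censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # group all patients whose follow-up ends at the same time
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving  # events and censored patients both leave the risk set
    return curve


# Invented data: four patients, the second one censored at month 2.
km_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```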