The data collection and analyzation of 19 m6A methylation regulators from the TCGA database
The gene mutation files and RNA sequencing data (count value) with the corresponding clinical information from ccRCC patients were downloaded from TCGA database (https://portal.gdc.cancer.
gov/repository). Based on the clinical data, we selected the 192 somatic mutation data and 261 RNA transcriptome with early-stage ccRCC (pT1) for inclusion. VarScan2 was used to analyze the somatic mutation data and calculate the tumor mutation burden (TMB). The result was displayed with the R packages “maftools” [18]. RNA sequencing counts in tumor tissues and adjacent relatively normal tissues were standardized and analyzed using the “Desq2” package [19].
Proteomic examination of 27 paired early-stage ccRCC tissues
To assess the reliability of the m6A methylation regulator expression across protein levels, 27 paired early-stage ccRCC tissues were then collected and analyzed by means of high-performance liquid chromatography-mass spectrometry (HPLC-MS).
This study was approved by the Institutional Review Board (IRB) of Peking Union Medical College Hospital, Chinese Academy of Medical Science and Peking Union Medical College. Written informed consent was requested as a matter of course and signed by everyone to signify approval to participate. All the patients received the partial or radical nephrectomy and the ccRCC diagnosis was determined by at least two independent specialists for pathological analysis in our center. The UPLC-MS process could be divided into the following steps:
-
sample collection and preparation
A total of 27 paired ccRCC tissues, including 27 paired tumor and non-cancerous tissues (NATs), were collected for further analysis. All tissue samples were collected from the operation room and transferred directly into the -80 ℃ refrigerator for storage until analysis took place.
Approximately 25-120 mg of each cryopulverized renal tumor tissue or NATs were homogenized separately in an appropriate volume of lysis buffer (i.e. 2% SDS, 20mM Tris, Cocktail [1:100 dilution], DNAse [1:100 dilution], RNAse [1:1000 dilution]) by repeated vortexing. Protein concentration was determined using Pierce™ BCA assay protein assay kit (Thermofisher). 100 mg of protein was reduced with 20 mM dithiothreitol (DTT) for 5 mins at 95 ℃ and subsequently alkylated with 50 mM iodoacetamide for 45 min at room temperature in the dark. Protein digestion was carried out using a filter-aided sample preparation technique (FASP) method. Proteins were loaded onto 30-kDa filter devices (Pall, Port Washington, NY, USA). Trypsin (Trypsin Gold, mass spec grade, Promega,WI, USA) was added at an enzyme to protein ratio of 1:50, and samples were incubated at 37 °C overnight.
-
ESI-LC-MS/MS for proteome library generation
The pooled peptide samples of each group were separated by high-pH RPLC columns (4.6 mm × 250 mm, C18, 3 μm; Waters, USA). Each pooled sample was loaded onto a column in buffer A1 (H2O, pH 10). The elution gradient was between 5–30% buffer B1 (90% ACN, pH 10; flow rate, 1 mL/min) for 30 mins. Eluted peptides were collected at one fraction per minute. After lyophilisation, 30 fractions were re-suspended in 0.1% formic acid, before being concatenated into 10 fractions by combining fractions 1, 11, 21, etc.
In order to generate a spectral library, fractions from RPLC were analyzed in the DDA mode. Parameters were set as follows: the MS was recorded at 350–1500 m/z at a resolution of 60,000 m/z; the maximum injection time was 50 ms, the auto gain control (AGC) was 1e6, and the cycle time was 3 sec. MS/MS scans were performed at a resolution of 15,000 with an isolation window of 1.6 Da and a collision energy at 32% (HCD); the AGC target was 50,000, and the maximum injection time was 30 ms.
-
ESI-LC-MS/MS for proteome data-independent acquisition analysis
Digested peptides were dissolved in 0.1% formic acid and separated on an RP C18 self-packing capillary LC column (75 μm×150 mm, 3 μm). The eluted gradient was 5–30% buffer B2 (0.1% formic acid, 99.9% ACN; flow rate, 0.3 μL/min) for 60 mins. For MS acquisition, the variable isolation window DIA method with 38 windows was developed. The specific window lists were constructed based on the DDA experiment of the pooled sample. The full scan was set at a resolution of 120,000 over the m/z range of 400 to 900, followed by DIA scans with a resolution of 30,000; the HCD collision energy was 32%, the AGC target was 1E6, and the maximal injection time was 50 ms.
-
Spectral library generation
To generate a comprehensive spectral library, a pooled sample from each group was processed. DDA data were processed using Proteome Discoverer software (Thermo Scientific, Germany) and searched against the human UniProt database (https://sparql.uniprot.org/) appended with the iRT fusion protein sequence (Biognosys).
A maximum of two missed cleavages for trypsin were used, cysteine carbamidomethylation was set as a fixed modification, and methionine oxidation deamination and +43 on Kn (Carbamyl) were used as variable modifications. Parent and fragment ion mass tolerances were set to 10 ppm and 0.02 Da, respectively. The applied false discovery rate (FDR) cutoff was 0.01 at the protein level. The results were then imported to Spectronaut Pulsar software (Biognosys, Switzerland) to generate the spectral library.
Additionally, DIA data were imported into Spectronaut Pulsar software and searched against the human UniProt database to generate DIA library. The final library was generated by combining DDA and DIA libraries of ccRCC and controls.
DIA–MS data were analyzed using Spectronaut Pulsar (Biognosys, Switzerland) with default settings. All results were filtered with a Q-value cutoff of 0.01 which corresponded to an FDR of 1%. Proteins identified in more than 50% of the samples in each group were retained for further analysis. Missing values were imputed based on the k-nearest neighbour method.
Raw proteomics data were transformed using log2 and then centralized. Wilcoxon signed rank test was implemented with the software R (version 4.1.1).
Selection of m6A RNA methylation regulators
After integrating RNA sequencing and proteomic data, 19 m6A RNA methylation regulators were included for further analysis and included: ALKBH5, CBLL1, ELAVL1, FMR1, FTO, FXR2, HNRNPA2B1, HNRNPC, METTL 14, METTL 3, RBM15, RBM15B, RBMX, RBMXL1, WTAP, YTHDC1, YTHDC2, YTHDF2 and YTHDF3.
Uni-variate survival analysis and establishment of prognostic models based on the 19 m6A methylation regulators
To assess the survival impact of every single m6A methylation regulators on early stage ccRCC, univariate survival analysis was applied. Firstly, pROC package [20] was applied to determine the optimal threshold of every single gene expression and then all the ccRCC patients were divided into two subgroups, namely high and low subgroup. Then Kaplan-Meier survival analysis with log rank test was calculated and displayed with package “survminer” [21].
Aiming to construct a comprehensive prognostic model, all 19 m6A methylation regulators were included into multivariate Cox’s regression analysis with risk scores for each sample being calculated. Prognostic risk scores for ccRCC patients were calculated using the following formula:

with i representing the number of prognostic m6A expressions.
According to the median of risk score, all samples were divided into two subgroups: high risk and low risk, and survival curves and time-dependent ROC curves were generated.
Multivariate Cox analysis and nomogram development for survival predictions
In order to assess the independence of the diagnostic model, some critical clinical parameters of the patients were included into the multivariate analysis, including gender, age, tumor side, pathological T stage, N stage and tissue grade. Furthermore, these factors were applied using the “rms” package [22] to develop a survival time nomogram.
Correlations between the prognostic model and tumor infiltrating immune cells
To assess the relationship between prognostic model and tumor purity, R package “estimate” [23] was used to calculate tumor purity, the scores of stromal cells and the infiltration level of immune cells. To further explore the immune cell composition within the tumor microenvironment, Tumor Immune Estimation Resource (TIMER, http://timer.cistrome.org/) was used to compute the proportion of six kinds of immune cells. Spearman correlation analysis was used to investigate correlations between risk score and immune infiltrating cells. In addition, associations between clinical characteristics, tumor purity and risk scores were analyzed using the standard Kruskal-Wallis test.
Statistical Analysis
All the data analysis and the figures were completed by R (version 4.1.1) as well as corresponding R packages illustrated in the Methods. The non-parametric Spearman rank correlation analysis was used to calculate the correlation coefficient. All the tests were two-sided, and p < 0.05 was regarded as significance.