Integrated transcriptome and proteome analyses identifies novel genes and regulatory networks in intervertebral disc degeneration

Background: Intervertebral disc degeneration is a major cause of symptoms like low back pain and neck pain. Many groups have tried to reveal the regulatory network using either transcriptome or proteome profiling technologies, however, the relationship between these differentially expressed proteins and mRNAs are not elucidated. Since posttranscriptional regulation and other mechanisms may affect the translation of mRNA to protein, a combined transcriptome and proteome study may give more precise data on unveiling important regulatory network and key genes of Intervertebral disc degeneration. Results: In the present study, we used the label-free quantification proteomic approach and identify 656 proteins expressed in either degenerated or normal nucleus pulposus, of which 503 proteins are differentially expressed. Taking advantage of the existing nucleus pulposus transcriptome data, we combine and reanalyze the data and find 105 differentially expressed mRNA between degenerated and normal nucleus pulposus. By comparing these data, only 9 genes show significant changes in both protein and mRNA data, while 6 genes (TNFAIP6, CHI3L1, KRT19, DPT, COL6A2 and COL11A2) show concordant changes in both protein and mRNA level. Further functional analyses show different functions of the altered mRNAs and proteins in degeneration, indicating great difference between protein network and mRNA network. Using the gene co-expression network method, we uncover novel regulatory network and potential genes that may play vital roles in intervertebral disc degeneration by combining protein and mRNA data. Conclusions: This is the first study to identify novel regulatory network of intervertebral disc degeneration using combined analysis of both transcriptome and proteome, which may give new insight into the molecular mechanism of intervertebral disc degeneration.

compared these two data to find the difference and combined them using bioinformatics analyses to find the underlying network between IDD protein and mRNA changes.

Sample collection and Nucleus pulposus cell extraction
Informed consent is provided by the patients and their relatives before obtaining the intervertebral disc tissue in surgery. The experiment was authorized by the ethics committee of Second Military Medical University. Three normal intervertebral disc tissue sample were collected from lumbar trauma patients who underwent spinal fusion with no radiological sign of degeneration (Pfirrmann grade I, n=3, age 45 to 49 years, mean 47 years). And three degenerated disc tissue were collected from diagnosed lumbar herniation patients who underwent disc resection and fusion surgery to relieve symptom (Pfirrmann grade IV-V, n=3, age 46 to 50 years, mean 48 years). MRI T-2 weighted images were collected and the modified Pfirrmann grading system(3) was used to evaluate the degree of IDD.
For cell extraction, NP tissue specimens were washed twice with PBS, then minced and digested with 2U/mL protease in DMEM/F12 (Gibco, Grand island NY, USA) for 30 minutes at 37°C. NP cells were released from the tissues by treating with 0.25 mg/mL type II collagenase (Gibco, Cat. No. 17101-015) for 4 hours at 37°C.The resulting cell suspension was transferred into a 40μm cell strainer (BD Falcon, Becton Dickinson, Franklin Lakes, NJ, USA) and centrifuged at 800 g for 5 minutes. The cell pellets were used subsequently in proteomic analysis or total RNA extraction.

RNA extraction and reverse transcription
RNA was extracted from human nucleus pulposus samples using Trizol (Invitrogen, Carlsbad, CA) according to the manufacturer's instructions. Concentration of total RNA was measured at 260 nm with a spectrophotometer (DU-800; Beckman Coulter, Brea, CA).
First strand complementary DNA (cDNA) synthesis was performed with 500ng of total RNA in a 10μL final volume containing 2μL Primer Script RT Master Mix (Takara, RR036A, Japan) and 8μLof RNase-free dH 2 O and total RNA. The reverse transcription procedure was carried out according to manufacturer's instructions.

Quantitative real-time polymerase chain reaction
Real-time PCR for KRT19, COL6A2, DPT, COL11A2, CLIP and CHI3L1 were performed by using SYBR premix Ex Taq™ (Takara Bio Inc., Shiga, Japan) with a Step One Plus real-time PCR system (Applied Biosystems, Foster City, CA) according to the manufacturer's instructions. GAPDH was used to normalize the gene expression of mRNAs. The relative amount of transcripts was calculated according to the comparative 2 -∆∆CT method. All of the primers were synthesized by Invitrogen (Supplementary Data 6). Each experiment was repeated at least three times independently. Statistical significance was assessed by comparing mean values (±SD) using a Student's t test for independent groups and was assumed for *P<0.05.

Label-free proteomic profiling.
For protein extraction, tissues samples were transferred to a 1.5-mL screw capped tube and centrifuged at 10,000×g for 30 min at 4 °C. Next, 100 μL lysis buffer (7 M urea, 2 M thiourea) were added into each sample and then ultrasonic crushed to extract total proteins. Proteins were precipitated with trichloroacetic acid (TCA) for 30 min on ice and centrifuged at 40,000×g for 30 min. Protein concentration was determined using the Qubit fluorescent protein quantification kit (Invitrogen) according to the manufacturer's instructions.
Protein were excised from the preparative tube and destained with 50 mM NH4HCO3 to the final concentration 0.5mg/mL. Following the addition of 100 mmol/L DTT DL-Dithiothreitol to its final concentration of 10 mmol/L, the protein fractions were mixed at 56℃ for 60 min, then diluted 10x with 250 mmol/L IAM Iodoacetamide and kept in dark for 60 min. Finally, the samples were digested with trypsin (substrate to enzyme mass to mass ratio at 50:1) at 37℃ for 12 h. Digested supernatant fractions were pressure-loaded onto a fused silica capillary column packed with 3-μm dionex C18 material (RP; Phenomenex). After desalting, a 5-mm, 300-μm C18 capture tip was placed in line with an Agilent 1100 quaternary HPLC High Performance Liquid Chromatography and analyzed using a 12-step separation. After the elution, they were electrosprayed directly into a micrOTOF-Q II mass spectrometer (BRUKER Scientific) with the application of a distal 180°C source temperature. The mass spectrometer was operated in the MS/MS (auto) mode. Survey MS scans were acquired in the TOF-Q II with the resolution set to a value of 20,000. Each survey scan (50~2,500) was followed by five data-dependent tandem mass (MS/MS) scans at 2HZ normalized scan speed.

Transcriptome data acquiring and bioinformatics analysis.
For IDD transcriptome sequencing data, we downloaded the processed data of GSE70362 from Gene Expression Omnibus (GEO) database (13). Samples were rearrange and combined according to Pfirrmann grade. Here we used Pfirrmann grade I and II samples as normal NP (IVD), and Pfirrmann grade III to V as degenerated NP (IDD) for further analysis.
Unsupervised clustering using average linkage and median centering were sub sequentially performed and visualized with TreeView. Gene ontology (GO) analysis was carried out using annotations in Protein ANalysisTHrough Evolutionary Relationships the sections and they were counterstained with hematoxylin, and imaged using a ZEISS microscope (ZEISS Axio Imager A2, Carl Zeiss microscopy GmbH, Jena, Germany).

The proteome and transcriptome analysis of IDD.
To gain initial knowledge of the proteome and transcriptome of IDD, we first performed label-free proteomic analysis of IDD and normal NP tissue. Three normal NP and three IDD NP were analyzed, and 656 proteins were identified. Among which, 503 proteins were

Functional analysis of differentially expressed genes.
In order to identify the functions of these differentially expressed genes, we performed GO and Pathway analysis of both proteomic and transcriptomic data. By comparing the results we found that protein changes mainly focused at extracellular matrix related metabolism  Dataset 4), which is concordant with previous reports (2,16,17). These findings showed that the changes of either transcriptome and proteome of the degenerated NP cells is indeed related to the extracellular matrix remodeling or degeneration, but there also exist great differences in the enriched gene functions and pathways of the differentially expressed genes between mRNA and protein data. Such differences indicating that the gene interacting network for either transcriptome data or proteome data is sure to be different when considered solely. Thus, a combined analysis is needed to uncover the interacting relationship between the proteomic and the transcriptomic differences.

Gene co-expression network reveal vital genes in IDD.
Here, we utilize the co-expression network method to clarify the key component and the potential connection between these key mRNAs and/or proteins. By comparing the normal NP with degenerated NP data, we constructed a co-expression network for both groups according to the gene's mRNA or protein expression trend (fold change). From the network we can found that the degenerated co-expression network is rather focused, with each key node (gene) inter-connected with another, forming one big network loop (Fig.3A), which indicates these genes may play important roles in disc degeneration. However, the coexpression network for normal NP is rather scattered (Fig.3B)  other key mRNAs are not reported to be related with IDD. Further analyzing the network, we can found ossification related genes, IBSP, SOX4, RSPO3 and OGN are also included in the network, which indicates a vital role of ossification related genes in IDD.

Verification analysis of candidate genes in IDD tissue sample.
To further validate the network, we tested the expression of TNFAIP6, CHI3L1, KRT19, DPT, COL6A2 and COL11A2 in both tissue and cellular level. We found that the expression level of the candidates in tissues are all consistent with the transcriptomic and proteomic sequencing data (Fig. A). Using the primary NP cells, we tested the expressions of these candidate gene and found that the expression of KRT19, COL6A2, DPT, and COL11A2 decreased dramatically after the IL-1β agitation (Fig. B), with only CHI3L1 and TNFAIP6 upregulated, which showed similar results compared to the tissue findings. These results showed that the sequencing data are reliable, and indicating a vital role of these candidate genes in the process of disc degeneration.

Discussion
Intervertebral disc degeneration is a complex and not yet fully understood process, and many groups have been dedicated to unveil its mechanism. Although inflammation and cell apoptosis are well known causes that take part in the development of degeneration, the specific trigger and detailed mechanism that causes such process is still not fully Alternatively, antisense transcripts may instead block the translation of the sense mRNAs without changing the levels of the latter. Thus, studying only the mRNA or the protein level of disc degeneration would certainly gain biased results that affect the discovering the true mechanism.
In this study, we combined the mRNA and protein high through-put profiles to gain insight into the degeneration mechanism. By using the co-expression network construction, we found many critical genes that may play important roles in the degeneration process.
From the degenerated network, we can find that the co-expression network is more condensed, with genes of many different functions interconnected. Such network indicates the degeneration process act in cascades, and all disease related genes could affect each other in an unknown mechanism. We can find many genes such as KRT19, COL6A2, COL11A2, COL1A2, TIMP1, and ITGBL1 are known factors in disc degeneration processes, and many of them participate in the extracellular matrix degeneration process. Since the most eminent change of disc degeneration take part in the extracellular matrix, we define these genes as IDD 'effectors', which serve as the downstream component In the degeneration phenomenon. Among which the Chondroitin Sulfate (CS) related biosynthetic and catalytic enzymes like CHST3, CHST10 and CHPF showed significant changes during the NP degeneration, indicating the important role of CS during the NP degeneration.
However, we can found that all three enzymes showed either mRNA or protein changes, none of which showed significant changes in both forms. Such phenomenon indicated that post-transcriptional regulation may play vital role in the CS biosynthesis regulation, and our previous report has just proved that microRNA is critical in regulating CHSY expression (3), which adds to the credibility of our combined analysis. Another similar phenomenon is that of the 9 genes that showed significant changes in both protein and mRNA data, only 6 of them show concordant changes in both protein and mRNA level. The three genes that showed contradictory changes may also receive post-transcriptional regulation. Thus, assessing the regulatory mechanism of these key genes may help to uncover the complex regulation network of IDD.
Another interesting phenomenon we observed is that among the degeneration network, we found many ossification related genes like IBSP, SOX4, RSPO3 and OGN are also included in the network, which indicates an vital role of ossification related genes in IDD process. It has been reported that Runx2, an important regulator of chondrocyte hypertrophy and ossification, is the only family member normally expressed in the intervertebral disc, and that Runx2 expression is upregulated during degeneration (28). The report also showed that Runx2 overexpression mice developed the phenotypes of both ectopic mineralization and IVD degeneration. These previous results imply that ossification related genes could also play as degeneration regulators in the intervertebral disc. So ossification related IBSP, SOX4, RSPO3 and OGN also affect the IDD process, but the effect and mechanism is yet to be known.

Conclusion
Taken together, our study provides new evidence and insight into the regulatory network of IDD. By combining the mRNA and protein data, we found new key regulators (TNFAIP6, CHI3L Declarations Competing interests: The authors have no conflict of interest to declare.    Data are represented as mean ± SD, and the IVD group served as control, **p< 0.01.