Quantitative analysis of proteomic profiling in constipation colon biopsies

Background Chronic constipation is a common disease, affecting between 2% and 27% of people worldwide. Few studies have explored the diversity of genetic polymorphisms and cell metabolism in constipation. This study provides a first analysis of constipation-related proteomic data.
Methods To help elucidate the potential mechanisms responsible for constipation, proteomic profiling of human colon biopsy specimens was performed. Proteins dysregulated in disease tissues compared with normal tissues were characterized from the expression profiles by liquid chromatography-mass spectrometry (LC-MS) and Tandem Mass Tag (TMT) proteomic methodology, and were further subjected to pathway analysis to identify altered biological processes and signaling pathways.
Results A total of 5,208 proteins were identified, of which 4,522 had quantitative information. Differentially expressed proteins with a fold change greater than 1.3 were considered dysregulated. Specifically, 42 proteins were up-regulated and 23 proteins were down-regulated in constipation samples. Bioinformatics analysis showed that most of the differentially expressed proteins were involved in cellular process, single-organism process, metabolic process, biological regulation and response to stimulus. Pathway analysis of the dysregulated proteins showed that the up-regulated proteins mainly participated in drug metabolism-cytochrome P450.
Conclusions The TMT method followed by mass spectrometric analysis was applied to study alterations in the biopsy proteomic profile of constipation patients. Our results show that distinct protein profiles and signaling pathways are involved in constipation.

obstruction 2 . Although current therapies such as laxatives and fecal microbiota transplantation have improved the clinical management of constipation, a number of patients gain only limited benefit and need surgical treatment 4,5 . The heterogeneity in clinical presentation and therapeutic response suggests that the underlying disease-inducing mechanisms might differ between affected patients 6 . Increasing evidence has revealed the genomic features and signaling pathways of the colorectum 7-9 ; however, few studies have explored the diversity of genetic polymorphisms and cell metabolism in constipation.
Proteomic profiling of the colon in constipation may eventually be translated into novel and effective treatments.
The etiology of constipation remains a conundrum. Transcriptomic and proteomic profiling in other bowel diseases, such as inflammatory bowel disease and colorectal cancer, has made considerable progress and advanced our understanding of their pathogenesis 8,9 . Understanding the proteomics of constipation should likewise help clarify its pathogenesis. Sequencing technology has greatly expanded what can be observed. Because proteins play a fundamental role in coupling genotypes to phenotypes, measurement of protein abundance is important 10 . Traditional proteomics 11 was limited to serum and blood samples, as clinical colon samples are more difficult to acquire. Liquid chromatography-mass spectrometry (LC-MS) and Tandem Mass Tag (TMT) proteomic methodology have been optimized to handle large tissue samples, and TMT quantitative proteomics has been used to screen for diagnostic and prognostic protein biomarkers 12,13 . We therefore examined the differential expression of proteins between constipation and non-constipation colon biopsies using TMT-based quantitative proteomics. Our study, which provides a first analysis of constipation-related proteomic data, may offer a more fundamental understanding of the information flow from protein to phenotype.

Sample collection
The study enrolled 20 inpatients at the Tenth People's Hospital of Tongji University. Patients with refractory mixed constipation or constipation with megacolon and ileus (n = 12) underwent subtotal colectomy combined with a modified Duhamel procedure 3 , and patients with left colon cancer (n = 8) underwent left hemicolectomy. After the operation, fresh colon biopsies were taken from the resected left colon specimen, excluding cancerous or ulcerated tissue. The most homogeneous samples were selected for proteomic analysis. All biopsy samples were collected and processed according to standard operating procedures to reduce variation.

Protein preparation and TMT labeling
The biopsy samples were removed from −80 °C storage, ground in liquid nitrogen into a cell powder and transferred to a 5-mL centrifuge tube. Four volumes of lysis buffer (8 M urea, 1% Protease Inhibitor Cocktail) were then added to the cell powder, followed by sonication three times on ice using a high-intensity ultrasonic processor (Scientz). The remaining debris was removed by centrifugation at 12,000 g at 4 °C for 10 min. Finally, the supernatant was collected and the protein concentration was determined with a BCA kit according to the manufacturer's instructions.
For digestion, the protein solution was reduced with 5 mM dithiothreitol for 30 min at 56 °C and alkylated with 11 mM iodoacetamide for 15 min at room temperature in darkness. The protein samples were then diluted with 100 mM TEAB to a urea concentration of less than 2 M. Trypsin was added at a 1:50 trypsin-to-protein mass ratio for the first digestion overnight and at a 1:100 trypsin-to-protein mass ratio for a second 4-h digestion. The peptides were then desalted on a Strata X C18 SPE column (Phenomenex) and vacuum-dried. Peptides were reconstituted in 0.5 M TEAB and processed according to the manufacturer's protocol for the TMT kit. Finally, the peptide mixtures were incubated for 2 h at room temperature and then pooled, desalted and dried by vacuum centrifugation.
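The dilution and enzyme amounts in this step follow from simple arithmetic, sketched below. This is a minimal illustration, assuming the diluent is urea-free and volumes are additive; the function names and the 100 µg protein amount in the usage note are hypothetical and do not come from the protocol.

```python
def diluent_volumes_needed(start_molar, target_molar):
    """Volumes of urea-free diluent per volume of lysate to reach target_molar.

    Uses C1 * V1 = C2 * V2 with V2 = V1 + V_diluent (assumes additive volumes).
    """
    if target_molar <= 0 or target_molar > start_molar:
        raise ValueError("target must be positive and not exceed the start concentration")
    return start_molar / target_molar - 1


def trypsin_mass_ug(protein_ug, ratio_denominator):
    """Trypsin mass for a 1:ratio_denominator trypsin-to-protein mass ratio."""
    return protein_ug / ratio_denominator
```

Under these assumptions, bringing 8 M urea to 2 M takes 3 volumes of diluent per volume of lysate, so more than 3 volumes are needed to fall below 2 M. For a hypothetical 100 µg of protein, the two digestions would use 2 µg and 1 µg of trypsin.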

HPLC fractionation and LC-MS/MS analysis
Tryptic peptides were fractionated by high-pH reverse-phase HPLC on an Agilent 300Extend C18 column (5 µm particles, 4.6 mm ID, 250 mm length). The peptides were first separated with a gradient of 8-32% acetonitrile (pH 9.0) over 60 min into 60 fractions. These were then combined into 18 fractions and dried by vacuum centrifugation.
The tryptic peptides were dissolved in solvent A and separated on an EASY-nLC 1000 UPLC system. Solvent A was an aqueous solution containing 0.1% formic acid and 2% acetonitrile, and solvent B was an aqueous solution containing 0.1% formic acid and 90% acetonitrile. The gradient was: 0-40 min, 6%-18% B; 40-52 min, 18%-28% B; 52-56 min, 28%-80% B; 56-60 min, 80% B, all at a constant flow rate of 300 nL/min. The peptides were subjected to an NSI source followed by tandem mass spectrometry in an Orbitrap Fusion™ Lumos coupled online to the UPLC. The electrospray voltage was 2.4 kV. The m/z scan range was 350 to 1550 for the full MS scan, and intact peptides were detected in the Orbitrap at a resolution of 60,000. MS/MS scans started at a fixed first mass of 100 m/z, and the fragments were detected in the Orbitrap at a resolution of 30,000. Data were acquired in data-dependent acquisition mode: after each MS scan, the 10 most intense precursor ions were successively selected for fragmentation in the HCD collision cell at a collision energy of 32% and then analyzed by MS/MS. To make effective use of the mass spectrometer, automatic gain control was set to 5E4, the signal threshold to 20,000 ions/s, the maximum injection time to 100 ms, and the dynamic exclusion time for tandem mass spectra to 30 s to avoid repeated scanning of precursor ions.
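The gradient program can be written as a small lookup, which makes the solvent composition at any time point explicit. The sketch below is only an illustration: the breakpoints are those given in the text, while linear ramps within each segment and the function name are assumptions.

```python
# Gradient breakpoints from the text: (time in min, % solvent B).
GRADIENT = [(0.0, 6.0), (40.0, 18.0), (52.0, 28.0), (56.0, 80.0), (60.0, 80.0)]


def percent_b(t_min):
    """Return % solvent B at t_min, interpolating linearly between breakpoints.

    Linear interpolation is an assumption; the text gives only segment endpoints.
    """
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time lies outside the 0-60 min gradient")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
```

For example, halfway through the first segment (20 min) the mobile phase would be at 12% B under this assumption, and it is held at 80% B from 56 to 60 min.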

Data analysis
The resulting MS/MS data were processed with the MaxQuant search engine (v.1.5.2.8). Spectra were searched against the SwissProt Human database (20,317 sequences); a reverse decoy database was added to estimate the false discovery rate (FDR) caused by random matching, and a common contaminant database was added to eliminate the influence of contaminating proteins on the identification results. Trypsin/P was specified as the cleavage enzyme, allowing up to 2 missed cleavages. The minimum peptide length was set to 7 amino acid residues, and the maximum number of modifications per peptide was set to 5. The mass tolerance for precursor ions was set to 20 ppm in the first search and 5 ppm in the main search, and the mass tolerance for fragment ions was set to 0.02 Da. Carbamidomethylation of Cys was specified as a fixed modification, and oxidation of Met and acetylation of the protein N-terminus were specified as variable modifications. The quantification method was set to TMT-10plex, and the FDR for protein identification and PSM identification was adjusted to 1%.
The quality control results for the MS data are shown in Fig. 1. First, the mass errors of all identified peptides were examined (Fig. 1A). The mass errors were centered on 0 and concentrated within 10 ppm, indicating that the mass accuracy met the requirements. Second, most of the peptides were between 8 and 20 amino acid residues in length (Fig. 1B), which is consistent with tryptic digestion and demonstrates that the sample preparation met the standard.
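Two of the search settings above can be made concrete with a short sketch: how a ppm tolerance translates into an m/z window, and how a decoy database yields an FDR estimate. This is a simplified illustration and not MaxQuant's actual scoring procedure; the function names and the (score, is_decoy) data layout are hypothetical.

```python
def ppm_window(mz, tolerance_ppm):
    """Return the (low, high) m/z bounds for a tolerance in parts per million."""
    delta = mz * tolerance_ppm / 1e6
    return mz - delta, mz + delta


def estimated_fdr(psms, score_cutoff):
    """Estimate the FDR among PSMs scoring >= score_cutoff as decoys / targets.

    psms is a list of (score, is_decoy) pairs (a hypothetical layout).
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= score_cutoff and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= score_cutoff and is_decoy)
    return decoys / targets if targets else 0.0
```

With a 20 ppm tolerance, a precursor at m/z 1000 would be matched within roughly ±0.02 m/z; at 5 ppm the window narrows to ±0.005 m/z. A 1% FDR corresponds to choosing the score cutoff at which the decoy-to-target ratio drops to 0.01.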

Bioinformatics Analysis
Gene Ontology (GO) annotation of the proteome was derived from the UniProt-GOA database (http://www.ebi.ac.uk/GOA/). Identified protein IDs were first converted to UniProt IDs and then mapped to GO IDs. Proteins that were not annotated in the UniProt-GOA database were annotated with InterProScan, which assigns GO functions based on protein sequence alignment. The proteins were then classified by GO annotation into three categories: biological process, molecular function and cellular component. For each category, a two-tailed Fisher's exact test was used to test the enrichment of the differentially expressed proteins. GO terms with a p value < 0.05 were considered significant.
Functional domain descriptions of the identified proteins were annotated by InterProScan (http://www.ebi.ac.uk/interpro/) based on protein sequence alignment. InterPro is a database that integrates diverse information about protein families, domains and functional sites. For each category, a two-tailed Fisher's exact test was likewise employed to test the enrichment of the differentially expressed proteins against all identified proteins. Protein domains with a p value < 0.05 were considered significant.
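The enrichment test can be sketched for a single annotation category as follows. This is a minimal standard-library illustration of a two-tailed Fisher's exact test, not the authors' pipeline; the 2x2 table layout described in the comments and the function name are assumptions.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout for enrichment of one category:
      a = differentially expressed proteins in the category
      b = differentially expressed proteins outside the category
      c = background proteins in the category
      d = background proteins outside the category
    The p value sums the probabilities of all tables with the same margins
    that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x category members in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_value = sum(prob(x) for x in range(lowest, highest + 1)
                  if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p_value)
```

A category would be reported as enriched when the returned p value is below 0.05. In practice a library routine such as `scipy.stats.fisher_exact` gives the same two-sided result.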
The Kyoto Encyclopedia of Genes and Genomes (KEGG) database was used to annotate protein pathways. KEGG pathways mainly include Metabolism, Genetic Information Processing, Environmental Information Processing, Cellular Processes, Human Diseases and Drug Development. Enriched pathways were identified with a two-tailed Fisher's exact test, and pathways with a p value < 0.05 were considered significant. These pathways were classified into hierarchical categories according to the KEGG website.
Subcellular localization annotation assigns proteins to the compartments of eukaryotic cells, which are largely defined by membrane structures. The main subcellular locations in eukaryotic cells include the extracellular space, cytoplasm, nucleus, mitochondria, Golgi apparatus, endoplasmic reticulum, peroxisomes, vacuoles, cytoskeleton, nuclear matrix and ribosomes.
On this basis, we used WoLF PSORT to predict the subcellular localization of the submitted proteins, and the CELLO software to predict subcellular localization for prokaryotic sequences.
Hierarchical clustering was then performed on the different protein functional classifications (GO, domain and KEGG). All categories obtained after enrichment were collated along with their p values, and only categories enriched in at least one cluster with a p value < 0.05 were retained. The filtered p value matrix was transformed by the function x = −log10 (p value), and the x values were then z-transformed for each functional category. The z scores were clustered by one-way hierarchical clustering (Euclidean distance, average linkage) in Genesis. Cluster membership was visualized as a heat map using the "heatmap.2" function from the "gplots" R package.
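The filtering and transformation steps that precede clustering can be sketched as below. This is a minimal illustration under stated assumptions: the dictionary layout and function name are hypothetical, the sample standard deviation is assumed for the z-transform (the text does not specify which estimator was used), and the clustering in Genesis and the heat map drawing are not reproduced.

```python
from math import log10
from statistics import mean, stdev


def filter_and_z_transform(p_matrix, alpha=0.05):
    """Keep categories with p < alpha in at least one cluster, then z-score.

    p_matrix maps a category name to its enrichment p values, one per cluster
    (a hypothetical layout; at least two clusters are required).
    Each retained row is transformed to x = -log10(p) and then standardized
    across clusters using the sample standard deviation (an assumption).
    """
    result = {}
    for category, p_values in p_matrix.items():
        if min(p_values) >= alpha:
            continue  # not enriched in any cluster
        x = [-log10(p) for p in p_values]
        mu, sd = mean(x), stdev(x)
        result[category] = [(v - mu) / sd if sd > 0 else 0.0 for v in x]
    return result
```

The resulting z-score rows are what would be passed to the hierarchical clustering and heat map steps.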

Statistical analysis
The differences between the two groups were analyzed with Student's t test. P < 0.05 was considered statistically significant.
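For reference, the test statistic of a two-sample Student's t test can be computed as in the sketch below. It assumes the equal-variance (pooled) form and independent groups; the function name is hypothetical. Converting the statistic to a p value requires the t distribution, which is left to a statistics library such as `scipy.stats.ttest_ind`.

```python
from math import sqrt
from statistics import mean, variance


def student_t_statistic(group_a, group_b):
    """Return (t, degrees_of_freedom) for a pooled-variance two-sample t test.

    Assumes independent groups with equal variances; each group needs at
    least two observations.
    """
    n_a, n_b = len(group_a), len(group_b)
    dof = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / dof
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, dof
```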

Results
Transcriptomic Profiling of the study population
Human colon biopsies (n = 20) comprised 12 samples from constipation patients and 8 samples from non-constipation patients (Table 1). All selected patients were similar in gender and age. In addition, there were no significant differences in disease duration or concomitant ileus between the two groups.

Identification Of Biopsy Proteins
In this study, a total of 5,208 proteins were identified, of which 4,522 had quantitative information. Differentially expressed proteins with a fold change greater than 1.3 were considered dysregulated; 42 proteins were up-regulated and 23 proteins were down-regulated in constipation samples. WoLF PSORT was used to predict and classify the subcellular localization of the differentially expressed proteins (Fig. 1). Of the up-regulated proteins, 33% were annotated as cytoplasmic, while more than 30% of the down-regulated proteins were annotated as extracellular and nuclear.
The distribution of the quantified proteins across GO secondary annotations was analyzed statistically. KEGG pathways include metabolism, genetic information processing, environmental information processing, cellular processes, human diseases and drug development. The KEGG-based networks revealed significant hubs of dysregulation in constipation patients (Fig. 3). Red indicates strong enrichment and blue indicates weak enrichment. For example, response to transforming growth factor beta in Q1 and phosphorylation in Q4 were strongly enriched among GO biological process terms, and AGE-RAGE signaling pathway in diabetic complications in Q1 and metabolism of xenobiotics by cytochrome P450 in Q4 were strongly enriched among KEGG pathways (Figs. 5 and 6).

Discussion
Chronic constipation is receiving more attention worldwide with the growth of research on the brain-gut axis and fecal microbiota transplantation 14-16 . Severe constipation can seriously impair quality of life and may even contribute to mental illness 16-18 . Changes in the gut are related to hormones and the brain, and the gut environment may play a causal role, but little is known about the specific mechanisms or about differences between biological pathways. This is the first report characterizing constipation patients at the proteome level. The data presented here not only support the presence of specific underlying mechanisms in constipation patients but also provide new information that warrants further investigation.
In this study, a quantitative proteomic analysis of colon biopsy samples from constipation patients and non-constipation controls was carried out by TMT labeling, HPLC fractionation and LC-MS/MS to quantify specific changes in the biopsy proteome of constipation patients. A total of 4,522 proteins were quantified, of which 65 were differentially expressed by > 1.3-fold or < 0.77-fold. Bioinformatics analysis showed that the differentially expressed proteins were mostly involved in cellular process, single-organism process, metabolic process, biological regulation and response to stimulus, which mainly reflect abnormal immune function. Constipation patients also have a higher risk of inflammatory disorders and poorer clinical outcomes.
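The two cutoffs are reciprocal: 1/1.3 ≈ 0.77, so the threshold is symmetric for up- and down-regulation. A minimal sketch of the classification rule is given below; the function name and the assumption that the ratio is expressed as constipation over control are illustrative and not taken from the authors' pipeline.

```python
def classify_fold_change(ratio, threshold=1.3):
    """Classify a constipation/control abundance ratio.

    Ratios above the threshold are "up", ratios below its reciprocal
    (about 0.77 for a threshold of 1.3) are "down", the rest "unchanged".
    """
    if ratio <= 0:
        raise ValueError("abundance ratio must be positive")
    if ratio > threshold:
        return "up"
    if ratio < 1 / threshold:
        return "down"
    return "unchanged"
```

Applied to the quantified proteins, a rule of this kind would yield the up-regulated and down-regulated sets described above.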

Conclusion
TMT labeling followed by LC-MS/MS analysis was applied to study alterations in the biopsy proteomic profile of constipation patients. This study showed that constipation is associated with specific protein profiles and certain signaling pathways related to inflammatory reactions. The potential use of these proteins as biomarkers for identifying individuals with constipation needs further study.

Declarations
Ethics approval and consent to participate: Written informed consent was obtained from individual participants.
Consent for publication: Written informed consent for publication was obtained from all participants.
Availability of data and material: All relevant data are within the paper and its Supporting Information files.
Competing Interests: All authors declared no competing interests for this work.

Supplementary Files
MS_identified_information.xlsx