The Proteome Map and PeptideAtlas of a widely cultivated tropical water fish Labeo rohita: A resource for the Aquaculture Community

Abstract

With the global consumption of fish outpacing population growth, the aquaculture sector is facing challenges in meeting the rising demand for food and nutritional security. Integrative omics research provides a strong platform to understand the basic biology and translate this knowledge into sustainable solutions for tackling disease outbreaks and increasing productivity, thus ensuring food security. To further understand the complex biology of host-pathogen response and support the aquaculture effort, genome and proteome reference maps of cultivated fish species that move beyond simple sequence information will accelerate research and the translation of quality products for food industries. Towards this end, we have performed an extensive proteomics-based investigation of Labeo rohita, one of the economically important fish species produced in world aquaculture. Deep proteomic profiling of 17 histologically normal tissues, plasma and embryo provided mass-spectrometric evidence for 6015 high-confidence canonical proteins at 1% false discovery rate. Tissue-enriched expression of several biologically important proteins was validated using targeted proteomics with high quantitative accuracy. We characterised the global post-translational modifications (PTMs) in terms of acetylation (N-terminus and lysine), methylation (N-terminus, lysine and arginine) and phosphorylation (serine, threonine and tyrosine) to present a comprehensive proteome resource. An interactive web-based portal was developed to support the Labeo rohita PeptideAtlas, a unique community resource for mass spectrometry-based peptide/protein evidence in fish. This draft proteome map of Labeo rohita would advance basic and applied research in aquaculture to meet the most critical challenge of providing food and nutritional security to an increasing world population.

Introduction

The average annual increase in global consumption of fish has outpaced population growth. Fish accounts for 20% of global animal protein consumption, which underlines the importance of fisheries in global food security and nutrition. India ranks second in global aquaculture production, and Indian major carps (IMCs) contribute 90% of its aquaculture economy 1. [...] peroxiredoxins in human melanoma 6.
Proteomics can identify and explore sensitive and specific markers for assessing the quality of fish or fishery-related products 7. The effects of pesticide mixtures and temperature have also been explored in goldfish (Carassius auratus) 8. All these findings suggest that proteomic characterization of fish would help address questions ranging from basic biology to ecological, environmental and food-related issues.

Proteome reference maps for many organisms, such as human and zebrafish, have been generated using high-resolution mass spectrometry 9-11. The recently published rohu genome predicted 26,400 protein-coding genes 12. However, proteomics studies in L. rohita (rohu) are rare, with most studies focusing on a single tissue in isolation 13,14. Therefore, to develop a draft map of the rohu proteome, we performed in-depth proteomic profiling of 17 histologically normal rohu organs, embryo and plasma using high-resolution, high-mass-accuracy mass spectrometry. We identified a total of 6015 canonical proteins at 1% FDR, which is the first such extensive map of the rohu proteome. Further, we developed a freely available L. rohita PeptideAtlas repository for all observed peptides and PTM peptides, which should help researchers design hypothesis-driven or targeted experiments. Additionally, a web portal was prepared for expression analysis of the identified proteome across the organs (www.fishproteome.org). This study could serve as a reference to advance the search for specific gene products associated with muscle growth and fertility improvement, as well as to investigate system-level alterations in diseased or stressed conditions, which can collectively determine the health of fish 25. We believe this extensive proteomic sequence and PTM information, along with its genomic characterization, will serve as a community resource for the food and aquaculture industry to accelerate basic research and applications in industrial aquaculture.
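To make the 1% FDR figure quoted above more concrete, the following is a minimal sketch of the widely used target-decoy estimate, in which the FDR at a score cut-off is approximated as the number of decoy hits divided by the number of target hits above that cut-off. This is an illustration only and an assumption on our part: the Trans-Proteomic Pipeline derives its error rates from probability models (PeptideProphet/iProphet) rather than from this simple count, and the function name and toy scores below are hypothetical.

```python
from typing import List, Optional, Tuple


def score_threshold_at_fdr(
    psms: List[Tuple[float, bool]], max_fdr: float = 0.01
) -> Tuple[Optional[float], int]:
    """Return (score cut-off, accepted target count) for the most permissive
    cut-off whose estimated FDR (decoys / targets) is still <= max_fdr.

    psms holds (score, is_decoy) pairs; a higher score means a better match.
    Hypothetical helper for illustration, not part of any pipeline.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    best: Tuple[Optional[float], int] = (None, 0)
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
            continue
        targets += 1
        # Only target hits can define the cut-off.
        if decoys / targets <= max_fdr:
            best = (score, targets)
    return best


# Toy data (hypothetical): 100 confident targets, then a mix of decoys and targets.
toy = [(100 - i, False) for i in range(100)]
toy += [(0.5, True), (0.4, False), (0.3, False), (0.2, True), (0.1, False)]
print(score_threshold_at_fdr(toy))
```

In this toy set, every identification scoring at or above the returned cut-off would be accepted, and anything below it would be discarded as insufficiently confident.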

L. rohita organ-based proteomic profiling

An extensive proteome catalogue of L. rohita was generated through in-depth proteomic profiling of 19 different tissue samples, including 17 histologically normal tissues, blood plasma and four-day post-fertilisation embryo (Fig. 1a). Proteins were extracted from the various tissues using urea lysis buffer or the Trizol method (see Methods) and run on 1D SDS-PAGE. Following gel electrophoresis, the 1D gel was cut into 2 mm slices, and the fractionated proteins were cleaved into peptides by in-gel digestion with trypsin, followed by mass spectrometry (Fig. 1b). A representative SDS-PAGE profile for all organs is shown in Fig. S1a. The workflow for protein identification and analysis is depicted in Fig. 1c. We employed the pH shift method 26 with the urea lysis buffer for fifteen organs in order to obtain higher coverage (i.e., a 30% increment compared to pH 8 buffer alone) (Fig. S1b-c). However, the frequency distribution in terms of molecular weight remained the same (Fig. S1d). [...] (Table S1). We considered only canonical proteins for further analysis. The highest number of proteins was identified in the brain and the lowest in scales (Table S1, Fig. 1d).
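As an illustration of the tryptic digestion step described above, the sketch below performs an in silico digest using the commonly applied rule that trypsin cleaves C-terminal to lysine (K) or arginine (R), except when the next residue is proline (P). The function name, the example sequence and the `missed_cleavages` parameter are hypothetical and are not taken from the study; search engines implement their own variants of this rule.

```python
from typing import List


def tryptic_digest(protein: str, missed_cleavages: int = 0) -> List[str]:
    """Return tryptic peptides of `protein`, allowing up to `missed_cleavages`.

    Rule assumed here: cleave after K or R unless the following residue is P.
    """
    if not protein:
        return []
    # Cleavage positions, expressed as indices where a new peptide starts.
    sites = [0]
    for i, residue in enumerate(protein[:-1]):
        if residue in "KR" and protein[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(protein))

    peptides = []
    for start in range(len(sites) - 1):
        for missed in range(missed_cleavages + 1):
            end = start + 1 + missed
            if end >= len(sites):
                break
            peptides.append(protein[sites[start]:sites[end]])
    return peptides


# Hypothetical sequence, chosen to include an R-P bond that is not cleaved.
print(tryptic_digest("MKWVTFRPLLKAGR"))
# ['MK', 'WVTFRPLLK', 'AGR']
```

Allowing one missed cleavage additionally yields the concatenations of neighbouring peptides, which is how search engines typically account for incomplete digestion.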

For functional annotation of the identified proteins, ortholog analysis was performed using the eggNOG database 27 across all organs, in which the canonical proteins were mapped against orthologs covering a wide range of cellular processes and metabolic functions. Around 97% of the mapped orthologs belong to Actinopterygii, the class of ray-finned fishes, and the majority of them were linked to signal transduction mechanisms (Table S1, Fig. S2). [...] Table S2). Proteins shared among all the samples were found to be mainly involved in cell cycle regulation, cellular respiration (e.g., glycolysis), protein folding and structural components of the cell (e.g., cytoskeletal components) (Fig. S3 and Table S2). [...] Interestingly, we could link differential expression of certain proteins present in both sexes to particular biological functions (Fig. 2c, Table S4). Overall, pathway analysis revealed that [...] (Table S5).
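The distinction drawn above between proteins shared among all samples and proteins observed in a single tissue can be expressed with plain set operations. The sketch below assumes a simple mapping from tissue name to the set of identified protein accessions; the function name, tissue subset and accessions are hypothetical placeholders, not data from the study.

```python
from typing import Dict, Set, Tuple


def shared_and_unique(
    ids_by_tissue: Dict[str, Set[str]]
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Return (proteins found in every tissue, proteins found in only one tissue)."""
    if not ids_by_tissue:
        return set(), {}
    shared = set.intersection(*ids_by_tissue.values())
    unique = {}
    for tissue, ids in ids_by_tissue.items():
        # Union of identifications from every other tissue.
        others = set().union(
            *(v for t, v in ids_by_tissue.items() if t != tissue)
        )
        unique[tissue] = ids - others
    return shared, unique


# Hypothetical accessions for three of the profiled sample types.
example = {
    "brain": {"P1", "P2", "P3"},
    "liver": {"P1", "P4"},
    "scales": {"P1"},
}
shared, unique = shared_and_unique(example)
print(sorted(shared), {t: sorted(u) for t, u in unique.items()})
```

Counting the sizes of the per-tissue sets gives the kind of "unique versus total" summary that a bar plot of identifications per tissue would display.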

Representative spectra for phosphorylation, methylation and acetylation are shown in Figure S6a-c, respectively. [...] fin (13, 11), embryo (11, 10), plasma (5, 2) and scales (4, 3). Overall, serine phosphorylation was found to be predominant across the organs (Fig. 5c). Several phosphoproteins identified in muscle tissue were earlier reported as markers for different quality parameters in fish or other animals. For example, we could map seven phosphoproteins that are related to muscle firmness 42, five related to quality change due to post-mortem storage temperature 43, six related to color stability in fleshPMI 44, four related to preslaughter stress effects 19, and three related to muscle tenderness 45. Many of these could be potential markers in rohu as well. The mapped phosphoproteins in muscle were mostly myofibrillar and sarcoplasmic proteins involved in energy metabolism, glucose metabolism and muscle contraction (Fig. 5d, Table S5).

Additionally, in female gonad, three serine phosphorylation sites were identified in Vitellogenin Ab (vtgab), which binds to the vitellogenin receptor and is then taken up by oocytes. [...] peptides for a particular target protein (Fig. 6a), was employed to validate the enriched expression of a set of nineteen proteins among three randomly selected organs (eye, male gonad and female gonad), and their comparative intensities across these three organs were found to be on par with the DDA data. Zona pellucida sperm-binding protein 2-like isoform X2 showed enriched expression in female gonad compared to the other two organs (Fig. 6b). Similarly, TUDOR domain-containing protein 5 and cone cGMP-specific 3',5'-cyclic phosphodiesterase subunit alpha showed maximum intensity in male gonad and eye, respectively (Fig. 6c and d).

Comparative intensities of individual peptides for these proteins are shown in Figures S7-S12. Details of the remaining proteins measured by SRM-based validation can be found in Table S6. [...]

Hard tissues such as scales, skin and fins were first ground to a fine powder and then processed as described above. Embryo was processed using the Trizol method 79, and plasma (female) was processed directly for in-gel digestion.

Chimeric spectra were assessed by reanalysing the iProphet files using the reSpect algorithm 87. In brief, the reSpect search was performed on the iProphet files with the precursor mass tolerance increased to 3.0 Da. TPP analysis was performed as mentioned earlier, and the process of reSpect and TPP analysis was repeated once. A minimum iProphet probability ≥ [...]

In order to validate the tissue-enriched expression of proteins using the SRM assay (Fig. 6a), a total of 19 proteins from three organs (female gonad, male gonad and eye) were selected. Skyline software was used for method optimization and final data analysis. Initially, all proteins showing expression unique to one of these three tissues in the DDA data were taken for SRM validation, of which only 19 proteins showed consistent results. For generation of the transition list, a background proteome consisting of the NCBI rohu database was used (Table S6). Following optimization of the SRM parameters, a scheduled method containing 456 transitions corresponding to 86 peptides for the 19 selected proteins was used for the final run (Table S6).
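An SRM transition pairs a precursor m/z with a fragment-ion m/z; the 456 transitions for 86 peptides reported above correspond to roughly five transitions per peptide. The sketch below shows how such m/z values can be computed from monoisotopic residue masses for an unmodified peptide. It is a simplified illustration rather than the Skyline workflow used in the study: it ignores all modifications (for example, cysteine alkylation), considers only singly charged y ions, and uses the generic test sequence "PEPTIDE", not a peptide from the study.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O
PROTON = 1.007276   # mass of a proton


def precursor_mz(peptide: str, charge: int = 2) -> float:
    """m/z of the unmodified peptide carrying `charge` protons."""
    neutral = sum(MONO[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def y_ions(peptide: str) -> dict:
    """Singly charged y-ion m/z values, y1 up to y(n-1)."""
    return {
        f"y{i}": sum(MONO[aa] for aa in peptide[-i:]) + WATER + PROTON
        for i in range(1, len(peptide))
    }


# Candidate transitions for a doubly charged precursor (illustrative only).
q1 = precursor_mz("PEPTIDE", charge=2)
for name, q3 in y_ions("PEPTIDE").items():
    print(f"{q1:.4f} -> {name} {q3:.4f}")
```

In practice, the fragments retained per peptide are chosen empirically during method optimization, so a list like this is only a starting point for selecting transitions.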

An EASY-nLC 1200 LC set-up coupled with TSQ Altis was used for acquiring SRM data. The [...] (Table S2) [...]

Overall experimental workflow and description of obtained proteomic data. a. Diagrammatic representation of the organs included for proteomic analysis; b. Schematic workflow of sample preparation for mass spectrometric protein identification using the in-gel digestion protocol; c. Workflow for protein identification, in which proteins were identified using rohu UniProt and NCBI databases with two different pipelines (Proteome Discoverer and Trans-Proteomic Pipeline). Data obtained from TPP were taken for PTMProphet analysis and for building the PeptideAtlas. Confident proteins were taken forward for further analysis; d. Bar plot depicting unique and total proteins identified in each tissue sample.

Post-translational modification profile across the organs. a-c. Frequency of acetylation, methylation and phosphorylation across the organs, respectively, showing the number of modified peptides; acetylation sites include N-terminal acetylation and lysine acetylation, methylation sites include those on lysine and arginine, and phosphorylation sites include those on serine, threonine and tyrosine; d. Literature-curated potential phosphoprotein markers for flesh quality traits identified in our proteomics data (muscle tissue).

Figure 5

Example of a protein search and peptide search in the rohu PeptideAtlas. a. Out of several collapsible sections for a protein, three are shown, representing the overview of protein information, observed peptides highlighted in red font, and additional information for each observed peptide, respectively; b. Under the peptide view, two sections for one of the observed peptides of the same protein are shown, representing general information about the peptide and its respective spectra.