Comprehensive profiling of urinary peptides in cow reveals physiology specific signatures and several bioactive properties

Peptidomics allows the identification of thousands of peptides that are derived from proteins. Urinary peptidomics has revolutionized the field of diagnostics, as the sample reflects systemic changes in the body and is collected non-invasively. We profiled the peptides in urine collected from Sahiwal cows in three physiological states, namely heifer, pregnancy, and lactation. Endogenous peptides were extracted from 30 individual cows belonging to three groups, each comprising ten animals (biological replicates, n = 10). Nano-liquid chromatography tandem mass spectrometry (nLC-MS/MS) experiments revealed 5239, 4774, and 5466 peptides in the heifer, pregnant, and lactating animals, respectively. The diversity of endogenous peptides in urine provides baseline information substantiating the various bioactivities (anti-inflammatory, antimicrobial, antihypertensive, and anticancerous) associated with cow urine. Several proteases found to be responsible for the physiology-specific peptide signature of urine were traced back in the body. The in silico study also highlights the enrichment of progenitor proteins on specific chromosomes and their relative expression in the context of specific physiology. The urinary peptides, precursor proteins, and proteases identified in the study thus set a solid foundation for future research in biomarker discovery and a better understanding of the pathophysiology of the body.


Introduction
Excretory biological fluids such as urine, saliva, milk, tears, mucus, and sweat are important for the maintenance of homeostasis under normal physiological conditions. Urine, being a glomerular filtrate of blood, is capable of summarising the events that occur in the body as a result of changing physiology or pathological conditions. These systemic changes are reflected by qualitative and quantitative alterations in urine composition, which are useful for prognostic and diagnostic purposes. Urine has therefore been considered an excellent sample for the discovery of biomarkers associated with general health and disease. Endogenous peptides and proteins secreted in urine have proven to be hallmarks of various pathophysiological changes. The ability to identify and quantitate these peptides has accelerated the drug development process, which is why pharmaceutical companies are becoming extensively involved in the identification and validation of proteinaceous biological markers of disease. Urine has an advantage over other biological fluids in that it can be obtained in large volumes non-invasively. Another reason urine is an attractive option for proteomics and peptidomics is its stability. The degradation of proteins into peptides is complete by the time of sample collection, whereas protease activity in fluids such as serum, plasma, and saliva continues after sample collection and affects the result 1,2. Besides the metabolites, minerals, and salts in the glomerular filtrate, urine is composed of peptides and proteins originating from tubular secretion, secreted exosomes, and epithelial cells shed from the kidney and urinary tract 3. Urine carries more local information, as indicated by studies showing that 70% of its proteome comprises secretions from the kidney down to the urinary tract, while the remaining 30% originates from plasma 4.
Under the Human Kidney and Urine Proteome Project, comprehensive studies have been carried out to find novel biomarkers associated with kidney diseases and to unravel the proteome of urine in normal and diseased conditions 5-8. Human urinary endogenous peptides have been investigated extensively in the context of various diseases, establishing their strong clinical relevance 9-11. Urine and saliva are the two important biological samples that can be utilized for this purpose, and abundant literature is available for humans and mice. Despite the economic significance of cattle, bovine studies are still at an early phase. Although reports on protein-based biomarkers in saliva and urine exist 12-14, peptidome-based studies in bovine are scanty, and most of them concern milk. Milk has been explored for its peptide and protein profiles, identifying around 8559 unique peptides and 6210 proteins, respectively 15,16. The lack of studies on the bovine urinary peptidome makes it difficult for researchers to investigate its basic nature and its application in clinical and biomarker discovery studies. A comprehensive urinary peptidome database, encompassing diverse peptides of systemic and local origin that are processed by a variety of endogenous proteases, would give a well-defined picture of animal physiology and could be referred to in future studies involving bioactive peptides and biomarker discovery for clinical diagnosis and prognosis. Contemporary studies are more inclined towards the discovery of biomarkers associated with a particular condition or disease. However, the vast data generated by MS/MS analysis of urine can track changes in the proteome/peptidome not only in a condition-specific manner but also to establish the physiology-specific biomolecular signatures present in it.
In that direction, the foremost thrust in the field of clinical proteomics is the profiling of such endogenous peptides. The current study reveals urinary peptidome in the Sahiwal breed of cows across different physiological stages and their resourcefulness in drug discovery.

Endogenous peptidome profile
The endogenous peptides were purified using a solid-phase ethyl acetate extraction method (Figure 1A). The complete collection of urine samples from the three physiological states (heifer, pregnant, and lactation) yielded 215,079 spectra (Figure 1B). The pooled samples (n = 10) in each physiological state resulted in the identification of 5239, 4774, and 5466 endogenous peptides in the heifer, pregnant, and lactation groups, respectively. The dataset was analyzed using the Trans-Proteomic Pipeline (TPP) with three independent search engines. The complete results summary is provided in Table 1. ProteinProphet and iProphet probabilities of 1 and 0.9999, respectively, along with an FDR of less than 1%, were used as cutoffs to select highly confident peptides (Supplementary Table 1) (Figures 1B and C). Initial screening of the mapped peptides showed the highly diversified nature of the endogenous peptidome (Figure 1C).
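The confidence filtering described above can be illustrated with a minimal sketch. It assumes the common model-based estimate in which each accepted identification with probability p contributes 1 - p expected false positives; the function name and the example probabilities are hypothetical, and this is an illustration rather than the TPP implementation itself.

```python
def probability_threshold_for_fdr(probabilities, max_fdr=0.01):
    """Find the lowest probability cutoff whose estimated FDR stays <= max_fdr.

    Assumption: estimated FDR of an accepted set = sum(1 - p) / set size.
    Returns (threshold, n_accepted, estimated_fdr); threshold is None
    when no identification can be accepted.
    """
    ranked = sorted(probabilities, reverse=True)
    expected_false = 0.0
    best = (None, 0, 0.0)
    for n_accepted, p in enumerate(ranked, start=1):
        expected_false += 1.0 - p
        fdr = expected_false / n_accepted
        if fdr <= max_fdr:
            best = (p, n_accepted, fdr)
    return best


# Hypothetical peptide-level probabilities, for illustration only.
example = [0.9999, 0.999, 0.99, 0.95, 0.5]
threshold, n_accepted, fdr = probability_threshold_for_fdr(example)
print(threshold, n_accepted, round(fdr, 4))
```

Tied probabilities that straddle the cutoff are not treated specially in this sketch.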

Molecular characteristics of urinary peptides
The frequency distributions of molecular weight, peptide length, and amino acid composition of the peptides belonging to the three groups are shown in Figure 2. The peptide sequences from all groups had a molecular size of less than 10 kDa (Figure 2A). In the urine of heifer and lactating animals, low molecular weight peptides in the range of 1.4-1.5 kDa were more prevalent. In contrast, in the urine of pregnant animals, relatively larger peptides of around 1.8, 2.2, and 2.9 kDa were more prevalent (Figure 2A). The peptide length distributions were comparable across the three conditions (Figure 2B). Likewise, amino acid composition indicated that alanine, glycine, leucine, proline, and serine (in decreasing order of abundance) were the most frequently occurring amino acids (Figure 2C), a pattern that was similar across the three physiological states under study. In summary, the results suggest that the endogenous urine peptidome is highly diversified in nature but has comparable molecular characteristics across states. This prompted us to identify the cause and origin of the underlying diversity in the peptidome.
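As an illustration of the three descriptors compared above, the following minimal sketch computes the monoisotopic mass, length, and pooled amino acid composition of unmodified peptides. The residue masses are the standard monoisotopic values; the peptide sequences and function names are hypothetical and are not taken from the dataset.

```python
from collections import Counter

# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added for the free N- and C-termini


def monoisotopic_mass(peptide):
    """Monoisotopic mass (Da) of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def composition(peptides):
    """Fraction of each residue over all peptides, most frequent first."""
    counts = Counter("".join(peptides))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.most_common()}


# Hypothetical sequences, for illustration only.
peptides = ["GPAGPSGL", "SLGAPGLK"]
for pep in peptides:
    print(pep, len(pep), round(monoisotopic_mass(pep), 4))
print(composition(peptides))
```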

Bioactivity prediction associated with urinary peptidome
Urine acts as a natural resource of the innate immune defense system. Therefore, we sought to identify the biological activities associated with the peptides using in silico and in vitro experiments. We analyzed the band pattern of the peptide profile on a Tricine gel, which showed similar results across the groups (Figure 3A). The observation agreed with the molecular characteristics (Figure 2), indicating that the urinary peptides are of low molecular size, less than 10 kDa. In silico experiments were conducted to classify the bioactive peptides into antimicrobial, anti-inflammatory, antihypertensive, and anticancerous groups. The aim was to assign the cow urinary peptides profiled by mass spectrometry to one or more of these classes and then narrow down to the peptides that exhibited a wide spectrum of activities. Notably, we identified 15 peptides predicted in silico to possess all four types of activity (Figure 3B). The anticancer and antihypertensive sets shared a large number of sequences: 211 peptides, constituting 6.3%. The antimicrobial (n = 551) and anti-inflammatory (n = 607) peptides shared 36 sequences (Figure 3B). Sequence analysis showed that glycine was a common and comparatively the most abundant residue in all the classified peptides (Figure 3C, D, E, and F). This carries an important message for the design and synthesis of broad-spectrum bioactive peptides that can perform multiple functions. A total of 551 high-scoring antimicrobial sequences were used to determine the consensus motif of amino acids. The average hydrophobicity of the predicted sequences was -0.13 with an average charge of +1.8, indicating that the sequences are hydrophilic and cationic. The sequence logo showed the dominant presence of positively charged residues such as Arg and Lys, whereas Gly and Leu were the dominant hydrophobic residues (Figure 3C). A total of 607 sequences were predicted to possess anti-inflammatory activity.
A positional conservation study showed that the leucine residue was relatively dominant within the first ten positions from the N-terminal end of the anti-inflammatory peptides, with Leu, Tyr, Ser, Arg, and Glu being highly conserved residues at the N-terminus. While its position may vary, the Leu residue seems to be central to the anti-inflammatory activity (Figure 3D). We identified 1767 sequences predicted to have antihypertensive activity, with Pro as the dominant residue at all positions (Figure 3E). In comparison, 724 unique sequences were predicted to possess anticancer activity (Figure 3F) (Supplementary Table 2). To validate the in silico findings, we tested the antimicrobial activity of the isolated peptides against two pathogens, E. coli and S. aureus (Figure 3G and H). In the disk diffusion method, the pure peptide extract produced a clear zone of inhibition, with a mean of 1.22 cm (SD 0.11 cm; SEM 0.03 cm) against E. coli and a mean of 1.22 cm (SD 0.10 cm; SEM 0.03 cm) against S. aureus. The results suggest that urinary peptides possess strong antimicrobial activity.
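The two physicochemical descriptors reported for the antimicrobial set (average hydrophobicity and average charge) can be sketched as follows. This is a minimal illustration under stated assumptions: the Kyte-Doolittle hydropathy scale and a simple residue count for charge (Lys/Arg as +1, Asp/Glu as -1, ignoring His and the termini) are assumed here, since the scale and charge model used by the prediction platforms are not stated in the text. The sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(peptide):
    """Average Kyte-Doolittle hydropathy over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)


def net_charge(peptide):
    """Approximate net charge near neutral pH from residue counts only."""
    positive = sum(peptide.count(aa) for aa in "KR")
    negative = sum(peptide.count(aa) for aa in "DE")
    return positive - negative


# Hypothetical sequences, for illustration only.
for pep in ["GLRKGLPGR", "KRGLD"]:
    print(pep, round(mean_hydropathy(pep), 2), net_charge(pep))
```

A negative mean hydropathy together with a positive net charge corresponds to the "hydrophilic and cationic" description used in the text.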

Bovine urine degradome
Manual curation of the 22 selected proteases resulted in the discovery of an average of 7215 protease sites (Supplementary Table 3). For each peptide in the urinary peptide list, the four residues at the N- or C-terminus were examined, and the peptides were sorted by the protease that might be involved in their release from the precursor sequence. We found physiology-driven variation in the number of sequences derived from each protease. We further validated our findings using the web-based tool Proteasix, which determines confidence thresholds for predicted proteolysis using MEROPS specificity weight matrices for experimentally validated cleavages. MEROPS is a database that provides information about protease substrate sequence specificity in the form of a specificity weight matrix; based on experimentally confirmed cleavages, the matrix gives the frequency of amino acids at every position of the cleavage site. The output was refined by removing cleavages with a specificity below 80%. When protease cleavages were compared, the number of cleavages varied, but 54 proteases from the total 62 were common among the three groups (Figure 4A). We determined the common proteases active in all three physiological conditions and found that 85.7% (54 proteases) are shared. No unique protease was found in the pregnant group; however, two unique proteases were reported in heifer and lactation (Figure 4B). The pairwise Pearson correlation between groups (heifer, pregnant, and lactation) was 0.99, indicating highly correlated data. The heat map represents the total number of target peptides in the data attributed to each protease (Figure 4C), together with the Pearson similarity comparison (Figure 4D). Figure 4E shows the distribution of isoforms of MMP enzymes identified in protein mapping and their respective targets determined in the peptidome.
Figure 4F shows the distribution of isoforms of KLK enzymes identified in protein mapping and their respective targets determined in the peptidome. Protease-wise comparison and Pearson correlation (Figure 4D) identified eight proteases (CASP2, CASP8, FURIN, LGMN, MMP10, MMP17, PCSK1, and PRSS3) that were in disagreement across the physiological states and uncorrelated with one another (Figure 4G). The profile plot and heatmap of all three stages, representing the number of protease targets, revealed that the pregnant stage contained fewer targets than the others (Figure 4F and A).
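The terminal-residue inspection used to assign peptides to candidate proteases can be sketched as below. This is a minimal illustration, assuming that the peptide occurs once in its precursor and that a window of four residues on each side of a cleavage site is examined; the function name and both sequences are hypothetical, and the matching of windows against protease specificity matrices is not shown.

```python
def cleavage_windows(precursor, peptide, flank=4):
    """Return the residues flanking the N- and C-terminal cleavage sites.

    Each site is given as (residues before the cut, residues after the cut).
    Returns None when the peptide is not found in the precursor; only the
    first occurrence is considered.
    """
    start = precursor.find(peptide)
    if start == -1:
        return None
    end = start + len(peptide)
    n_site = (precursor[max(0, start - flank):start],
              precursor[start:start + flank])
    c_site = (precursor[max(0, end - flank):end],
              precursor[end:end + flank])
    return {"n_term": n_site, "c_term": c_site}


# Hypothetical precursor and peptide, for illustration only.
precursor = "MKTAYIAKQRQISFVKSHFSRQ"
peptide = "AKQRQISF"
print(cleavage_windows(precursor, peptide))
```

Windows are shorter than `flank` when the peptide starts or ends near a terminus of the precursor.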

Chromosomal mapping of stage-specific bovine endogenous urinary peptidome
Total spectra and protein information identified by the search engines using the UniProt database are represented as interconnecting bars in the circle (Figure 5A). A notable result is the identification of de novo peptides mapping to all 29 autosomes and the X chromosome. An interesting finding was the uneven, physiological stage-specific distribution of protein expression across the chromosomes: the expression of proteins from the chromosomes was markedly inconsistent between the physiological states (heifer, lactation, and pregnant). Bubble plot-based heat map analysis showed the total protein counts from the individual chromosomes (Figure 5A, left side) and the percentage contribution of these proteins, in terms of enrichment, relative to the total number of proteins coded in the bovine genome (Figure 5A, right side). Chromosomes 3, 5, 7, 11, 18, and 19 hosted the maximum number of endogenous peptides, whereas chromosomes 12, 20, 24, 27, and 28 gave rise to the minimum number of peptides identifiable in urine. The unbiased analysis of enrichment percentage for all chromosome-mapped sequences showed high enrichment of peptides from chromosomes 14, 20, 21, 22, 25, and 28 in the heifer and lactation stages. Interestingly, chromosome 12 was rich in the release of peptides in the urine of pregnant animals, whereas all the remaining chromosomes contributed equally. The same results were obtained from the distance-based correlation analysis among chromosomes, in which chromosomes 8, 12, and 15 showed expression values distinct from the others (Figure 5B). Our finding raises a question: what is responsible for the high expression from certain chromosomes in the heifer and lactation stages, and for the low contribution of coded peptides and proteins from the same chromosomes in the pregnant state?
What factors decide the differential expression of chromosomes in different physiological states is not yet known. We suggest that physiology-specific expression of proteins is probably what dictates the profile of peptides in urine. Nonetheless, further studies on genome-wide proteome analysis are required to answer these questions. The heatmap analysis of the 483 common parental proteins mapped in all three stages with significant StPeter values showed a clustering pattern among the proteins as defined by k-means clusters (Figure 5C). Keeping the heifer stage as the control, we determined the fold change difference among the stages. The results identified five proteins that are more than 9-fold up-regulated in the pregnant and lactation stages; the proteins are STAB2 ( It shows the importance of these five proteins in pregnancy and lactation.
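The two quantities used in this section, the chromosome-wise enrichment percentage and the fold change relative to the heifer stage, can be sketched as follows. This is a minimal illustration: it assumes that enrichment is the number of identified proteins on a chromosome divided by the number of proteins annotated on that chromosome, and that fold change is a plain ratio of abundance values. All names and numbers are hypothetical.

```python
def enrichment_percent(identified, annotated):
    """Percentage of annotated proteins per chromosome that were identified."""
    return {chrom: 100.0 * identified[chrom] / annotated[chrom]
            for chrom in identified}


def fold_changes(abundance, control="heifer"):
    """Ratio of each stage's abundance to the control stage."""
    reference = abundance[control]
    if reference == 0:
        raise ValueError("control abundance is zero; fold change undefined")
    return {stage: value / reference
            for stage, value in abundance.items() if stage != control}


# Hypothetical counts and abundance values, for illustration only.
print(enrichment_percent({"12": 30, "19": 90}, {"12": 600, "19": 1500}))
changes = fold_changes({"heifer": 2.0, "pregnant": 18.0, "lactation": 20.0})
print({stage: fc for stage, fc in changes.items() if fc >= 9})
```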

Tracing back Peptidome to physiology specific functional annotation
We compared all of the protein concentrations identified through StPeter among the physiological states (Figure 6A). The analysis identified 525, 465, and 458 common proteins between heifer versus lactation, heifer versus pregnant, and pregnant versus lactation, respectively. An average of 1500 proteins were identified uniquely in the three conditions. The molecular function analysis identified parental proteins contributing to multiple cell replication processes (Figure 6B), with the respective p-values reported (Supplementary Table 4). The processes are RNA polymerase DNA binding, ATP binding, promoter-specific chromatin binding, minus-end-directed, transcription coactivators. Although the highest number of proteins corresponds to the heifer, the lactation state showed high enrichment of the ATP binding process (GO: 0005524), with p-values < 0.001 and 15.80% of proteins involved. In heifer, RNA polymerase activities (GO: 0000978; GO: 0001228; GO: 0000977) were significant, with p-values < 0.001 and 4.79% of proteins involved. All the pregnancy-specific molecular function ontologies had non-significant p-values except the extracellular matrix structural constituent (GO: 0005201), which was also significant in lactation with p-values < 0.001. The cellular component analysis identified different classes of organelles and cellular parts in the ontological information, with the respective p-values (Figure 6C). Again, we found a significant p-value (< 0.001) in the lactation stage for collagen trimers (GO: 0005581). Altogether, these results support the crucial importance of the secretion of collagen-specific endogenous peptides during pregnancy and lactation, in which extensive remodeling of tissues and organs takes place 14. The COL8A1 gene was found to be upregulated during tissue remodeling in pregnant myometrium (human) 17.
During mammary cell growth in bovine, COL8A1 is involved in epithelial cell proliferation and was found to be upregulated during lactation 18. The gene KIAA1522 was differentially methylated in the mammary gland of dry goats as compared to those in lactation, suggesting a possible stage-specific role 19. Furthermore, the comparison of all three stages for biological process identified the five most enriched ontologies (Figure 7A). In correspondence with the cellular component (CC) and molecular function (MF) results, we identified extracellular matrix organization (GO: 0030198) and regulation of RNA transcription (GO: 0000122; GO: 0045944). However, homophilic cell adhesion (GO: 0007156) and T cell differentiation (GO: 0033077) were the unique ontologies determined. Only the pregnant-specific ontology in homophilic cell adhesion was determined to be significant (p < 0.001). Next, we searched our list of proteins from the different stages against the induction process (Figure 7B). None of the processes was shared by all three stages; however, proteins associated with specific processes were explicitly relevant to the physiological condition. We identified GLUT8 peptides in the urine of pregnant animals, which corresponds to its upregulation during the pregnancy induction process. GLUT8 mRNA expression in the bovine mammary gland was increased 10-fold (P < 0.001) during late pregnancy and early lactation 20. We next sought to retrieve information from the urine proteome by comparing developmental stage and tissue specificity across various databases (Figure 7C and D). We identified proteins in the urine that are relevant to the stages as previously reported in the literature. The results strongly suggest that the diversity of endogenous peptides is indicative of the physiological state.
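Ontology enrichment p-values of the kind reported in this section are commonly obtained from an over-representation test. The sketch below assumes a one-sided hypergeometric test, which is an assumption made for illustration because the statistical test behind the reported p-values is not stated in the text; the function name and the counts are hypothetical.

```python
from math import comb


def hypergeom_pvalue(hits, sample_size, annotated, background):
    """P(X >= hits) for a GO term under the hypergeometric model.

    background:  number of proteins in the reference set
    annotated:   reference proteins carrying the GO term
    sample_size: proteins identified in one physiological state
    hits:        identified proteins carrying the GO term
    """
    upper = min(sample_size, annotated)
    total = comb(background, sample_size)
    tail = sum(comb(annotated, i) * comb(background - annotated, sample_size - i)
               for i in range(hits, upper + 1))
    return tail / total


# Hypothetical counts, for illustration only.
print(hypergeom_pvalue(hits=12, sample_size=100, annotated=200, background=20000))
```

In practice such p-values are usually corrected for multiple testing across all ontology terms, which this sketch does not do.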

Discussion
The composition of urine is an important indicator of the physiology of the organism and hence makes urine an excellent, non-invasively collected biofluid for diagnostic purposes. Endogenous urinary peptides have an obvious edge over full-length proteins, as the former gain easy access to urine and, at the same time, can point towards a particular protein that is disturbed within the body. The study reports a comprehensive profile of endogenous peptides in bovine urine across various physiological states and discusses their biological activities, chromosomal mapping, and molecular features. To date, this is the largest comprehensive bovine urinary peptidome dataset.
The urinary bladder serves as a temporary reservoir of urine. Protease activity is an essential requirement for the continuous turnover of proteins in organs such as the urinary bladder, uterus, or mammary gland, where muscular elasticity is needed for continuous expansion and contraction. These proteases function interactively: multiple target proteins interact with multiple proteases of various families and classes, creating a protease web 21. Many of these proteases can be traced back to the urinary bladder, which is highly stretchable. However, it is not certain whether these peptides gain access to urine through cellular exudates or systemic circulation. A large number of peptides are breakdown products of large proteins involved in urine storage and voiding, such as collagen.
The molecular size distribution of the urinary peptides follows a bell-shaped curve, which indicates that the majority of the peptides present in urine are between 700 and 4799 Da. For peptides gaining access to urine through the glomerular filtrate, the permissible size cutoff is a critical determinant of the length of the peptides found in urine. Reabsorption in the urinary tract may also affect the size distribution 22. In this case, it seems that most of the peptides are released into the urine because they are present in the circulation in excess of the tubular saturation limit. The observation of several peptides in urine that cannot be attributed to local organs such as the urinary bladder (UB), ureter, urethra, or local glands suggests that they originate from the glomerular filtrate of blood. Examples are peptides from beta casein isoform and Complement C4-like isoform X1, proteins mostly found in the mammary gland and blood cells, respectively 23. The prevalence of specific peptide sequences over others suggests the crucial role executed by their precursor proteins in a specific physiological context.
To translate the research, we characterize protease substrate cleavages as net outcomes of complex proteolytic activities in normal vis-à-vis diseased physiology, to foster the characterization of numerous peptide-binding receptors (PBRs) as potential drug targets. The results suggested that endogenous peptides possess a variety of bioactive functions. Prediction platforms were used to gain comprehensive insight into the peptide sequences. We found a pattern of amino acid residues in the sequences that might contribute to the different types of activity associated with the peptides. Many web-based prediction platforms use an Amino Acid Composition (AAC) based algorithm together with literature-supported bioactivity of peptides. While this might not be fully accurate, searching for common residues and motifs in peptides may help to explain the compositional bias of peptides towards certain amino acid residues and the various beneficial effects attributed to the associated bioactivity.
Cow urine has been granted US patents for its medicinal properties, particularly as a bioenhancer along with antibiotic, antifungal, and anticancer drugs (6896907, 6410059). Jain et al. studied the effect of cow urine therapy on various types of cancers in the Mandsaur area of India and reported that the severity of symptoms (pain, inflammation, burning sensation, difficulty in swallowing, and irritation) subsided significantly 24. The present study identifies a large number of peptides having sequence motifs unique to anticancerous properties. Pro can be considered a common residue in the peptides of two categories, namely anticancer and antihypertensive peptides. Scanty information is available linking anticancer and antihypertensive peptides; however, one report determined a possible role of Ang II in tumor progression, and as antihypertensive peptides target angiotensin-converting enzyme, they might be potent anticancer agents 25,26. Many antimicrobial peptides exhibit anti-inflammatory features, which is why we obtained a relatively large number of shared sequences between these two sets 27,28. Wei et al. pointed out that many AMPs are capable of binding to LPS, which might explain how AMPs exhibit anti-inflammatory activity: by inhibiting the LPS-induced release of NO, a pro-inflammatory mediator 29. Sequence analysis showed that Gly was a common and comparatively the most abundant residue in all classified peptides. Five amino acids, namely Gly, Ser, Ala, Leu, and Pro in decreasing order of abundance, constitute the major share (roughly 50%) of all residues present in urinary peptides, whereas an estimate drawn from the yeast proteome (6225 known and predicted proteins) reported that four amino acids, leucine, serine, lysine, and glutamic acid, are the most abundant, totaling 32 percent of all the amino acid residues in a typical protein 30.
Thus, the composition of urinary peptides appears to be selective rather than random. Shoombuatong et al. reported that Gly is a frequently occurring residue in antibiofilm peptides (ABPs), anticancer peptides (ACPs), antifungal peptides (AFPs), antiparasitic peptides (APPs), and antiviral peptides (AVPs) at 10.98%, 10.88%, 10.79%, 10.77%, and 11.82%, respectively 31.
ACE inhibitor peptides are reported to possess aromatic amino acids such as tryptophan and tyrosine, or proline, at the C-terminus 32,33. Consistent with the predicted list of peptides, a proline residue adjacent to the C-terminal proline also increases the ACE inhibitory activity of peptides 34. Anti-inflammatory peptides (AIPs) exert their effects through a variety of mechanisms, viz. inhibition of the JAK-STAT and NF-kB pathways, induction of IL-4 secretion, and inhibition of LPS-induced cytokines 35-38. Studies have reported that hydrophobic residues such as phenylalanine and leucine have a major influence on the anti-inflammatory effect of a peptide 39,40. Manavalan et al. reported that, in terms of average composition, Arg, Leu, and Lys were dominant in AIPs 41. Also, Leu alone has been shown to exert anti-inflammatory activity by reducing LPS-induced NO, a pro-inflammatory mediator, in RAW 264.7 macrophages. In our study, we did find Leu to be a relatively abundant amino acid, followed by Gly and Ser residues.
Studies have shown that ACPs are mostly dominated by Gly, Lys, Cys, Phe, Ile, and Trp as compared to non-ACPs 42,43. On the contrary, in our case, the sequence logo showed that the anticancer peptide sequences are dominated by Pro, Gly, Ser, Leu, and Arg residues, with Pro dominating at almost every position of the sequence. One study reported that the presence of a Pro residue enhances the toxicity of peptides against nucleated cancer cells while simultaneously protecting RBCs from lysis 44. Like AMPs, anticancer peptides (ACPs) also contain cationic residues; their selective killing is suggested to result from the electrostatic interaction between cationic residues and the anionic membrane of cancer cells 45. In our findings, Arg was the most abundant cationic residue, followed by histidine. As already mentioned, collagen-derived peptides constitute a major portion of cow urinary peptides; therefore, finding Pro in significant abundance among the antihypertensive peptides (AHTPs) and ACPs is not surprising. The wide spectrum of bioactive properties of cow urine may be associated with these small endogenous peptide sequences.
Antimicrobial peptides (AMPs) are usually short (10 to 50 amino acids) with an overall positive charge (predominantly +2 to +9) and are present in every form of life 46. AMPs/ABPs exploit the difference between the composition of prokaryotic and eukaryotic membranes. Lipids in the outer leaflet of the animal cell membrane carry no net charge, while prokaryotic membranes are rich in anionic phospholipids and are hence selectively targeted by AMPs 47.
Peptide sequences were also evaluated based on their physicochemical properties, as a fine balance between charge and hydrophobicity drives the activity of a peptide sequence. In agreement with our findings, one study showed that most positions in AMPs were dominated by Arg, Lys, Leu, and Gly 48. The balance between positive charge and hydrophobic residues drives the activity of an AMP: positively charged residues undergo electrostatic interaction with the negatively charged prokaryotic membrane, while hydrophobic residues help the AMP penetrate and disrupt the bacterial membrane 49,50. Chang et al. reported that a higher percentage of glycine and lysine was present in the core of AMPs and that, in the critical region of AMPs, glycine was the most abundant residue 51.
The presence of collagen peptides in the urine is probably due to excessive bladder activity. Several studies have reported the role of collagen in urinary bladder compliance, proliferation, and bladder filling 52,53. One study shows that the ratio of type III to type I collagen determines changes in compliance in both the developing fetal and the mature bovine UB. A significant reduction in the volumetric densities of type I and III collagen was observed with aging of the urinary bladder in Wistar rats. The urinary bladder and other parts of the urinary system always remain in dynamic states of distension and relaxation to process and accommodate varying volumes of urine. To ensure this, the urinary epithelium and stromal cells undergo extensive extracellular matrix (ECM) degradation and remodeling. ECM epitopes, a direct result of MMP-mediated proteolysis, are involved in the stretch-induced proliferation of bladder smooth muscle cells through activation of ERK1/2 signaling 54. Collagen, being a major component of connective tissue, is the main target in ECM degradation and remodeling and hence leaves its signature in urine in the form of various small endogenous peptides (EPs). In line with other reports, our findings show that collagen-derived peptides are abundant in urine 9,55,56, but the individual peptide sequences differed between physiological states. One study found a significant correlation of relative abundance between urine and plasma samples and hypothesized that collagen-derived peptides, unlike other peptides in plasma, escape tubular reabsorption by an unknown mechanism 56. Peptide degradation is lowest in urine in comparison with other biological fluids such as blood serum or plasma 57, making it a suitable biological fluid for proteome-based studies.
The role of proteases is important for the complete turnover of proteins in various physiological and disease conditions. Besides MMPs, cathepsin proteases were also predicted with a considerable number of cleavages, and we also found their signatures in the MS/MS data. Cathepsins are lysosomal enzymes that include serine, cysteine, and aspartic proteases; like MMPs, they are also involved in the remodeling of the ECM 63,64. Interestingly, in the pregnancy stage, the peptide sequences resulting from protease cleavage were considerably fewer than in the other physiological states. A similar observation was made in a study in mice, where the activity of certain proteases such as MMP2 and MMP9 is suppressed during pregnancy, increases postpartum, and then reverts to normal 65.

Conclusions
We presented a simple method for the discovery of thousands of endogenous peptides in cow urine that contribute to various bioactivities associated with urine. The data presented here represent the identification of about 5000 natural peptides in each of the three physiological conditions, laying the foundation for detailed studies. The molecular weight, peptide length, and amino acid distribution of endogenous peptides follow a similar pattern in all three stages. We provided evidence for peptide-mediated antimicrobial activity against E. coli and S. aureus. This study also analyzed the complex network of proteases active during specific physiological states and the target proteins that serve as precursors of the urinary peptides. Knowledge of physiology-specific proteins and proteases may generate further interest in biomarker discovery for pregnancy diagnosis and in understanding the pathophysiology during development and lactation. The urinary peptides also represent the degradome and explain the formation and release of peptides by specific proteases. Lastly, the tissue-specific developmental gene ontology classification derived from the protease-targeted proteome highlighted the stages at which specific proteins are active in a specific physiology. The data open avenues for setting a benchmark for urine-based biomarkers unaffected by physiology and also help to understand various functional activities associated with cow urine. The present study searched for peptides based on the molecular weight of the primary sequence only. It will be interesting to also search for post-translationally modified peptides, including glycosylated, phosphorylated, and other modified forms.

Sample collection
The urine samples were collected from 30 healthy female Sahiwal cattle (Bos indicus) belonging to three different physiological states, namely heifer (n = 10), pregnant (n = 10), and lactating (n = 10), in the presence of a veterinary doctor at the Livestock Research Centre, National Dairy Research Institute (NDRI), Karnal. All the animals included in the study were clinically healthy and were divided into three groups: heifer (17-18 months of age), pregnant (days 40-60 of pregnancy), and lactating (days 80-100 of lactation). All procedures were approved by the Institutional Animal Ethics Committee (IAEC), ICAR-NDRI, Karnal, India.
Within each group, a sample was created by pooling the samples of ten animals (biological replicates, n = 10) and processed separately with four technical replicates. However, to understand animal-to-animal biological variation, we performed a separate study on seven individual animals from each category. The procedure for sample collection and processing was the same for all groups and is described here briefly. Approximately 500 mL of fixed-time morning void urine samples were collected by manually massaging the perineum of the animal. The collected samples were immediately transferred to the lab and inspected for debris, dung, dust, or hair to rule out contamination. Initially, the samples were filtered through muslin cloth and then centrifuged at 7000 rpm for 20 minutes to sediment any cell debris and particulate matter. Microscopic examination was performed on individual samples, before and after centrifugation, to check for the presence of RBCs, WBCs, other cells, and debris. The clarified urine was then used for peptide extraction and purification.

Peptide Extraction
The clean urine samples were passed through an ultra-filtration assembly (Thermo easyload Masterflex, model 7518-00, USA) with a 10 kDa molecular weight cut-off filter (Pall Minimate™ TFF Capsule) to separate the endogenous peptide mixture. The pH of the resulting filtrate was adjusted to less than 3 using trifluoroacetic acid (TFA) for further treatment. The peptide mixture (PM) was subjected to solid-phase extraction (SPE) on manually prepared C-18 bead-based columns. Briefly, the column was prepared using C-18 reversed-phase silica gel (Sigma, Switzerland, Cat. No. 60757-50G). A slurry of 20 grams of silica gel was prepared in methanol. The column was packed by continuously stirring the slurry and slowly draining it into the column, followed by several washes with methanol.
The packed column was conditioned using 90% methanol followed by equilibration with 10 column volumes of 0.1% TFA in water. After equilibration, the PM was loaded at a flow rate of 0.5 mL/min, followed by desalting using 5% methanol with 0.1% TFA. The desalted peptides were eluted in 60% acetonitrile (ACN) with 0.1% TFA. Processing of around 500 mL of urine yielded approximately 30-50 mL of eluate with a dark brown appearance. To remove the dark brown substances (possibly contaminating metabolites in urine), an ethyl acetate-based extraction was performed. The eluates were mixed with twice their volume of ethyl acetate, rotated end to end for 5 min, and allowed to settle for 10 min to separate into two layers. The upper organic layer was stored appropriately for metabolome profiling, and the lower aqueous phase was aliquoted into 2 mL microcentrifuge tubes and dried in a speed vac (Thermo savant ISS110 SpeedVac concentrator, ISS110-230, USA).
The dried samples were stored at -80 °C. Prior to the mass spectrometer run, routine sample cleanup was performed using Pierce® C-18 Spin Columns as per the manufacturer's protocol. Briefly, the quantified samples were reconstituted in 20% ACN with 2% TFA (sample buffer, 3 µL for every 1 µL of sample). After loading and washing of the sample, peptides were eluted in 70% ACN with 0.1% TFA. The eluted samples were dried and reconstituted in 0.1% TFA before the MS run.

Electrospray ionization tandem mass spectrometry LC-MS/MS analysis
The reconstituted peptides were used for shotgun proteomics experiments for the identification of the endogenous peptides. The peptides were separated using micro-LC (Dionex, Thermo UltiMate 3000 HPLC System, USA) on an analytical column (Supelco, Ascentis® Express C18, 25 cm × 4.6 mm, 2.7 µm) coupled through an ESI source (Bruker Daltonics, Germany) to a Maxis-HD qTOF (Bruker, Germany) mass spectrometer. The acquisition parameters were adapted from our previous reports with slight modifications 66,67. The elution was performed at a flow rate of 150 µL/min in a continuous gradient of 5-75% acetonitrile over 135 min. In the solvent system, solvent A was 100% water with 0.1% formic acid, and solvent B was 100% acetonitrile with 0.1% formic acid. Data were acquired in data-dependent mode, with the mass spectrometer switching automatically between MS and MS/MS acquisition. A precursor ion MS scan range of 200-2200 (m/z) was used in the Q-TOF with resolution R = 75,000. During acquisition, the six most abundant precursor ions were selected for fragmentation using collision-induced dissociation (CID) with a fixed cycle time of 3 sec and a release time of 2 min for the exclusion filter (otof processing software, Bruker Daltonics).
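As a rough illustration of the gradient described above, the following Python sketch computes the nominal percentage of solvent B at a given time. It assumes the 5-75% acetonitrile gradient is linear over the 135 min run; the text describes the gradient only as continuous, so the linear shape and the function name percent_b are assumptions made for illustration.

```python
def percent_b(t_min, start=5.0, end=75.0, duration=135.0):
    """Nominal %B at time t_min, ASSUMING a linear gradient.

    Defaults follow the reported 5-75% acetonitrile over 135 min.
    """
    if not 0.0 <= t_min <= duration:
        raise ValueError("time lies outside the gradient window")
    # Linear interpolation between the start and end composition.
    return start + (end - start) * t_min / duration
```

Under this assumption the composition rises by (75 - 5) / 135, or about 0.52 percentage points of solvent B per minute.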

Data Processing
The raw data files (.d) were processed with the Trans-Proteomic Pipeline (TPP) for the identification of endogenous peptides. For analysis, the otof-generated raw (.d) files were converted to mzML format with the MSConvert GUI using the default parameters. The MS/MS spectra in the converted mzML files were searched using the Trans-Proteomic Pipeline version 5.1.0 (released on 2017-11-03) against an in-house combined database comprising UniProt Bos taurus (cow) and Bubalus bubalis (buffalo) sequences, common contaminant sequences, and an equal number of decoy sequences. The detailed protocol was reported in Suhail et al., 2019 68. Briefly, peptide assignments were performed using two search engines, X! Tandem (with the k-score plug-in) and Comet. For both search engines, the search parameters were set for undigested peptides, and the remaining parameters were kept at their defaults. The minimum peptide length was set to 6 amino acid residues. The PeptideProphet and ProteinProphet algorithms were then used in the pipeline to compute probability scores for the individually searched peptides and the respective proteins. The accurate mass model in PeptideProphet was used to boost the probability of high-confidence peptide identifications. A further protein validation step was carried out using both PeptideProphet and ProteinProphet scores, in which a protein was accepted if it contained a minimum of two top-ranked peptides, each with a peptide probability score above 95%. All the search engine results were merged and validated using iProphet. This method takes as input the PeptideProphet spectrum-level results from multiple LC-MS/MS runs and then computes a new probability at the level of a unique peptide sequence (or protein sequence).
This framework allows the results from multiple search tools to be combined and takes into account other supporting factors, including the number of sibling experiments identifying the same peptide ions, the number of replicate ion identifications, sibling ions, and sibling modification states. iProphet also provides a model of its performance in terms of the number of correct identifications versus error. An iProphet probability of more than 0.95 was used as the cutoff for the final identification of a protein. For protein quantitation, the MS2-based normalized spectral index (SIN) was computed with the StPeter algorithm implemented in TPP for all proteins identified with ≥ 2 unique peptides per protein.
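The quantitation step can be illustrated with a simplified Python sketch of the normalized spectral index as it is commonly described: the summed fragment-ion intensity of a protein, divided by the total over all proteins in the run and by the protein length. This is an illustration only and not StPeter's implementation; the input layout and the function name are hypothetical, and the choice to include proteins with fewer than two peptides in the run total is an assumption.

```python
from collections import defaultdict


def normalized_spectral_index(spectra, protein_lengths, min_unique_peptides=2):
    """Simplified SIN sketch (not the StPeter code).

    spectra: iterable of (protein_id, peptide_sequence, summed_fragment_intensity)
    protein_lengths: dict mapping protein_id -> number of residues
    """
    intensity = defaultdict(float)   # cumulative fragment intensity per protein
    peptides = defaultdict(set)      # unique peptide sequences per protein
    for protein, peptide, frag_intensity in spectra:
        intensity[protein] += frag_intensity
        peptides[protein].add(peptide)

    # ASSUMPTION: the run total includes every protein, even those later dropped.
    total = sum(intensity.values())
    if total == 0:
        return {}

    return {
        protein: intensity[protein] / total / protein_lengths[protein]
        for protein in intensity
        if len(peptides[protein]) >= min_unique_peptides
    }
```

Dividing by the run total makes values comparable between runs with different overall signal, and dividing by length compensates for longer proteins yielding more peptides.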

Antimicrobial assay
The total urinary peptides from all three groups, extracted by SPE, were then assessed for antimicrobial activity (AMA). To determine AMA, the peptides were reconstituted in Milli-Q water and coated on 6 mm sterile discs (HIMEDIA, SD067-1VL). The discs were allowed to dry under a laminar hood. A 0.5 McFarland equivalent of the test cultures (Staphylococcus aureus ATCC 29213, Escherichia coli ATCC 25922) was swabbed on the surface of Mueller Hinton Agar (HIMEDIA, GM173-500G) and allowed to dry. With the help of sterile forceps, the coated discs were placed on the lawn of the test culture and incubated overnight. BSA digest was used as the negative control in the experiment. The appearance of an inhibition zone confirms the antimicrobial activity of the cow urinary peptides derived from all physiological states.

Bioinformatics analysis
All the graphical analyses were performed in the R environment using the ggplot2 package. The Gene Ontology (GO) categories were analyzed using the DAVID bioinformatics resources, and only the genes with an adjusted P-value (false-discovery rate) of less than 0.05 were included for subsequent GO term plotting.
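To make the adjusted P-value cutoff concrete, the sketch below applies the Benjamini-Hochberg procedure, one common false-discovery-rate adjustment, and keeps terms with an adjusted value below 0.05. DAVID reports its own adjusted values and the analyses here were run in R, so this Python code only illustrates the filtering logic; the function names and the input dictionary are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_terms(term_pvalues, alpha=0.05):
    """Keep terms whose adjusted p-value is below alpha.

    term_pvalues: dict mapping a term label -> raw p-value (hypothetical input).
    """
    terms = list(term_pvalues)
    adjusted = benjamini_hochberg([term_pvalues[t] for t in terms])
    return {t: q for t, q in zip(terms, adjusted) if q < alpha}
```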

Strategy for manual and MEROPS based protease prediction
The peptide lists from the different physiological states of cows were sorted based on the possible protease cleavage sites located within four residues of either the N- or C-terminus. The aim was to identify the protease responsible for releasing each peptide from its precursor sequence. The peptide sequences were traced back to 21 possible proteases, which provided hints to explain the physiology-driven variation in the number and type of sequences derived from each physiological state. The Proteasix tool, which uses the MEROPS peptidase database as a reference, was used for the prediction of cleavage sites. It provides specificity weight matrices for experimentally validated protease cleavages. The batch peptide match tool from the Protein Information Resource (PIR) was used to retrieve the start and end amino acid positions of each peptide in its precursor protein sequence. The information retrieved from PIR was used as input for Proteasix. Cleavages predicted with more than 80% specificity were selected for the heat map analysis. This method was used to predict the cleavages and the associated proteases for the heifer, lactation, and pregnant groups.
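The idea of inspecting four residues on either side of each peptide terminus can be sketched in Python as follows. Given a precursor sequence and the 1-based start and end positions of a peptide (the kind of positions a peptide match tool reports), the function returns the residues flanking the N-terminal and C-terminal cleavage sites. The function name and the return layout are hypothetical, and this does not reproduce how Proteasix scores cleavage sites.

```python
def cleavage_windows(protein, start, end, flank=4):
    """Residues around the two scissile bonds that release a peptide.

    start, end: 1-based inclusive positions of the peptide in the protein.
    Returns, for each terminus, a tuple (residues before bond, residues after bond).
    """
    if not 1 <= start <= end <= len(protein):
        raise ValueError("peptide coordinates fall outside the protein")
    s, e = start - 1, end  # convert to a 0-based half-open interval
    # N-terminal bond lies just before the first residue of the peptide.
    n_term = (protein[max(0, s - flank):s], protein[s:s + flank])
    # C-terminal bond lies just after the last residue of the peptide.
    c_term = (protein[max(0, e - flank):e], protein[e:e + flank])
    return {"N": n_term, "C": c_term}
```

A window is shorter than four residues, or empty, when the peptide sits at the very start or end of the precursor, which is a useful signal that no proteolytic cleavage is needed at that terminus.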

Characterization and classification of bioactive peptides based on machine learning SVM model
In silico analysis was performed using web-based platforms to identify bioactive peptides present in cow urine. For validation purposes, online activity-prediction servers based on the Support Vector Machine (SVM) algorithm were used. SVM, an established machine learning tool, has been extensively used to design prediction platforms for the classification of various kinds of bioactivities associated with peptides 69-72. Peptide sequences associated with anticancer, anti-inflammatory (AIP), antihypertensive, antimicrobial, and antibiofilm activities were predicted with a high confidence score, at or above a threshold value of 0.9. Peptides with ACE-inhibitory antihypertensive activity were obtained from antihypertensive peptide inhibitors (AHTpin) (http://crdd.osdd.net/raghava/ahtpin/). The database consists of 1745 entries as a positive dataset from various sources, viz. AHTPDB, BIOPEP, ACEpepDB, and the literature, whereas the negative dataset consists of random protein fragments of the same length from Swiss-Prot 71. A sequence logo was generated for the predicted sequences retrieved from AHTpin to determine the abundance of amino acid residues at the C-terminus of the peptides, as the presence of certain amino acid residues has a significant impact on antihypertensive activity. For the prediction of AIPs, the web server anti-inflam (http://metagenomics.iiserb.ac.in/antiinflam) was used. Anticancer activity was predicted using the tumorHPD server (http://crdd.osdd.net/raghava/tumorhpd/). This tool uses 651 experimentally validated tumor-binding peptides as a positive dataset and 651 non-tumor-binding peptides randomly generated from SwissProt proteins as a negative dataset 70.
Two servers were used to predict antimicrobial activity associated with the urinary peptides: CAMPR3 (http://www.camp3.bicnirrh.res.in/campHelp.php), a database containing entries of antimicrobial peptides from various sources, and dPABBs (http://ab-openlab.csir.res.in/abp/antibiofilm/), which uses an SVM-based algorithm for the prediction of antibiofilm peptides (ABPs). Peptides predicted to have antibiofilm activity by at least 4 prediction models were selected. The tool uses a positive dataset consisting of 90 AMPs that have been shown through in vitro and in vivo reports to be active against biofilms. The prediction is based on whole amino acid composition, features of the residues, and their positional preferences 72.
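The C-terminal residue abundance that a sequence logo summarizes can be approximated with a simple frequency count. The Python sketch below tallies the relative frequency of each residue at the last few positions of a set of predicted peptides. The number of positions (three) and the function name are assumptions for illustration; the text does not state how many C-terminal positions were examined or which tool drew the logo.

```python
from collections import Counter


def c_terminal_frequencies(peptides, n_positions=3):
    """Relative residue frequency at each of the last n_positions of the peptides.

    Returns a list of dicts ordered from the most N-terminal of the examined
    positions to the C-terminal residue itself.
    """
    counters = [Counter() for _ in range(n_positions)]
    for peptide in peptides:
        tail = peptide[-n_positions:]
        if len(tail) < n_positions:
            continue  # skip peptides shorter than the window
        for position, residue in enumerate(tail):
            counters[position][residue] += 1
    return [
        {residue: count / sum(counter.values()) for residue, count in counter.items()}
        for counter in counters
    ]
```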
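The two selection rules used for the server predictions, a score threshold of 0.9 and agreement of at least four antibiofilm models, can be sketched in Python as below. The input dictionaries stand in for results collected from the web servers; their layout and the function names are hypothetical, and the code does not call any of the servers.

```python
def high_confidence(scores, threshold=0.9):
    """Keep peptides whose prediction score is at or above the threshold.

    scores: dict mapping peptide sequence -> score reported by a server.
    """
    return {peptide: s for peptide, s in scores.items() if s >= threshold}


def antibiofilm_consensus(model_calls, min_models=4):
    """Keep peptides called positive by at least min_models prediction models.

    model_calls: dict mapping peptide sequence -> list of booleans, one per model.
    """
    return [
        peptide
        for peptide, calls in model_calls.items()
        if sum(calls) >= min_models
    ]
```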

Chromosomal mapping of urinary peptides
All the identified proteins were mapped to the UniProt bovine chromosomal proteome information database (Bos taurus, taxid 9913) and parsed to create files appropriately formatted for input to Circos for circular visualization and annotation 73.
Figure legend (partial): Pearson correlation-based similarity matrix plot of all chromosomes and associated target proteins, determined with respect to endogenous peptides. C. Heat map of quantitative protein data for the parental proteins common to the three stages. Fold change is calculated using the heifer as the control for lactation and pregnancy. Proteins were grouped by k-means clustering, and fold changes are shown with the respective color coding.
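The fold change relative to heifer can be written as a simple ratio, sketched here in Python. The nested dictionary of abundances, the state labels, and the function name are hypothetical; the text does not say whether a log transform was applied before clustering, so the plain ratio is shown.

```python
def fold_changes(quant, control="heifer"):
    """Fold change of each state relative to the control state.

    quant: dict mapping protein -> {state: abundance}.
    Proteins with a missing or zero control abundance are skipped.
    """
    result = {}
    for protein, by_state in quant.items():
        base = by_state.get(control)
        if not base:
            continue  # ratio is undefined without a non-zero control value
        result[protein] = {
            state: value / base
            for state, value in by_state.items()
            if state != control
        }
    return result
```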