Novel Approach to Exploring Protease Activity and Targets in HIV-associated Obstructive Lung Disease using Combined Proteomic-Peptidomic Analysis

Background: Obstructive lung disease (OLD) is increasingly prevalent among persons living with HIV (PLWH). However, the role of proteases in HIV-associated OLD remains unclear. Methods: We combined proteomics and peptidomics to comprehensively characterize protease activities. We combined mass spectrometry (MS) analysis on bronchoalveolar lavage fluid (BALF) peptides and proteins from PLWH with OLD (n=25) and without OLD (n=26) with a targeted Somascan aptamer-based proteomic approach to quantify individual proteases and assess their correlation with lung function. Endogenous peptidomics mapped peptides to native proteins to identify substrates of protease activity. Using the MEROPS database, we identified candidate proteases linked to peptide generation based on binding site affinities which were assessed via z-scores. We used t-tests to compare average forced expiratory volume in 1 second per predicted value (FEV1pp) between samples with and without detection of each cleaved protein and adjusted for multiple comparisons by controlling the false discovery rate (FDR). Findings: We identified 101 proteases, of which 95 had functional network associations and 22 correlated with FEV1pp. These included cathepsins, metalloproteinases (MMP), caspases and neutrophil elastase. We discovered 31 proteins subject to proteolytic cleavage that associate with FEV1pp, with the top pathways involved in small ubiquitin-like modifier mediated modification (SUMOylation). Proteases linked to protein cleavage included neutrophil elastase, granzyme, and cathepsin D. Interpretations: In HIV-associated OLD, a significant number of proteases are up-regulated, many of which are involved in protein degradation. These proteases degrade proteins involved in cell cycle and protein stability, thereby disrupting critical biological functions.


INTRODUCTION
The use of highly active antiretroviral therapy (ART) has signi cantly reduced morbidity and mortality among those living with HIV.However, as life expectancy has increased, there has been a rise in comorbidities, including obstructive lung disease (OLD).[1][2][3][4][5][6][7][8] Persons living with HIV (PLWH) are at increased risk of accelerated lung function decline and developing OLD, even after adjusting for smoking.[9,10] The Global Initiative for Obstructive Lung Disease (GOLD) 2024 report now recognizes HIV as a risk factor for COPD.[11] In non-HIV associated OLD, chronic in ammation and the activation of proteases play a crucial role in its pathogenesis.The severity of air ow obstruction often corresponds to the degree of in ammation in the lung and airways.In PLWH, various factors have been implicated in the development of OLD,, such as epigentic aging, chronic systemic in ammation, innate immune activation, and abnormal immune function related to HIV. [12,13] Furthermore, in PLWH who smoke and have emphysema, there is an upregulation of matrix metalloprotineases (MMP-1, -7, -9 and -12) compared to HIV-negative individuals, underscoring the potentially signi cant role of proteases in OLD pathogenesis in PLWH.[14] Numerous proteases contribute to lung disease, falling into three general categories that include serine proteases, cysteine proteases, and matrix metalloproteinases (MMP).The cellular sources of these proteases include in ammatory cells such as neutrophils and macrophages, as well as bronchial epithelial cells.While the impact of individual proteases on the lung extracellular matrix (ECM) has been well-documented in numerous studies, the extent to which other proteins are subject to proteolytic degradation and the physiological effects of this destruction remain relatively unknown.Previous studies have predominantly focused on individual proteases or their families in OLD, mainly limited to ECM targets.To better understand the role of proteases in HIV-associated OLD, we used complementary proteomic techniques combined with peptidomics to identify active proteases associated with OLD, comparing their activity in disease versus health and elucidating their speci c targets.

METHODS
Study Population: PLWH who had undergone bronchoscopy were selected from the Pittsburgh and Vancouver Lung HIV Cohorts.[15,16] This consisted of individuals (n=25) with OLD as de ned as the ratio of forced expiratory volume in 1-second/forced vital capacity (FEV1/FVC) < lower limit of normal.Those without OLD consisted of 26 individuals with HIV and normal lung function (de ned as FEV1/FVC > lower limit of normal and FEV1 > 80% of predicted normal) matched on age (+/-5 years), antiretroviral treatment use, and smoking status (current vs. non-smoker).Participants in the parent cohort studies provided informed consent for BALF collection and storage with approval by their respective Institutional Review Boards at Pittsburgh and Vancouver.At study enrollment, BALF was collected on fasting participants as previously described.[15,16] Pulmonary function tests were performed within 3 months of collecting the samples.All data and samples were sent to the University of Minnesota were de-identi ed.The current study was reviewed and accepted by the University of Minnesota Institutional Review Board (Number 00003486).
Protein Processing and Protease Identi cation: BALF samples underwent centrifugation at the local collection sites to remove cells, and cell-free BALF samples were stored at -80 degrees Celsius prior to processing.The BALF was processed as previously described.[17] Brie y, the cell-free BALF samples were centrifuged twice to separate out the insoluble component of BALF from the soluble fraction.Endogenously produced peptides were collected from the soluble component of the supernatant via a 3kDa MW cutoff lter.The soluble component of the supernatant was sent for SomaScan, analysis as previously reported, and MS analysis.[18]BALF samples from 21/25 with OLD and 24/26 with normal lung function had adequate protein amounts for tandem mass tagging (TMT, Thermo Fisher Scienti c) and MS analysis.The insoluble BALF component was also processed for TMT labelling and liquid chromatography (LC) tandem mass spectrometry (MS/MS) analysis.Proteins were matched to UniProt IDs using Fragpipe.The combined proteins from SomaScan and MS were ltered to identify proteases and peptidases with known substrates based on the MEROPS database.[19] We utilized the STRING database to visualize protein networks.[20] Peptide Analysis and Protease Assignment: The endogenous peptides isolated from the BALF underwent label-free identi cation and quanti cation by LC-MS that included delayed normalization and maximal peptide ratio extraction (MaxLFQ).A FASTA database was downloaded containing protein sequences of the entire human proteome (UniProt proteome sequence 2021-12-10, 101,014 protein sequences).The peptide tandem mass spectra (MS/MS) les were matched to the FASTA les using the Fragpipe software and were assigned to their native protein substrates.[21][22][23][24][25][26][27][28] Peptides matched with the Fragpipe software were quanti ed using the MaxLFQ method, and assigned cleavage sites.The cleavage sites were categorized by type of cleave based on cleave location and whether other similar peptides were detected, indicative of multiple cleavage events.The cleaves assigned were a result of exopeptidase or endopeptidase activity and mapped back to the original FASTA protein sequence with 4 residues before and after each cut, depending on location of the cut, based on starting residue and peptide length (Fig. 1S).The MEROPS catalog of preferred substrate patterns of cleavage was compared to our assigned cleavages from detected peptides.For each protease, a z-score was calculated for each cleave using z=(x-μ)/σ where x was the number of substrates in the MEROPS database with a given amino acid at a speci c position, μ was the average number of substrates with any data for that cleave position, and σ for the standard deviation of the substrates for that cleave position.We treated the z-scores as a quantitative indicator for whether the peptide matches the protease's target cleavage sequence.A higher z-score implied a higher likelihood that the protease cleaved a protein and yielded the corresponding peptide.We assigned cleaved proteins to proteases if the associated z-score was deemed an "outlier."To de ne an outlier, we computed the z-score quartiles and interquartile range (IQR) within each protease.We de ned an outlier as a peptide's z-score exceeding the third quartile plus 1.5 times the interquartile range for that protease.

Statistical Analysis:
All data underwent cleaning prior to performing statistical analysis (see Supplement).We sought to describe associations between the detected proteases along with the degraded proteins mapped from the endogenous peptides with measures of lung disease, de ned as percent predicted forced expiratory volume in 1 second (FEV1pp).
Proteases Associated with Lung Function: We examined the overall association between protease abundance and FEV1pp using the combined SomaScan and two untargeted MS datasets from the soluble and insoluble components of BALF.For each identi ed protease, we calculated the correlation between the measured abundance and FEV1pp to the SomaScan and the two untargeted datasets.For the SomaScan dataset, we averaged the correlations across aptamers and proteins detected across datasets if multiple aptamers were present.We used the p-values from a Pearson correlation test to assess the strength of association between protease abundance and FEV1pp.We obtained an overall pvalue for each protease by aggregating the individual p-values using Fisher's combination method.[29] We controlled the false discovery rate (FDR) using the Benjamini-Hochberg correction.[30] We report on associations that were signi cant at the FDR < 0.05 level.
Association Between Protein Degradation and Disease: For each protein assigned to an endogenous peptide, we dichotomized patients into two groups: one in which the degraded protein was detected and one in which it was not.A protein was "detected" if its corresponding MaxLFQ intensity was non-zero.
Due to heavy missingness, we only considered proteins detected in at least ve samples.We compared the average forced expiratory volume in 1-second (FEV1pp) between these two groups for each protein using a two-sample t-test.We controlled the Benjamini-Hochberg FDR.[31] For pathway analysis we used a less stringent FDR of below the 0.1 level using IMPaLa software to examine pathways re ected among the degraded proteins.[32] RESULTS Study Participant Demographics: Study Participant Demographics: Error!Reference source not found.
summarizes the demographics of participants whose samples were used in the endogenous peptides analysis.The soluble and insoluble components of BALF TMT datasets differed by two samples from individuals with OLD and the SomaScan dataset differed by one sample from an individual with OLD, but overall showed similar demographic distributions across those with and without OLD.Most of the participants were male (72.5%) with a mean age of 56.8 and 54.9% identi ed as black, non-Hispanic, 43.1% as white or Hispanic/Latino, and 2.0% identi ed as Asian or Paci c Islander.Most participants were receiving antiretroviral treatment (ART) (92.2%) at the time of study.Smoking status was similar between those with and without OLD, with 52.9% actively smoking at the time of enrollment, however, average pack years were greater in those with OLD (31.1) vs those without OLD (15.2).Lung function ranged from 21 to 128% of predicted normal.Among those with OLD, the average FEV1pp was 67.5% and for those without OLD the average was 104%.
Proteases Associated with Lung Function: To enhance our proteomic coverage to identify proteases in BALF, we leveraged the previously-reported SomaScan proteomic data from the BALF soluble component, along with proteins measured by TMT with MS of both the soluble and insoluble BALF components.[18] A total of 101 proteases were identi ed, many of which overlapped between the three different methods of measuring proteins (Fig. 1, Table 1S).Of these proteases, 40 were unique to Somascan, 9 unique to the insoluble component of BALF and 3 in the soluble BALF component measured by TMT.Most of these proteases make up a network that is functionally associated or linked (Fig. 2S).We identi ed 22 proteases that were associated with FEV1pp, four positively correlated and 18 negatively correlated (Table 2).The four proteases that correlated with higher lung function included carboxypeptidase M, prothrombin, urokinase-type plasminogen activator and gastricsin.Many of the 22 proteases associated with lower lung function are proteases previously described in OLD, including cathepsins, metalloproteinases (MMP), caspases and neutrophil elastase.All but six of these proteases have functional associations with each other (Fig. 2a).
Protein Substrates Subject to Proteolytic Cleavage: We identi ed 31 proteins, mapped from endogenous peptides, that were the substrates for proteolytic cleavage and associated with FEV1pp (Table 3).Table 3 depicts the top 15 proteins and the mean FEV1pp among participants for whom their samples contained these substrate proteins.Among the top ve proteins were alpha-enolase, an enzyme involved in glycolysis, histones, and tubulin.Among these 31 proteins, 28 proteins showed inverse relationships with FEV1pp, i.e. increased degradation was associated with lower average FEV1pp, indicating these proteins were more likely to be subject to proteolysis in the presence of OLD. Figure 3a depicts the protein-protein interaction of these 31 proteins and all but three have functional associations.The top ten pathways re ected among these 31 proteins are shown in Table 4.There were 39 pathways with FDR below 0.05, including pathways involving small ubiquitin-like modi er mediated modi cation (SUMOylation), a post-translational process to control protein quality [33] and histone methylation.
Proteases Participating in Substrate Cleavage: To identify the linked to the generation of the endogenous peptides, we analyzed 101 proteases identi ed across the SomaScan and TMT datasets with the top 31 identi ed substrate proteins that associated with FEV1pp.After linking candidate endogenous peptides to their corresponding proteases responsible for their cleavage by examining the z-scores, we studied how many proteins each protease cleaved.The number of proteins assigned to each protease ranged from one to 23 (Fig. 3; Table 2S) with the top 10 proteases included neutrophil elastase, granzyme, and cathepsin D (Table 5).

DISCUSSION
Proteases are a diverse group of proteins comprising over 500 members which makes up almost 2% of the human genome.There are ve major classes of proteases in mammals with serine, cysteine and metallo-proteases being the most prevalent in human lung disease.Traditionally these proteases have been viewed as substrate speci c protein degrading enzymes and originally had not been considered to be participants in signaling or regulatory pathways.In the last decade, advances in degradomics and the study of protease substrate have revealed that protease targets and their substrates are complex.[34,35] It is now evident that proteases are key components of regulatory mechanisms via cleavage of speci c substrates with concomitant activation, silencing or modulation of regulatory functions through a mechanism called proteolytic processing.[34] While most studies related to the role of proteases in OLD, both HIV and non-HIV associated, have been limited to individual proteases or protease families; it is highly unlikely that single proteases or even single protease families are solely responsible for OLD pathogenesis.More likely there are complex interactions among proteases and their substrates that participate in multiplexed regulatory systems.In this study, we characterized the complex protease proteome in HIV-associated OLD and identi ed protease substrates and their cognate pathways.
Utilizing a comprehensive proteomic approach that included a combination of aptamer-based proteomics and mass spectrometry with TMT labeling we identi ed 101 proteases within the BALF, 22 of which were signi cantly associated with lung function as measured by FEV1pp.Proteases are key regulatory proteins in both homeostasis and disease and several of the proteases we identi ed are associated with normal lung function.One protease, gastricsin, is a gastric protease and likely represents micro-aspiration, which is common in individuals with OLD.[36] Interestingly, gastricsin was observed in individuals with preserved FEV1.Aspiration is likely to be equally, if not more, common in those with severe lung function, although it is probably less prevalent compared to the proteases that are upregulated in disease.We found both prothrombin and urokinase-type plasminogen activator to be associated with normal lung function and these proteases have roles in brin homeostasis in the healthy lung.[37,38] Many of the proteases associated with lower lung function have been described in OLD, such as the metalloproteinases, cathepsins, caspases and neutrophil elastase.Caspases are proteases involved in apoptosis and associated with the generation of emphysema.[39] Unfortunately, we were not able to correlate speci c proteases with emphysema in this cohort as CT imaging was limited.
What is most striking is that no single protease or protease family predominates.Rather, there is upregulation of many proteases across divergent protease families.Proteases can interact either directly or indirectly with other proteases and become interconnected in what has been termed a 'protease web'.[40] We found that all but six of the proteases that associated with lung function were part of such a functionally-associated network.This interconnection and redundancies of proteases in OLD create challenges in identifying therapeutic targets for anti-protease therapy.
Proteases initiate and modulate many important cellular functions by highly speci c substrate cleavage.In the in ammatory state, upregulated proteases have a wide range of substrate targets that are not limited to extracellular matrix proteins.Not only do proteases cleave multiple substrates, but substrates can be cleaved by multiple proteases.Utilizing peptidomic analysis by mass spectrometry, we were able to map endogenously produced peptides to their cognate proteins.All but three of these proteins had functional associations, suggesting the targeting or susceptibility of speci c biological pathways.In addition, most of these proteins were associated with lower lung function and mapped to pathways vital for cellular function, including (-SUMOylation.SUMOylation is critical in broad biological functions including cell cycle and protein stability.Cigarette smoke upregulates SUMOylation in human bronchial epithelial cells, providing a potential link to OLD. [41] Although our complementary proteomics analysis identi ed proteases that had upregulated abundance with disease, increased abundance does not guarantee increased protease activity.Fortunately, our analysis of endogenous peptides enabled characterization of protease activity.To identify the proteases responsible for protein degradation we assigned cleavage sites to endogenously produced peptides and matched these sites to their conjugate proteases.Neutrophil elastase, granzyme M and cathepsins D and E were among the most active proteases linked to substrate degradation.These proteases are commonly associated with OLD.Although metalloproteinase and caspase proteases were upregulated in disease, they were less active. Limitations of this study include the relatively small sample size, lack of non-HIV controls and lack of more detailed lung structure and function metrics, (e.g.CT quantitative imaging) and the large percentage of smokers, current or past, in both those with normal lung function and disease.Overall, this Tables Tables 1 to 5 are available in the Supplementary Files section

Figures
Figures

Figure 2 a
Figure 2