Metabo-Endotypes of Asthma Reveal Clinically Important Differences in Lung Function: Discovery and validation in two TOPMed Cohorts

Current guidelines do not suciently capture the heterogeneous nature of asthma; a detailed molecular classication is needed. Metabolomics represents a novel and compelling approach to derive asthma endotypes, i.e., subtypes dened by functional/pathobiological mechanisms. In two cohorts of asthmatics, untargeted metabolomic proling and Similarity Network Fusion was used to derive and validate ve “metabo-endotypes” of asthma, which displayed signicant differences in asthma-relevant phenotypes including pre-bronchodilator and post-bronchodilator forced expiratory volume/forced vital capacity (FEV 1 /FVC). The “most-severe” asthma metabo-endotype was dened by the lowest FEV 1 /FVC and characterized by altered levels of phospholipids and polyunsaturated fatty acids, suggesting dysregulation of pulmonary surfactant homeostasis. This was supported by genetic analyses as members of this endotype were more likely to carry variants in key pulmonary surfactant regulation genes including BMPR1B (meta-analyzed p=2.8x10 -4 ) and BMP3 (meta-analyzed p=5.23x10 -4 ). These ndings suggest clinically meaningful endotypes can be derived and validated using metabolomic data. Interrogating the drivers of these metabo-endotypes can help understand their pathophysiology. clinically meaningful differences in patient outcomes 5 . Treatments and management strategies based on these underlying pathobiological mechanisms, rather a approach, improved and The relative contribution of genetics and to formation these mechanistically driven endotypes is likely to vary between endotypes. Metabolomics reects genetics, environmental factors, and their interactions 6 , and as the closest to phenotype provides real-time insight into the physiological of represents a novel and compelling approach to identifying asthma endotypes with the potential for biological insight and clinical translation. Eosinophil count also forms a component of the commonly cited T-helper type 2 (Th2)-high and Th2-low endotypes 49 . However, such subgroups have not yet demonstrated clear clinical utility 29 . More promisingly, endotypes based on gene expression proles in participants from U-BIOPRED were shown to differ in their responses to oral corticosteroids 37 , while another study on exhaled breath demonstrated that levels of VOCs could help to predict steroid responsiveness 39 . An important facet of omic-driven endotypes is their ability to inform therapeutic and management approaches. Results in our discovery cohort suggested differences in medication usage between the metabo-driven endotypes, but validation was complicated by differences between the cohorts, and further investigation is required.


Introduction
Asthma affects 26 million children and adults in the U.S. and remains a leading cause of morbidity 1,2 . Asthma is characterized by variable reversible air ow obstruction, nonspeci c airway hyperresponsiveness and airway in ammation; however, there is substantial heterogeneity in its etiology, pathology and manifestation 3 . Current guidelines for de ning asthma, which categorize cases from mild to severe, do not su ciently capture this heterogeneity, leading to suboptimal management strategies in certain subgroups 4 . A more detailed molecular classi cation of asthma is needed.
It is hypothesized there are multiple asthma endotypes, (i.e., subtypes de ned by their functional or pathobiological mechanisms) that confer clinically meaningful differences in patient outcomes 5 . Treatments and management strategies based on these underlying pathobiological mechanisms, rather than a 'one-size-ts-all' approach, may be more effective in terms of improved outcomes and optimized use of health-care resources. The relative contribution of genetics and environment to the formation of these mechanistically driven endotypes is likely to vary between endotypes. Metabolomics re ects genetics, environmental factors, and their interactions 6 , and as the 'ome closest to phenotype provides real-time insight into the physiological state of an individual. As such it represents a novel and compelling approach to identifying asthma endotypes with the potential for biological insight and clinical translation.
Interrogating high-dimensional omic datasets to infer biological meaning can be challenging. Clustering methods have proven to be powerful in the identi cation of molecular subtypes of asthma that differ by atopic status, eosinophil count, and cytokine levels 5,7-11 but to date, none have taken an untargeted approach leveraging the global metabolome. In this study, we aim to derive and validate clinically meaningful "metabo-endotypes" of asthma.

Methods
The study schematic is described in eFigure1.

Study Populations
The study populations have previously been described. The Genetics of Asthma in Costa Rica Study (GACRS) 12 recruited 1,165 children aged 6-14 years with asthma (physician's diagnosis and ≥ 2 respiratory symptoms or asthma attacks in the prior year). At enrollment, all children completed a protocol including questionnaires, blood collection, and spirometry conducted with a Survey Tach Spirometer (Warren E. Collins; Braintree, MA) in accordance with American Thoracic Society recommendations 12 . Written parental and participating child consent/assent was obtained. The study was approved by the Mass General Brigham Human Research Committee at Brigham and Women's Hospital (Boston, USA); Protocol#: 2000-P-001130/55, and the Hospital Nacional de Niños (San José, Costa Rica).
The Childhood Asthma Management Program (CAMP) 13 (Clinicaltrials.gov: NCT00000575) is a completed randomized clinical trial of inhaled treatments for mild-to-moderate asthma (symptoms for > 6 months in the year prior to interview and PC 20 < 12.5mg/mL) in children aged 5-12 at baseline. All children completed a similar protocol to GACRS. The study was approved by the institutional review board of Mass General Brigham Healthcare (Protocol#: 1999-P-001549/29), by all participating clinical centers and the Data Coordinating Center. Child assent and parental written consent was obtained. Participants who had available plasma samples with su cient volume from the end of the trial visit (4 years post-baseline) were selected for this current study. (eMethods) Metabolomic Pro ling Metabolomic pro ling was conducted using a combination of four complimentary liquid chromatography tandem mass spectrometry (LC-MS) platforms as part of the Trans Omic Precision Medicine (TOPMed) initiative 14 . Three nontargeted LC-MS methods using high resolution, accurate mass (HRAM) pro ling measured: i) polar and nonpolar lipids (C8-pos); ii) free fatty acids, bile acids, and metabolites of intermediate polarity (C18-neg); and iii) polar metabolites including amino acids, acylcarnitines, and amines (HILIC-pos). An additional targeted LC-MS pro ling method measured intermediary metabolites including TCA cycle intermediates, purines and pyrimidines, and acyl CoAs (Amide-neg) 15,16 . Metabolite exclusions and QC are described in eMethods (eTable1). Statistical Methods Derivation of "metabo-endotypes" in GACRS We grouped 1151 subjects from the GACRS based on their metabolite residuals (adjusting for age, sex, and BMI (and race in CAMP) to account for their potential in uence on the metabolome 17 ) into distinct metabolomic-driven endotypes, using Similarity Network Fusion (SNF) [R package: SNFtool version 2.2] 18,19 and spectral clustering 20 (eMethods). We examined whether the omic-derived alterations in biological pathway between the clusters (endotypes), resulted in measurable clinical or epidemiological differences using one-way analysis of variance (ANOVA) for continuous variables and chi-squared tests for categorical variables. We determined that we had very good-excellent power to detect differences across the endotypes (eTable2).

Validation of endotypes
In the CAMP cohort, we utilized the label propagation classi er approach as a machine learning method to predict which metabo-endotype the new subjects belonged to 18 . We then assessed the clinical and phenotypic characteristics of CAMP subjects within these GACRS-de ned endotypes. Identi cation of metabolomic drivers of meta-endotypes We utilized independent logistic regression models and a one-endotype-versus-the-rest approach to identify the metabolites that contributed the most to the formation of each endotype. We meta-analyzed the GACRS and CAMP results using a random effects model.
All analyses were conducted in R version 4.0.0.

Study Population
In GACRS, 1151 subjects with asthma had plasma samples available for metabolomic pro ling, and in CAMP, 911 subjects with asthma had suitable plasma samples extracted at the end of trial (Table 1). In the original CAMP trial, no signi cant difference in lung function outcomes between the study arms was found 21 . A total of 2, 2, 2, and 3 clusters of GACRS asthmatics were identi ed based on metabolite residuals from the C8-pos, C18-neg, HILIC-pos and Amide-neg platforms respectively. We applied SNF to fuse the networks from the four platforms, with convergence after 10 iterations, and spectral clustering resulting in ve clusters designated as the asthma metabo-endotypes containing 213, 270, 222, 232, and 214, asthma cases respectively (eFigure2). Based on the Adjusted Rand Index (ARI), the fused network clusters were most similar to those of the Amide-neg platform (ARI = 0.297) (eTable3).
Endotype3 had the lowest number who reported use the use of beta2-agonists in the previous year (eFigure4). Consequently, endotype2 was designated the "most-severe" asthma endotype, and endotype3 the "least-severe".  (Table 3 and Fig. 1). Given that prior to sample collection CAMP subjects had been randomized to differing treatment regimens, we could not directly compare medication use and no signi cant differences were observed (eFigure6). The differences in log10 eosinophil count and hay fever prevalence seen in GACRS were borderline signi cant across the endotypes in CAMP (p = 0.050 and p = 0.061, respectively) although the patterns differed (eFigure5). Additionally, there was evidence that vitamin D levels differed signi cantly across the endotypes in CAMP (4.83x10 − 5 ).
Pathways/classes are named if ≥ 5 metabolites belonging to that class are associated with membership in a given direction (or if they are the most abundant pathway/class for the endotype in a given direction) Short Chain Carnitines -carbon chain length ≤ 8; Medium Chain Carnitines -carbon chain length 9 to ≤ 16; Long Chain Carnitine -carbon chain length >

16
Genetic drivers of the "most-severe" endotype Finally, we sought to identify genetic variants that may be associated with membership of the "most-severe" low PUFA, high phospholipid endotype (Endotype2) compared to all other endotypes based on available whole genome sequencing data 14,25 . We explored 1,115,764 SNPs after employing a minor allele frequency < 0.05 and r 2 > 0.2. Using an additive genetic model adjusting for the rst four principal components, age and sex, and metaanalyzing the results across GACRS and CAMP, no SNPs were signi cant in the meta-analysis at a Bonferroni threshold of 4.48x10 − 8 . Therefore, we employed a nominal p-value threshold of 0.01 to account for ve endotypes and ltered on a concordant direction of effect and a nominal p-value < 0.05 in both cohorts, resulting in 1382 SNPs associated with membership (eFigure9). SNPs were annotated to genes using the biomaRt package 26 and gene set enrichment analysis was conducted using gPro leR 27 . This identi ed an enrichment of "anatomical structure morphogenesis" (p = 1.8x10 − 5 ) and other processes that may be involved in lung development, as well as the microRNA hsa-miR-4517, which has been shown to be altered between asthmatic and normal bronchial epithelial cells 28 . Among the top SNPs associated with membership (eTable11) were those mapping to genes associated with pulmonary function and disease including rs7751017 in SMOC2 (p = 6.31x10 − 5 ) and rs11099459 in BMP3, p = 5.23x10 − 4 ); the regulation of pulmonary surfactant homeostasis, rs2120834 in BMPR1B (p = 2.8x10 − 4 ), biosynthesis of glycoproteins including, rs7125946 in GALNT18 (p = 1.01x10 − 4 ), rs1648282 in DUOX1 (p = 2.12x10 − 4 ) and the immune in ammatory response including rs1294053 in SPSB1 (2.57x10 − 4 ) (eTable12).
In this study, we identi ed ve asthma metabo-endotypes with differing lung function and clinical characteristics driven by distinct metabolomic pathways. We further determined a potential genetic component to the most severe endotype. Crucially, we were able to validate these ndings in an independent cohort of childhood asthmatics.
Although several studies have attempted to identify asthma endotypes 29,30 , these have been somewhat limited and have tended to use one of two general approaches: a priori de nitions of a phenotype based on characteristics of subjects; or pathobiologic differences in sputum or bronchoscopy specimens 5,31−34 . The resulting endotypes have often demonstrated high overlap in important clinical features rendering them challenging for clinical use. More importantly, they provide little information on underlying mechanisms. Among the few studies utilizing omic data to derive clusters 29,30,35−37 sample sizes have been small; however, results support the existence of multiple heterogeneous asthma subtypes with differing molecular pro les and pathophysiological pathways. Although several studies have incorporated metabolomics into their exploration of endotypes 7,9,10,38,39 , to date none have leveraged unsupervised clustering of the global blood metabolome to sub-phenotype asthmatics.
There were signi cant differences in asthma-relevant phenotypic characteristics across our endotypes, speci cally with regard to lung function. Based on these metrics, endotype2 was characterized as the "most-severe", with individuals in this group demonstrating the greatest degree of air ow obstruction and reporting the highest usage of oral corticosteroids and beta2-agonists. Endotype3 was designated the "least-severe" endotype using these same metrics. We recapitulated these endotypes in an independent population based on their metabolome and observed almost identical differences in lung function across the endotypes. Although differences in medication usage were not validated, we hypothesize that this is because CAMP blood was collected at the end of a clinical trial, which would have dictated use of steroids in the previous year, as well as differences in the prescribing and therapeutic approaches applied by the respective health systems of the Costa Rican-based discovery cohort, and USA-based validation cohort. In contrast, FEV 1 /FVC, which replicated between the two populations, provides a more objective measure.
We further observed differences in allergic characteristics between the discovery endotypes. In GACRS, the "most-severe" endotype displayed the highest blood eosinophil counts and proportion of eosinophilic asthmatics. This was not replicated in CAMP which may be explained by the underlying differences in immune phenotypes between the two populations, with CAMP displaying signi cantly higher log(IgE) levels ( The "most-severe" endotype2 was characterized by increased levels phosphocholines, bile acid metabolites and sphingomyelins and decreased levels of both n3 and n6 long chain PUFAs. While individuals in endotype1 who showed better metrics of lung function demonstrated higher levels of the same PUFAs. This may be explained by the fact that PUFAs have been shown to play a role in pulmonary function and disease through their role in maintenance of the pro-in ammatory -pro-resolvin pathways 24,40 . Intriguingly, PUFAs have also been demonstrated to be important in the homeostasis of pulmonary surfactant 41 , which lines the inner surface of the lung and works to lower surface tension and prevent alveolar collapse as well as playing a role in innate immune defense 42 . Consequently, dysregulation of surfactant homeostasis has been implicated in pulmonary diseases and reduced lung function in both adults and children. Pulmonary surfactant has multiple integrated and highly regulated lipid metabolite components including phospholipids (phosphatidylcholines, phosphatidylglycerols, phosphatidylethanolamines), triglycerides, cholesterols, fatty acids and sphingomyelins 43 , which were found to be among the greatest drivers of endotype membership and therefore lung function. A large number were associated with membership of the "most-severe endotype". Endotype4 which was associated with lower levels of many of these same metabolites also displayed less severe phenotypes. This leads to the hypothesis that differences in pulmonary surfactant homeostasis re ected in the blood may underlie the severity metrics observed in this endotype, which is further supported by the genetic analysis. Several SNPs mapping to genes involved in the regulation of pulmonary surfactant homeostasis were associated with membership, including BMPR1B and BMP3, members of the bone morphogenic protein family that signal through transmembrane serine-threonine kinase receptors to in uence lung morphogenesis and support neonatal respiratory function via enhanced expression of surfactant glycoproteins 44,45 . These pulmonary surfactant glycoproteins play critical roles in the innate immune response and anti-in ammatory effects 46 , and among genes mapping to the signi cant SNPs for the "most-severe" endotype were also a number involved in the biosynthesis of glycoproteins including GALNT18 and DUOX1. Several SNPs also mapped to genes involved in the immune in ammatory response such as SPSB1, which may explain the high eosinophil count in this endotype.
A potential difference in blood eosinophil count between asthma endotypes is in agreement with existing sub-phenotyping studies of asthmatics.
Clustering of 9 clinical variables identi ed and validated in the ADEPT and U-BIOPRED cohorts identi ed four groups with distinct clinical and biomarker pro les, one of which had a "moderate, hyper-responsive, eosinophilic" phenotype, with moderate asthma control, mild air ow obstruction and predominant Type-2 in ammation 47 . Similarly, three separate studies of exhaled breath samples showed that models based on volatile organic compounds (VOCs) could classify asthma as eosinophilic or neutrophilic with high accuracy 9,38,48 . Eosinophil count also forms a component of the commonly cited T-helper type 2 (Th2)-high and Th2-low endotypes 49 . However, such subgroups have not yet demonstrated clear clinical utility 29 . More promisingly, endotypes based on gene expression pro les in participants from U-BIOPRED were shown to differ in their responses to oral corticosteroids 37 , while another study on exhaled breath demonstrated that levels of VOCs could help to predict steroid responsiveness 39 . An important facet of omicdriven endotypes is their ability to inform therapeutic and management approaches. Results in our discovery cohort suggested differences in medication usage between the metabo-driven endotypes, but validation was complicated by differences between the cohorts, and further investigation is required.
There were a number of limitations to these analyses. All participants were under 18 years of age at blood collection. Additional work in older populations is required to determine the generalizability of the ndings to adult-onset asthma, although, we note that early-onset asthma may represent the larger public health burden due to its higher prevalence 50 . Cluster analysis is a descriptive method, and groups can be de ned even when there is no underlying structure in the data; however, we addressed this by assessing the clinical characteristics of the clusters in two different populations. It should be noted these clusters are based on a single timepoint, and repeated sampling is necessary to assess their temporal stability, although encouragingly, previous clustering studies in asthmatic populations have demonstrated good longitudinal cluster stability 47 . Endotypes were derived via metabolomic pro ling of blood. The utility of blood for asthma studies is supported by the literature 51 and has the bene ts of being readily accessible, vital for the development of clinically translatable biomarkers. However, future studies should address the replicability of these endotypes in different biosamples, particularly those closest to the lung such as sputum.
There are also several strengths to this study. It employs a unique design, utilizing a bottom-up approach from molecular signatures to clinical endotypes, unlike the majority of studies that have clustered based on phenotype, potentially missing mechanistic information. The patient similarity networks were generated using state-of-the-art pro ling platforms, providing broad coverage and highly reproducible data. We leveraged machine learning approaches to derive endotypes and most crucially we were able to validate our ndings in an independent population with comparable metabolomic, phenotypic and clinical data, despite underlying differences between the cohorts.

Conclusion
In conclusion, asthma represents a spectrum of disorders with heterogenous etiologies and clinical presentations, yet its clinical de nition has remained unchanged for more than 50 years 29 . A signi cant proportion of asthmatics do not respond to the "one-size-ts-all" management approach, and it is these patients who are responsible for the majority of asthma related health care costs and economic burden 52 . This study, which is by far the largest to leverage metabolomics for asthma endotyping and the rst to use an unsupervised metabolome-wide clustering approach, proposes ve novel validated asthma metabo-endotypes. These endotypes provide strong candidates for more precise asthma management strategies while informing on underlying mechanisms, paving the way for more personalized approaches to asthma management and a new era of precision medicine.
Declarations Figure 2 Pattern of Metabolomic Pathways/Classes driving Membership of each Endotype. Pathways/classes are named if ≥5 metabolites belonging to that class are associated with membership in a given direction (or if they are the most abundant pathway/class for the endotype in a given direction) Short Chain