Genome-wide association identifies novel ROP risk loci in a multi-ethnic cohort

We conducted a genome-wide association study (GWAS) in a multiethnic cohort of 920 at-risk infants for retinopathy of prematurity (ROP), a major cause of childhood blindness, identifying 2 loci at genome-wide significance level (p<5×10–8) and 7 at suggestive significance (p<5×10–6) for ROP ≥ stage 3. The most significant locus, rs2058019, reached genome-wide significance within the full multiethnic cohort (p=4.96×10–9); Hispanic and Caucasian infants driving the association. The lead single nucleotide polymorphism (SNP) falls in an intronic region within the Glioma-associated oncogene family zinc finger 3 (GLI3) gene. Relevance for GLI3 and other top-associated genes to human ocular disease was substantiated through in-silico extension analyses, genetic risk score analysis and expression profiling in human donor eye tissues. Thus, we report the largest ROP GWAS to date, identifying a novel locus at GLI3 with relevance to retinal biology supporting genetic susceptibilities for ROP risk with possible variability by race and ethnicity.


Introduction
Retinopathy of prematurity (ROP) is a retinal vascular disease that affects premature infants and is a leading cause of childhood blindness worldwide [1][2][3] . Birth weight (BW) less than 1500 grams, prematurity less than 32 weeks gestational age (GA), and post-natal oxygen exposure confer independent ROP risk 4,5 . However, this understanding of risk and pathogenesis is incomplete as only approximately 50-65% of infants with these risk factors will develop ROP disease 4,6,7 . For example, phenotypic extremes that do not conform to current risk pro les exist; preterm infants born at large birth weights and/or older gestational ages may develop treatment-warranted ROP whereas those within the risk pro les of GA and BW do not always develop treatment-warranted ROP 8 . The current strati cation also does not address prevention; although post-natal oxygen exposure is the most modi able risk, limiting oxygen increases infant mortality [9][10][11] . As a result, current screening lacks speci city and interventions are unable to modify preclinical ROP risk, instead targeting ROP once retinal disease is present and there is signi cant risk of blindness 6,12 . Improved understanding of risk will allow better detection and treatment of infants with greatest risk for severe ROP and may provide novel insights into disease patho-mechanisms in order to prevent disease. ROP risk is multifactorial and although genetic risk is not fully elucidated, there is a growing body of evidence supporting a genetic basis for ROP, including twin studies [13][14][15] . Case control and whole exome candidate approaches have identi ed signi cant risk associations between single nucleotide polymorphisms (SNP) in brain-derived neurotrophic factor (BDNF; rs7934165 and rs2049046), thrombospondin type-1 domain-containing protein 4 (THSD4), TNF -308G/A polymorphism and angiotensin 1 converting enzyme insertion deletion (ACE ID) polymorphism [16][17][18][19] (Supplementary  Table 1). By contrast, recent candidate work demonstrated signi cant protective associations between the BDNF SNP rs7929344 and ROP and severe ROP risk associations between SNPs within VEGFA, NOS3 and EPAS1 20 . While no SNP has been shown to reach genome-wide signi cance, meta-analysis has substantiated some associations 18 while also beginning to identify pathobiology underlying observed differences in ROP risk relative to race; an example being recent work demonstrating association of the VEGFA+405G>C polymorphism association with ROP risk in Caucasian populations 21 . This observation aligns well with clinical observations showing racial differences in risk, including greater ROP severity and vision loss in white as compared to black preterm infants with equivalent BW and GA ROP risk 22,23 . Further, clinical progression has been shown to vary in Hispanic compared to white non-Hispanic populations 24 .
Taken together, our current understanding of ROP risk is incomplete, which impairs our ability to predict infants at greatest disease risk or identify pathobiology allowing for ROP prevention. While ROP risk is multifactorial, evidence supports genetic contributions to risk and protection. Herein we present a GWAS analysis based within the i-ROP consortium, including collaborators from 14 academic institutions throughout the world, demonstrating a novel ROP risk association with the SNP GLI3 rs2058019 reaching genome-wide signi cance (p=4.90E-09) within our multi-ethnic cohort. We further demonstrate broader applicability for this SNP and GLI3 genetic variation with other forms of retinal disease characterized by pre-retinal neovascularization, namely diabetic retinopathy. Finally, we identify additional SNPs with suggestive-level signi cance association with severe ROP disease, demonstrate cross-signi cance with previously identi ed ROP-associated SNPs, and further demonstrate relevance for GLI3 and these top SNPs within the human eye, nding expression in both human donor neurosensory retina as well as retinal pigment epithelium (RPE).

iROP GWAS Multi-ethnic Cohort Characteristics:
The iROP cohort consists of 2187 preterm infants born at gestational ages less than 32 weeks and birth weights smaller than 1250 grams, placing them at risk to develop ROP. The iROP database was mined for infants with both biologic and phenotypic data for subsequent genotype-phenotype analysis. To identify genetic susceptibility regions for ROP, we conducted a genome-wide association study (GWAS) using the Illumina In nium Global Screening Array with information from these infants. As published previously, phenotype was rigorously determined by a team of ophthalmologists and image graders with clinical expertise in ROP 16,25,26 . This subset of iROP patients represents multiple ethnic and racial groups, including 44.5% Hispanic, 35.4% White, 12.1% African American and 8% unidenti ed race individuals as noted in Table 1. Further, all ROP phenotypes are represented, including 197 with ROP Stages 3, 4, or 5.
( Table 1). The distribution of ROP disease, ethnicity and race was similar between the infants with and without genetic data (Supplementary Table 2).
A number of SNPs demonstrate signi cant associations with severe ROP.
GWAS analysis was performed on DNA extracted from all 920 cohort infants to identify genetic susceptibility regions for ROP severity, de ned as ROP stage 3 or greater. As depicted in Figure 1, which visualizes all p values of SNP-associations for the 22 autosomal chromosomes plotted relative to chromosome location and signi cance, a number of SNP associations were identi ed. The Y axis represents the -log transformed p value (red line representing GWAS signi cance level) and X axis is in genomic ordered by chromosomes and positions. SNPs reaching genome-wide signi cance, de ned as p≤ 5x10 -8 , are visualized above the red line.
GWAS identi es ROP-associated SNPs with genome-wide signi cance or suggestive signi cance correlating with both novel and established ROP risk genes As noted in Table 2, we identi ed 10 loci which demonstrate genome-wide signi cance (p≤5×10 -8 ) or suggestive signi cance (p≤5×10 -6 ) of association with ROP severity, de ned by stage 3 disease or greater, both with and without correction for birth weight (BW) and gestational age (GA). The position, effect allele and frequency, odds ratio, p value and nearest gene is delineated in Table 2. SNPs demonstrating the lowest p value in each locus/region were selected and the bioinformatically determined nearest gene was used. Under the additive genetic model, the ROP risk corresponds to the number of copies of the effect allele when the odds ratio is greater than 1. As noted, SNPs rs2058019 (GLI3 gene) and rs11563856 (CLDN12 gene) both reach genome wide signi cance, although only rs2058019 remains signi cant at this level when controlling for the potentially confounding effects of BW and GA. In two of the loci meeting suggestive signi cance, the nearest gene has a prior association with ROP disease in either humans or animal models as noted by bolded text. Finally, we performed an in silico analysis to determine if each listed SNP had a described expression quantitative trait loci (eQTL) association in the EyeGEx 27 database. We identi ed a number of established regulatory associations for our top SNPs as detailed in Table 2 within the retina, further supporting relevance to the ocular microenvironment. rs2058019 contributes to genetic susceptibility for ROP in diverse populations Given established differences in clinical ROP risk based on race and ethnicity 22,23 , we analyzed the association of the lead GLI3 SNP, rs2058019, strati ed by these variables. As seen in Table 3, the association between rs2058019 and ROP disease is most signi cant when considering all patients, although odds ratios and allele frequencies between cases and controls demonstrate that the degree and direction of association remain consistent for Caucasian and Hispanic subjects. Interestingly, the African American subjects do not appear to be contributing to this association. This latter nding does not reach signi cance in this sample and thus requires validation within a larger population. As noted in Supplemental Table 3, we did not nd suggestive evidence for signi cant differences is MAF or direction of effect within racial or ethnic populations for other top SNPs.
Multiple signi cant SNPs are identi ed regionally related to GLI3 To determine the regional signi cance for the GLI3 pan-locus, we plotted all SNP-associations within 100 Kb of the GLI3 gene for all patients as well as for racial and ethnic subsets. As pictured in Figure 2, a regional plot with the Y axis depicting the -log transformed p value and the X axis showing chromosomal position, we illustrate the GLI3 gene regional association for all ethnic and racial groups with ROP severity. The lead SNP rs2058019 was identi ed as a top-associated SNP within the pan-locus for the Caucasian and Hispanic groups and both the best and sentinel SNP when all patients were considered together. Signi cant regional SNPs for each group are delineated fully in Supplemental Table 4. Also pictured, we determined the linkage disequilibrium (LD) for each SNP, de ned by the r 2 using the included color scheme. LD was determined between SNPs and genes involved/close to SNPs identi ed.
GWAS identi es ROP-associated SNPs with genome-wide signi cance or suggestive signi cance correlating with both novel and established ROP risk genes As noted in Table 2, we identi ed 10 loci which demonstrate genome-wide signi cance (p≤5×10 -8 ) or suggestive signi cance (p≤5×10 -6 ) of association with ROP severity, de ned by stage 3 disease or greater, both with and without correction for birth weight (BW) and gestational age (GA). The position, effect allele and frequency, odds ratio, p value and nearest gene is delineated in Table 2. SNPs demonstrating the lowest p value in each locus/region were selected and the bioinformatically determined nearest gene was used. Under the additive genetic model, the ROP risk corresponds to the number of copies of the effect allele when the odds ratio is greater than 1. As noted, SNPs rs2058019 (GLI3 gene) and rs11563856 (CLDN12 gene) both reach genome wide signi cance, although only rs2058019 remains signi cant at this level when controlling for the potentially confounding effects of BW and GA. In two of the loci meeting suggestive signi cance, the nearest gene has a prior association with ROP disease in either humans or animal models as noted by bolded text. Finally, we performed an in silico analysis to determine if each listed SNP had a described expression quantitative trait loci (eQTL) association in the EyeGEx 27 database. We identi ed a number of established regulatory associations for our top SNPs as detailed in Table 2 within the retina, further supporting relevance to the ocular microenvironment. rs2058019 contributes to genetic susceptibility for ROP in diverse populations Given established differences in clinical ROP risk based on race and ethnicity 22,23 , we analyzed the association of the lead GLI3 SNP, rs2058019, strati ed by these variables. As seen in Table 3, the association between rs2058019 and ROP disease is most signi cant when considering all patients, although odds ratios and allele frequencies between cases and controls demonstrate that the degree and direction of association remain consistent for Caucasian and Hispanic subjects. Interestingly, the African American subjects do not appear to be contributing to this association. This latter nding does not reach signi cance in this sample and thus requires validation within a larger population. As noted in Supplemental Table 3, we did not nd suggestive evidence for signi cant differences is MAF or direction of effect within racial or ethnic populations for other top SNPs.
Multiple signi cant SNPs are identi ed regionally related to GLI3 To determine the regional signi cance for the GLI3 pan-locus, we plotted all SNP-associations within 100 Kb of the GLI3 gene for all patients as well as for racial and ethnic subsets. As pictured in Figure 2, a regional plot with the Y axis depicting the -log transformed p value and the X axis showing chromosomal position, we illustrate the GLI3 gene regional association for all ethnic and racial groups with ROP severity. The lead SNP rs2058019 was identi ed as a top-associated SNP within the pan-locus for the Caucasian and Hispanic groups and both the best and sentinel SNP when all patients were considered together. Signi cant regional SNPs for each group are delineated fully in Supplemental Table 4. Also pictured, we determined the linkage disequilibrium (LD) for each SNP, de ned by the r 2 using the included color scheme. LD was determined between SNPs and genes involved/close to SNPs identi ed.

Top-identi ed SNP associations vary by race and ethnicity
To further determine race and ethnicity-based differences in SNP associations with ROP severity, we performed our GWAS analysis separately for each racial and ethnic group. As pictured in Supplemental Figure 2, which visually represents SNP associations for each population, racial and ethnic differences are present. As seen in Table 4, we nd several SNP associations which are racially or ethnically speci c and reach genome-wide or suggestive level signi cance. Notably, rs9978278 (CLDN14 gene) and rs74048122 reach genome-wide signi cance, which remains signi cant after correction for BW and GA. Although the number of observations is decreased when analyzing by race and ethnicity, these ndings suggest possible differences in genetic risk for ROP by population ancestry.
SNPs with suggestive or greater association with ROP severity in the full iROP cohort demonstrate crosssigni cance within the Hispanic Diabetic Retinopathy GOLDR cohort Our study design, which identi es SNPs associated with ROP disease stage 3 or greater, enriches our ndings for genetic variation associated with pre-retinal neovascular pathobiology. Diabetic retinopathy (DR) is another form of retinal disease characterized by pre-retinal neovascularization. Therefore, to validate the signi cance of GLI3 genetic variation to pre-retinal neovascular disease, we performed an extension analysis of top-associated iROP SNPs, those with p ≤5×10 -6 or greater signi cance levels, and SNPs regionally related for DR association within the Genetics of Latino Diabetic Retinopathy (GOLDR) GWAS dataset. Importantly, the GOLDR population consists of Hispanic patients with diabetic retinopathy (DR) 28, 29 . As noted in Table 5, we found cross-signi cance for a number of SNPs with DR in GOLDR and with ROP severity in our iROP GWAS cohort. This included SNPs within GLI3, DCLK1, SP4, PTPRD,RPL30 and RIDA.
ROP Genetic Risk Score (GRS) demonstrates increased signi cance over single SNP associations Joint estimation of SNP effects has been suggested as a superior approach to improve SNP-based disease prediction and risk strati cation, including in eye disease by our group [30][31][32][33][34][35] . We calculated GRSs (Score A, Score B) incorporating SNPs based on the SNP selection by LASSO method to determine if combined SNP pro les were more highly associated with ROP severity 36 . ROP severity was represented as an ordinal categorial variable denoting stage including none, stage 1, stage 2, etc for this analysis.  Table 6, each GRS was signi cantly associated with ROP severity. To determine the proportion of variation within the phenotype which can be explained using each risk score, and therefore the breadth of ROP phenotype that is associated with each GRS, we analyzed adjusted R-squared. As noted, score A accounted for a larger adjusted R-squared than the top SNP alone (0.63 versus 0.34), suggesting that the additional SNPs in this risk score together account for a more variance in ROP phenotype/severity than the GLI3 SNP alone.  Table 7.
Extension of associations for severe ROP SNPs identi ed in a candidate gene study replicate signi cance of the RELN gene To independently replicate previously identi ed ROP-associated SNPs, we evaluated candidate SNPs found to be signi cantly associated with severe ROP by Harnett et al. 17 within our iROP GWAS dataset. As noted in Table 8, rs10251365 in the gene RELN is signi cantly associated with ROP severity in both cohorts (p=0.009). The p values of SNP-associations with ROP severity based on stage using our data are included for comparisons and con rmation.
GLI3 and other top associated genes are expressed in human retinal and retinal pigment epithelial tissues (RPE).
To determine the relevance of GLI3 genetic variation to retinal pathophysiology, we sought to determine the expression patterns of GLI3 within human retinal and RPE tissues. Human donor macular or peripheral retinal and RPE tissues from control eyes (n=10), average age 72 years, were analyzed using RNA-sequencing. Expression was analyzed using DESeq2 38 as we have previously published 39 . Analyses were done for genes relative to our top-associated ROP SNPs as listed in Table 2. All top genes except PRPF4B were expressed in human donor macular tissues; the tissue type with greatest expression and fold difference between tissue types (neurosensory retina or RPE) is listed in Table 9. As noted, GLI3 was expressed in both the neurosensory retina and RPE; macular RPE expression was signi cantly greater by 2.54-fold (p=3.3 x 10 -15, after correction for multiple testing using Benjamini-Hockberg) than neurosensory retinal expression (Table 9). We also analyzed differential GLI3 expression between macular and peripheral RPE tissues to determine the region with greatest expression. Interestingly, we found that GLI3 expression was 1.72 fold higher in the peripheral RPE as compared to the macular RPE (p<0.01) which is notable given the presence of ROP most commonly within the peripheral retina.

Discussion
Herein, we report the rst successful GWAS for ROP which identi es a novel ROP risk association for the GLI3 SNP rs2058019, reaching genome-wide signi cance within a multiethnic population. This is the rst reported genome-wide signi cant SNP association for ROP, which shows an association trend in the same direction for both Hispanic and Caucasian infants and therefore, may demonstrate disease applicability across populations. We further identify signi cance of GLI3 variation for preretinal neovascular disease and established ROP genetic risk associations using extension analysis within an independent replication cohort consisting of Hispanic individuals with diabetic retinopathy and severe ROP candidate dataset 17 respectively. Finally, we demonstrate relevance of our ndings to human retinal and RPE tissues using expression pro ling within donor eyes. Owing to the multiethnic composition of our cohort, our analysis also is the rst to demonstrate racial and ethnic differences in GWAS-determined SNP associations. While our lead SNP, rs2058019, demonstrates a less signi cant association within the African American populations, as noted in Table 4, we identify several SNPs with signi cant association within only the African American population compared with the Hispanic or non-Hispanic Caucasian populations. This is consistent with, and indeed may speak to, observed clinical differences in ROP incidence and severity, including greater disease development and severity in Caucasian populations 22,23 . However, while suggestive of disparate genetic risk for ROP disease within different racial and ethnic populations, these results require replication in a larger population to clarify ethnic and racial differences.
GLI3 has not previously been associated with ROP disease, though it has a de ned role within ocular development. Speci cally, the dual role for Gli3 as both a transcriptional activator and repressor of canonical sonic hedgehog (Shh) signaling, has been found to control differentiation of the RPE and rod photoreceptor layer during Xenopus as well as Medaka sh eye development 40,41 . Further, abnormal expression of the Hedgehog signaling pathway has been shown in ROP within the oxygen induced retinopathy (OIR) murine model 42 . More broadly Gli3 has been shown to regulate both the innate and adaptive immune response, with described roles in fetal CD4 and CD8 thymocyte and T-cell development 43,44 . Certainly, pre-retinal neovascular disease, including within ROP and diabetic retinopathy, has been associated with aberrant in ammation [45][46][47] . Thus, there is a feasible relationship between GLI3 and ROP disease incidence and severity.
The lead SNP, rs2058019, is present within an intronic region and has not previously been associated with disease. Thus, the functional signi cance is not clear, although aberrancy in GLI3 is associated with human disease, including polydactly syndromes 48 and malignancy [49][50][51][52] . Functionally, both aberrant GLI3 repressor and activator roles have been associated with underlying patho-mechanisms, resulting in imbalance within epithelial to mesenchymal transition 50,52,53 , AKT/ERK1 activation 54 , p53 function 55 and autophagy 56 . Signi cant differences in GLI3 expression between control and diseased tissues is also thought to underlie pathobiology as described in AML 57 and colon cancer 55 , in some cases regulated by differential methylation 53,57 . Thus, there is signi cant precedent for aberrancy of GLI3 DNA structure, expression, or function in human disease. Future work will seek to determine the functional signi cance of this SNP association to ROP disease development in preterm infants. Our work demonstrating GLI3 expression within adult human donor eyes within both neurosensory retina and RPE/choroid tissues with a higher gradient within the RPE/choroid, supports the premise that GLI3 has relevance to human preretinal neovascular disease.
In addition to GLI3, genes associated with top identi ed SNPs within our dataset, as demonstrated in Table 2, have shown associations with ROP in other work. ARHGEF7, demonstrates signi cant differential expression within rodent models of pre-retinal neovascularization and is also different between early and late stages of the disease within the OIR model of ROP 58 . There is also precedent for DPP4 function within retinal vascular homeostasis which is perturbed in murine models of both ROP and diabetic retinopathy. Within the OIR model, DPP4-inhibition increased retinal vascularity and leakage while in the murine diabetic retinopathy model, DPP4-inhibition increased retinal vascular leakage 59 . These murine ndings are substantiated in human literature, which demonstrates increased progression of proliferative diabetic retinopathy in patients taking DPP4-inhibitors 60 . The inverse was also found to be true, namely that topical administration of dipeptidyl peptidase prevents vascular leakage in a diabetic mouse model 61 . Finally, while CLDN12 has not been found to play a role in pre-retinal neovascular disease or ROP speci cally, the Claudin family of proteins have precedent in neovascular ocular disease 62,63 . This is particularly interesting given the identi cation of another Claudin family member, CLDN14, with a signi cant SNP association for ROP severity in African Americans (Table 4).
Genes associated with our top-identi ed SNPs also support ndings from case control candidateapproach SNP-chip studies. As noted in Table 8, extension of associations for the top 10 SNPs identi ed for severe ROP by Hartnett et al. 17 demonstrates independent replication for rs10251365 in the RELN gene (p=0.009). While additional cross-signi cance was not identi ed, the study design, GWAS versus candidate approach, and outcome measures, "severe compared to mild ROP" versus ROP severity by stage, differ between these studies. As noted above, we also identi ed the greatest association for these top SNPs within a multiethnic population which is distinct from the roughly 70% Black race population examined by Hartnett et al. Thus, the lack of additional overlap may also be informed by racial and ethnic differences between studied populations or may re ect other aspects of ROP risk, such as timing of ROP presentation as well as susceptibility of disease progression given external stresses that interact with race and ethnicity. As noted, we identify unique genetic ROP risk for African American versus Caucasian or Hispanic populations. While African American unique SNPs were identi ed in a smaller proportion of our cohort, and therefore based in fewer observations, these ndings warrant further examination within larger populations. Future meta-analysis may allow for better harmonization and additional independent validation. Taken together, our identi ed SNP associations demonstrate relevance to described ROP patho-mechanisms, while also identifying novel associations for further study.
The overall relevance of genetic variation to ROP risk has been debated 64 . Our work substantiates genetic ROP risk which is further highlighted by our genetic risk score assessment. This analysis methodology has been used in numerous disease contexts, including by our group 34,65 , to assess the magnitude of genetic risk for disease outcome measures. In our analysis, ROP disease severity was more signi cantly associated with polygenic SNP subsets than with the most signi cant individual GWAS SNPs. This suggests the existence of more extensive genetic variation contributing to ROP and thus, the overall importance of this continued approach.
Finally, the relevance of our data to vascular pathobiology more broadly is evidenced in our extension analyses within independent replication datasets from other forms of pre-retinal neovascular (DR/GOLDR) and retinal vascular (CVRE/CRAE) pathobiology. We identify SNP associations that "bridge" these disease contexts and thus may speak to co-occurring mechanisms of disease. Among these, SNP-associations at the GLI3, DCLK1, SP4, PTPRD,RPL30 and RIDA genes were replicated in an independent Hispanic diabetic retinopathy (DR) cohort and thus shared within the context of pre-retinal neovascular disease which is present in both ROP ≥ stage 3 and proliferative diabetic retinopathy. While DCLK1 66 and PTPRD 67 have been previously associated with DR in animal and human studies respectively, the remaining associations are novel within the context of pre-retinal neovascular disease and represent areas of future study. Similarly, our identi cation of SNPs within FYCO1 and ANK3 as shared between vascular pathobiology in the CVRE/CRAE dataset and our iROP cohort, represents novel associations. Both genes have associations within the eye and retina, though most signi cantly with anterior segment pathology including infantile cataract and coloboma 68,69 .
Taken together, we report the largest ROP GWAS to date, which identi es a novel locus at GLI3 on chromosome 7 demonstrating genome wide level signi cance, the rst ROP associated variant to do so. We further identify 7 additional loci with suggestive level signi cance for ROP severity. We demonstrate relevance for GLI3 and other top-associated genes to ndings from prior studies and human ocular disease through in-silico extension analyses, genetic risk score analysis and expression pro ling in human donor eye tissues. Signi cance varied by race, with greater association measured within Caucasian and Hispanic compared with African American infants. This may underlie clinically observed differences in ROP development which are also represented in our cohort which demonstrates African American ROP prevalence of 43% compared to 52% and 67% for Caucasian and Hispanic populations respectively. Replication of this study using a larger and more diverse population would allow for validation of these ndings, particularly with regard to differences in genetic ROP risk relative to ethnicity and race [70][71][72] . In summary, we report the largest ROP GWAS to date, identifying a novel locus at GLI3 with relevance to retinal biology supporting genetic susceptibilities for ROP risk with possible variability by race and ethnicity.

Methods
Recruited Study Cohort: The Imaging and Informatics for ROP (iROP) study is a multicenter, prospective, ROP cohort study collecting clinical, biologic, and imaging data from preterm infants born with GA and BW risk for ROP as previously published by our group 16,25 . For all enrolled infants, retinal images were obtained using a wide-angle fundus camera [RetCam; Natus Medical Incorporated, Pleasanton, CA]; biologic specimens included blood or saliva and was obtained with speci c consent. Informed consent was provided to the parent/guardian under approved IRB protocols at each study site which conformed to the tenets of the Declaration of Helsinki. When consent was given for biologic sample collection blood was collected in a purple top microtainer tube. White blood cells within the buffy coat were isolated using a standard protocol and DNA extracted using a Qiagen Allprep system for subsequent genetic analysis. All images were deidenti ed in accordance with HIPPA privacy rules. ROP phenotype was determined for each eye by consensus from at least 3 trained graders with ROP expertise employing image-based diagnoses and the clinical exam diagnosis at each study center as previously described 26 . ROP severity was graded based on the worst stage present for either eye. The iROP database was reviewed to identify infants with both biologic and phenotypic data availability which included 920 infants. Patients without detailed demographic, ROP screening, clinical, or imaging data were excluded.

Genotype, quality control and imputation
Genotyping was performed using genotyping arrays from Illumina In nium Global Screening Array

Genome-wide association tests for single SNPs
Under the additive genetic model, we performed GWAS tests with allele dosage for all genotyped and imputed SNPs strati ed by each ethnic group, and also combined, using E cient Mixed Model Association eXpedited (EMMAX) method 76 implemented in EPACTS (https://genome.sph.umich.edu/wiki/EPACTS). Outcome measure for association was ROP severity de ned by stage 3 or higher. Birth weight, gestational age, gender, top 3 PCs, and the relatedness by the genomic relationship matrix estimated using the genomic data, were included as covariates. Associations with p < 5x10 -8 were considered GWAS signi cant. To further investigate the relationship between birth weight/gestational age and ROP association signals, we also performed the association tests with and without adjustment for birth weight/gestational age. In addition, GWAS analyses strati ed by each ethnic or racial group were also performed in the same way.
Retinal expression quantitative trait loci (eQTL) Analysis: To investigate the relationship between association signals and gene expression, we evaluated whether top ROP SNPs identi ed from GWAS are related to eQTL in the genotype-tissue speci c expression databases, EyeGEx 27 , by far the largest reference tissue bank for gene expression and regulation in the human retina.
Genetic risk score (GRS) of ROP: Three genetic risk scores were tested. For the "GLI3" risk score was composed of the genotype for the most signi cant SNP, rs2058019. In order to reduce over tting of the genetic risk score to ROP stage due to the large number of SNPs relative to the number of subjects, we t a linear regression model with a LASSO penalty using the R package "glmnet" with 451,565 directly    Table 7 summarizes SNPs (identi ed for CRVE/CRAE from the literature) con rmed to be signi cantly associated with ROP Stage. na: SNP not available in ROP dataset and/or with MAF less than 0.05. Differential expression between macular retinal and RPE/Choroid was calculated and p-values corrected for multiple testing using Benjamini-Hockberg. The absolute fold change is represented between the tissue with higher expression versus the tissue with lower expression. Figure 1 Manhattan plot of genome-wide association with ROP severity (stage 3 or greater):

Figures
Data were analyzed for all populations combined. The red line on indicates genome-wide signi cance (p≤5x10 -8 )