Data and participants
The Consortium on Safe Labor (CSL) was a national, electronic medical record-based retrospective cohort study conducted from 2002 to 2008 that included 19 hospitals (8 university teaching hospitals, 9 community teaching hospitals, 2 community hospitals) in 15 Hospital Referral Regions (HRRs), the catchment areas for tertiary care hospitals.(54) Hospitals were selected based on availability of electronic medical records and to represent the 9 American College of Obstetricians and Gynecologists districts.(55) Data were extracted for deliveries at ≥23 weeks gestation and include maternal sociodemographic characteristics; medical, reproductive, and prenatal history; and labor and delivery and newborn data. A total of 228,438 deliveries were included in the CSL. We excluded multifetal pregnancies (n = 5,053; 2.21%), mothers with pre-existing diabetes (n = 3,309; 1.44%), and those with missing air pollution exposure information (n = 10; 0.004%). Restricting the sample to API mothers resulted in an analytic sample of 9,069 births to 8,350 mothers. Institutional Review Boards at all sites approved the CSL, and data are anonymous.
Outcome variable
GDM was identified from medical record data or from discharge summaries using ICD-9 code 648.8. During the CSL study period (2002-2008), the American Diabetes Association recommended screening for GDM between 24 and 28 weeks gestation using the Carpenter and Coustan criteria.(56)
Ethnic enclave exposure
In the CSL, area of residence was estimated using the HRR in which the birth occurred. HRR is the only geographic unit of analysis available in the CSL.(57) HRRs are regional geographies (average area: 13,065 square miles) comparable to Metropolitan Statistical Areas,(58) with populations large enough (average population size in thousands: 2,026) for observable residential sorting.(54,58)
We aggregated sociodemographic data from the ZIP Code Tabulation Area (ZCTA) level to provide estimates at the HRR level. As HRRs are partially defined by ZCTAs, we aggregated ZCTA data to the corresponding HRR using the year-specific ZCTA-to-HRR crosswalk from the Dartmouth Atlas of Health Care.(54,58) ZCTA data were accessed from the National Historical Geographic Information System for the 2000 decennial census and the 2007-2011 5-year average of the American Community Survey (ACS).(59) We linked CSL data with year-specific sociodemographic data: births between 2002 and 2004 were linked with 2000 Census data, and births between 2005 and 2008 were linked with 2007-2011 ACS data.(11,58)
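The aggregation and linkage steps above can be sketched as follows. This is a minimal illustration, not the study's code: the function names, the dictionary layout, and the source labels `census_2000` and `acs_2007_2011` are hypothetical, and the crosswalk is assumed to map each ZCTA to exactly one HRR.

```python
from collections import defaultdict


def sociodemographic_source(birth_year):
    """Return the data source linked to a birth year, following the text."""
    if 2002 <= birth_year <= 2004:
        return "census_2000"      # hypothetical label for 2000 Census data
    if 2005 <= birth_year <= 2008:
        return "acs_2007_2011"    # hypothetical label for 2007-2011 ACS data
    raise ValueError(f"birth year {birth_year} is outside the study period")


def aggregate_zcta_to_hrr(zcta_counts, crosswalk):
    """Sum ZCTA-level population counts within each HRR.

    zcta_counts: {zcta: {"api": int, "white": int, "total": int}}
    crosswalk:   {zcta: hrr}, assumed one HRR per ZCTA
    """
    totals = defaultdict(lambda: defaultdict(int))
    for zcta, counts in zcta_counts.items():
        hrr = crosswalk.get(zcta)
        if hrr is None:
            continue  # ZCTA not present in this year's crosswalk
        for group, n in counts.items():
            totals[hrr][group] += n
    return {hrr: dict(groups) for hrr, groups in totals.items()}
```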
We identified ethnic enclaves at the HRR level.(11) HRRs are centered on urban areas, where the majority of U.S. API populations reside,(2) yet the regional coverage of HRRs allows for inclusion of potential ethnic enclaves outside of urban centers.(60)
As described in Table 1, the distinct social and geographic attributes of an ethnic enclave are represented by API population density and racial/ethnic segregation, defined using three variables.(5,11) First, API population density is measured as the percent of API individuals residing in an HRR. Second, the API-White dissimilarity index measures the differential distribution of API and White populations within a geographic area.(61,62) Lastly, the API isolation index is the probability that an API individual will interact with another API individual.(61,62) API population density, API-White dissimilarity index, and API isolation index were calculated separately for Census data and ACS data.
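For illustration, the three measures can be computed from subunit counts within one HRR as sketched below. The sketch assumes the standard formulations of the dissimilarity index, D = 0.5 × Σ|a_i/A − w_i/W|, and the isolation index, Σ(a_i/A)(a_i/t_i), and assumes ZCTAs are the subunits; the function names and input layout are hypothetical.

```python
def api_density(units):
    """Percent of residents in the HRR who are API."""
    api = sum(u["api"] for u in units)
    total = sum(u["total"] for u in units)
    return 100.0 * api / total


def dissimilarity_index(units):
    """API-White dissimilarity: 0 (even distribution) to 1 (full separation)."""
    api_total = sum(u["api"] for u in units)
    white_total = sum(u["white"] for u in units)
    return 0.5 * sum(
        abs(u["api"] / api_total - u["white"] / white_total) for u in units
    )


def isolation_index(units):
    """API isolation: API share of the subunit experienced by the average API resident."""
    api_total = sum(u["api"] for u in units)
    return sum(
        (u["api"] / api_total) * (u["api"] / u["total"])
        for u in units
        if u["total"] > 0  # skip unpopulated subunits
    )
```

Each function takes the list of subunit counts for a single HRR, for example `[{"api": 50, "white": 50, "total": 100}, ...]`.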
We used population-based percentiles(4,5,11,18) to identify tertiles (low, medium, high) for API population density, API-White dissimilarity, and API isolation. An HRR was considered an ethnic enclave if it was in the upper third of the distribution for all three variables: API population density, API-White dissimilarity, and API isolation.(11)
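The classification rule can be sketched as below. The cut-point method is an assumption: the sketch uses unweighted tertile cut points across HRRs from Python's `statistics.quantiles`, whereas the percentiles used in the study may have been derived differently. Function names are hypothetical.

```python
from statistics import quantiles


def tertile_labels(values_by_hrr):
    """Label each HRR low/medium/high using tertile cut points (assumed unweighted)."""
    lower, upper = quantiles(list(values_by_hrr.values()), n=3)
    labels = {}
    for hrr, value in values_by_hrr.items():
        if value <= lower:
            labels[hrr] = "low"
        elif value <= upper:
            labels[hrr] = "medium"
        else:
            labels[hrr] = "high"
    return labels


def flag_enclaves(density, dissimilarity, isolation):
    """An HRR is an enclave only if it is in the top tertile of all three measures."""
    d = tertile_labels(density)
    s = tertile_labels(dissimilarity)
    i = tertile_labels(isolation)
    return {
        hrr: d[hrr] == "high" and s[hrr] == "high" and i[hrr] == "high"
        for hrr in density
    }
```

Requiring the top tertile on all three measures means an HRR with a large but evenly dispersed API population is not classified as an enclave.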
Ambient volatile organic compound exposure
The Air Quality and Reproductive Health study estimated air pollution exposure in the CSL using a modified version of the Community Multiscale Air Quality (CMAQ) model (version 4.7.1), a 3-dimensional, multipollutant air quality model that predicts ambient pollutant levels from 2005 (version 4) National Emission Inventory emissions data and Weather Research and Forecasting Model meteorological data. Modified CMAQ models were evaluated at 4 km and 36 km resolutions, and we used 36 km because HRR-level estimates were minimally affected by the choice of grid resolution.(57) Exposure was based on predicted hourly ambient pollutant concentrations within HRRs, fused with local air monitoring data to improve accuracy, and weighted to reflect population concentration and non-residential areas (i.e., industrial areas, large parks, water, mountains), as previously described.(57)

As GDM screening is recommended between 24 and 28 weeks gestation,(56) we averaged the predicted hourly ambient pollutant concentrations across the preconception (3 months before conception) and first trimester (through 13 weeks gestation) exposure windows. Ambient concentrations (parts per billion; ppb) were estimated for 14 VOCs in each exposure window: benzene, 1,3-butadiene, ethylbenzene, cyclohexane, methyl-tertiary-butyl ether, n-hexane, ethyl-methyl ketone, m-xylene, o-xylene, p-xylene, propene, sesquiterpene, styrene, and toluene. Concentrations at or above the 75th percentile were considered high exposure, and concentrations below the 75th percentile were considered low exposure.
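A minimal sketch of the window averaging and the 75th-percentile dichotomization follows. The window boundaries are left to the caller because the exact dating convention is not stated here, and the percentile method (`method="inclusive"`) is an assumption; function names are hypothetical.

```python
from datetime import datetime
from statistics import mean, quantiles


def window_mean(hourly, start, end):
    """Average hourly concentrations (ppb) with timestamps in [start, end)."""
    values = [ppb for timestamp, ppb in hourly if start <= timestamp < end]
    if not values:
        raise ValueError("no hourly predictions fall inside the window")
    return mean(values)


def classify_high_low(window_means):
    """Label each birth 'high' if at or above the 75th percentile, else 'low'."""
    # quantiles(..., n=4) returns the three quartile cut points; index 2 is the 75th.
    p75 = quantiles(list(window_means.values()), n=4, method="inclusive")[2]
    return {
        birth_id: ("high" if value >= p75 else "low")
        for birth_id, value in window_means.items()
    }
```

Here `hourly` is a list of `(datetime, ppb)` pairs for one HRR and one VOC, and `window_means` maps a birth identifier to its window-averaged concentration.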
Joint exposure categories
Using the categorical ethnic enclave (yes/no) variable and the categorical VOC (high/low) variable, we created joint exposure categories: Low VOC/Enclave (reference), Low VOC/No Enclave, High VOC/Enclave, and High VOC/No Enclave. The joint exposure variables were created for each of the 14 VOCs in both the preconception and first trimester exposure windows.
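The construction of the joint variable is simple enough to sketch directly. The identifiers below (`VOCS`, `WINDOWS`, `joint_category`, `joint_exposures`) are hypothetical names chosen for illustration.

```python
VOCS = [
    "benzene", "1,3-butadiene", "ethylbenzene", "cyclohexane",
    "methyl-tertiary-butyl ether", "n-hexane", "ethyl-methyl ketone",
    "m-xylene", "o-xylene", "p-xylene", "propene", "sesquiterpene",
    "styrene", "toluene",
]
WINDOWS = ["preconception", "first_trimester"]
REFERENCE = "Low VOC/Enclave"


def joint_category(voc_level, enclave):
    """Combine a high/low VOC level with enclave residence (True/False)."""
    voc = "High VOC" if voc_level == "high" else "Low VOC"
    residence = "Enclave" if enclave else "No Enclave"
    return f"{voc}/{residence}"


def joint_exposures(voc_levels, enclave):
    """Build one joint variable per (VOC, window) pair for a single birth.

    voc_levels: {(voc, window): "high" or "low"}
    """
    return {key: joint_category(level, enclave) for key, level in voc_levels.items()}
```

With 14 VOCs and 2 windows, each birth carries 28 joint exposure variables, each taking one of four values.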
Covariates
Individual-level covariates included maternal age, marital status (married, single, other), health insurance (public, private, other), pre-pregnancy body mass index (BMI; <18.5, 18.5-24.9, 25-29.9, ≥30), season of conception (winter, spring, summer, fall), and parity (nulliparous or multiparous). As income is not available in the CSL, health insurance(63) and marital status(64) were used as proxies for socioeconomic status. BMI was imputed using multiple imputation (10 iterations) due to a high degree of missingness (42%).
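When a model is fit separately on each imputed dataset, the per-imputation estimates are commonly combined with Rubin's rules. The sketch below assumes that approach and uses a normal-approximation confidence interval as a simplification; it is not a description of the exact procedure used in the study, and the function names are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation log-odds estimates and their variances (Rubin's rules)."""
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)            # average within-imputation variance
    between = variance(estimates)       # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


def odds_ratio_ci(log_odds, total_variance, z=1.96):
    """Convert a pooled log-odds estimate to an OR with an approximate 95% CI."""
    se = math.sqrt(total_variance)
    return (
        math.exp(log_odds),
        math.exp(log_odds - z * se),
        math.exp(log_odds + z * se),
    )
```

The `(1 + 1/m)` term inflates the variance to reflect uncertainty from the imputation itself, which matters when missingness is high.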
Area-level poverty (continuous proportion of residents in the HRR living below federal poverty thresholds) and hospital type (university teaching hospital, community teaching hospital, or community non-teaching hospital) were included as HRR-level covariates. Covariates included in the analysis were informed by previous studies.(11,12)
Statistical analysis
Prevalence of GDM was reported by ethnic enclave residence, by maternal characteristics, and by joint enclave-VOC exposure. Spearman rank correlations between each pair of VOCs were calculated (Supplemental Tables 1 and 2).
Mothers in the CSL were nested in HRRs for analysis. Hierarchical logistic regression models were used to estimate odds ratios (ORs) and 95% confidence intervals for the association between joint VOC/Enclave exposure and GDM, with robust standard errors to account for repeat births to the same mother (n = 731, 7.9% of births). The Low VOC/Enclave exposure category served as the reference group because we anticipated it to be the lowest-risk category. Separate models were run for each of the 14 VOCs for the preconception and first trimester exposure windows, using PROC GLIMMIX and PROC MIANALYZE (SAS 9.4).(65) The Benjamini-Hochberg false discovery rate adjustment procedure was used to account for multiple testing(66) (false discovery rate = 10%) and was performed using PROC MULTTEST (SAS 9.4).(65)
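For readers unfamiliar with the adjustment, the Benjamini-Hochberg step-up procedure can be sketched in a few lines. This is an illustrative implementation of the standard procedure at a 10% false discovery rate, not the SAS code used in the analysis; which set of p-values forms one family is not specified here and is left to the caller.

```python
def benjamini_hochberg(p_values, fdr=0.10):
    """Return a list of booleans: True where the hypothesis is rejected.

    Step-up rule: find the largest rank k with p_(k) <= k * fdr / m,
    then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / m:
            largest_k = rank
    rejected = [False] * m
    for idx in order[:largest_k]:
        rejected[idx] = True
    return rejected
```

Because the rule is step-up, a p-value that misses its own threshold is still rejected when a larger p-value further down the ranking meets its threshold.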
Sensitivity analysis
To further disentangle the potential effects of the individual component measures (API population density, dissimilarity index, or isolation index), we fit separate models to examine the association of ethnic enclaves, and of each component alone, with GDM. The ethnic enclave variable was dichotomous (yes/no), with "no" serving as the reference category. The component variables were the tertile (low/medium/high) variables used to identify ethnic enclaves, with the low category serving as the reference. Covariates included maternal age, marital status, health insurance, BMI, season of conception, parity, area-level poverty, hospital type, preconception benzene, and first trimester benzene.