We used data on COVID-19 vaccination rates per 10,000 residents (at least 1 dose) at the ZIP Code Tabulation Area (ZCTA) level (henceforth, neighborhoods), publicly available from the Philadelphia Department of Public Health, on three dates: March 18th, April 18th and May 18th, 2021. For reference, Philadelphia opened eligibility to all adults on April 16th, 2021. For sociodemographic variables, we used neighborhood-level data publicly available from the 2014–2018 American Community Survey: % with college education, % uninsured, % households with limited English proficiency, % working in service jobs, % using public transportation, and % overcrowded households (> 1 person per room). Based on Krieger (31), we also computed the Index of Concentration at the Extremes for Black non-Hispanic populations and for income, also at the neighborhood level.
Calibration
We performed multi-value calibration on all factors and on the outcome, COVID-19 vaccination rates. Vaccination rates were calibrated based on Philadelphia Department of Public Health (PDPH) reporting categories for percent of residents with at least one COVID-19 vaccination. For each of the three dates (March 18, April 18 and May 18), we calibrated neighborhood-level COVID-19 vaccination rates using the tertile categories that the PDPH uses to publicly report neighborhood rates per 10,000 residents. For March: < 1500 = Low, 1501–2000 = Medium, > 2000 = High; for April: < 2500 = Low, 2501–3400 = Medium, > 3400 = High; and for May: < 3400 = Low, 3401–4500 = Medium, > 4500 = High. Social conditions (% with college education, % uninsured, % households with limited English proficiency, racial segregation and economic inequity, % working in service jobs, % using public transportation, % overcrowded households) were calibrated on tertiles: values greater than or equal to the 67th percentile were categorized as high, values greater than the 33rd percentile and less than the 67th percentile were categorized as medium, and values less than or equal to the 33rd percentile were categorized as low. Because our analysis focused on neighborhoods with persistently low COVID-19 vaccination rates over the three-month period, our main analysis only includes neighborhoods that had a low value in all three months, a total of n = 13 of the 43 neighborhoods in the analysis.
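The outcome coding described above can be illustrated with a short Python sketch. It is not the authors' code (the analysis used R and Microsoft Excel), and the function names and example rates are hypothetical. Because the published categories leave a rate exactly on the lower cut point (for example, 1500 in March) unassigned, the sketch assumes such a value is treated as Low.

```python
# Cut points (rate per 10,000 residents) taken from the text above:
# (upper bound of Low, upper bound of Medium) for each date.
# Assumption: a rate exactly on the lower cut point, which the published
# categories leave unassigned, is treated as Low.
CUTS = {
    "march": (1500, 2000),
    "april": (2500, 3400),
    "may": (3400, 4500),
}


def calibrate_rate(rate, month):
    """Return 'Low', 'Medium' or 'High' for one neighborhood and date."""
    low_max, medium_max = CUTS[month]
    if rate <= low_max:
        return "Low"
    if rate <= medium_max:
        return "Medium"
    return "High"


def persistently_low(rates):
    """True when the neighborhood is Low on all three dates.

    `rates` maps month name -> vaccination rate per 10,000 residents.
    """
    return all(calibrate_rate(rates[month], month) == "Low" for month in CUTS)


# Made-up rates for two hypothetical neighborhoods (not study data).
example_a = {"march": 1200, "april": 2300, "may": 3100}
example_b = {"march": 1400, "april": 2600, "may": 3300}
print(persistently_low(example_a))  # True
print(persistently_low(example_b))  # False: Medium in April
```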
Income Inequities: Index of Concentration at the Extremes-Income:
To proxy income inequities, we used the Index of Concentration at the Extremes-Income (ICE-Income) measure, where negative one (−1) represents the least concentrated economic privilege (extreme concentration of low-income residents) and positive one (1) represents the most concentrated economic privilege (extreme concentration of high-income residents). We calibrated ICE-Income based on tertiles where values from −1 to −0.23 were coded as zero (low concentrated economic privilege), values from > −0.23 to < 0.07 were coded as one (medium concentrated economic privilege), and values from 0.07 to 1 were coded as two (high concentrated economic privilege).
Racial Segregation: Index of Concentration at the Extremes-Non-Hispanic Black:
To proxy racial segregation, we used the Index of Concentration at the Extremes-Non-Hispanic Black (ICE-BlackNH). ICE-BlackNH values range from negative one (−1), representing the least concentrated racial privilege (extreme concentration of Black non-Hispanic residents), to positive one (1), representing the most concentrated racial privilege (extreme concentration of White non-Hispanic residents). We calibrated ICE-BlackNH based on tertiles where values from −1 to −0.37 were coded as zero (low concentrated racial privilege), values from > −0.37 to < 0.47 were coded as one (medium concentrated racial privilege), and values from 0.47 to 1 were coded as two (high concentrated racial privilege).
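As an illustration of how the two ICE measures could be computed and coded, the Python sketch below assumes the general ICE form (A − P) / T, where A is the count in the privileged group, P the count in the deprived group, and T the total population. That formula is not spelled out in this section, so treat it as an assumption drawn from the wider ICE literature. The function names are hypothetical; the cut points are those reported above.

```python
def ice(privileged, deprived, total):
    """General ICE form (A - P) / T, which is bounded between -1 and 1.

    Assumption: this formula is not stated in the section itself.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return (privileged - deprived) / total


# Tertile cut points reported in the text:
# (upper bound of code 0, lower bound of code 2).
ICE_CUTS = {
    "income": (-0.23, 0.07),
    "black_nh": (-0.37, 0.47),
}


def code_ice(value, measure):
    """Code an ICE value as 0 (low), 1 (medium) or 2 (high privilege)."""
    low_max, high_min = ICE_CUTS[measure]
    if value <= low_max:
        return 0
    if value < high_min:
        return 1
    return 2


# Hypothetical neighborhood: 100 White non-Hispanic and 700 Black
# non-Hispanic residents out of 1,000 (made-up counts, not study data).
value = ice(privileged=100, deprived=700, total=1000)
print(code_ice(value, "black_nh"))  # 0: low concentrated racial privilege
```

Note that, unlike the percentage-based factors, the reported ICE cut points place the upper boundary value (0.07 or 0.47) in the highest category, which is why the middle comparison is strict.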
Health Insurance:
The percentage of residents without health insurance ranged from 2.7–13.8%. We calibrated this condition based on tertiles where values from 2.7% to 6.5% were coded as zero (low uninsured), values > 6.5% and ≤ 9.7% were coded as one (medium uninsured), and values > 9.7% were coded as two (high uninsured).
Education:
The percentage of residents with college education ranged from 4.5–85.6%. We calibrated this condition based on tertiles where values ≤ 20% were coded as zero (low college education), values > 20% and ≤ 40.1% were coded as one (medium college education), and values > 40.1% were coded as two (high college education).
Limited English Proficiency:
The percentage of households with limited English proficiency ranged from 0.4–25.1%. We calibrated limited English proficiency based on tertiles where values ≤ 2.2% were coded as zero (low limited English proficiency), values > 2.2% and ≤ 4.7% were coded as one (medium limited English proficiency), and values > 4.7% were coded as two (high limited English proficiency).
Public Transportation Use:
The percentage of residents using public transportation by neighborhood ranged from 9.2–45.3%. We calibrated public transportation use based on tertiles where values ≤ 21.4% were coded as zero (low public transportation use), values > 21.4% and ≤ 29.5% were coded as one (medium public transportation use), and values > 29.5% were coded as two (high public transportation use).
Service Employment:
The percentage of residents working in service employment ranged from 1.9–16.9%. We calibrated service employment based on tertiles where values ≤ 7.9% were coded as zero (low service employment), values > 7.9% and ≤ 9.8% were coded as one (medium service employment), and values > 9.8% were coded as two (high service employment).
Overcrowding:
The percentage of overcrowded households ranged from 0–6.1%. We calibrated overcrowding based on tertiles where values ≤ 1.7% were coded as zero (low overcrowding), values > 1.7% and ≤ 2.4% were coded as one (medium overcrowding), and values > 2.4% were coded as two (high overcrowding).
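The six percentage-based calibrations above share one rule, which the following Python sketch makes explicit. The cut points come from the text; the factor keys, function name and example values are hypothetical, and the sketch assumes a value exactly on a cut point falls in the lower category.

```python
# (upper bound of code 0, upper bound of code 1), in percent, from the text.
PERCENT_CUTS = {
    "uninsured": (6.5, 9.7),
    "college_education": (20.0, 40.1),
    "limited_english": (2.2, 4.7),
    "public_transportation": (21.4, 29.5),
    "service_employment": (7.9, 9.8),
    "overcrowding": (1.7, 2.4),
}


def code_percent(value, factor):
    """Code a percentage as 0 (low), 1 (medium) or 2 (high).

    Assumption: a value exactly on a cut point belongs to the lower category.
    """
    low_max, medium_max = PERCENT_CUTS[factor]
    if value <= low_max:
        return 0
    if value <= medium_max:
        return 1
    return 2


# Made-up values for one hypothetical neighborhood (not study data).
neighborhood = {
    "uninsured": 10.2,
    "college_education": 18.0,
    "limited_english": 3.0,
    "public_transportation": 29.5,
    "service_employment": 8.5,
    "overcrowding": 2.5,
}
coded = {factor: code_percent(v, factor) for factor, v in neighborhood.items()}
print(coded)
```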
Meta-factor Calibrations:
We also created meta-factors using dual calibrations of each factor to identify patterns among potential difference makers among the conditions. For example, for education, we calibrated a factor called high education where neighborhoods in the highest tertile of education were coded as one, and those not in the highest tertile (low to medium) were coded as zero. We also created a factor called low education where neighborhoods in the lowest tertile of education were coded as one and those not in the lowest tertile (medium to high) were coded as zero.
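The dual calibration can be expressed as a pair of binary indicators derived from each three-level code. The Python sketch below is illustrative only; the function name and the `high_`/`low_` key naming are hypothetical.

```python
def meta_factors(name, code):
    """Derive the two binary meta-factors from a 0/1/2 tertile code.

    high_<name> is 1 only for the highest tertile (code 2);
    low_<name> is 1 only for the lowest tertile (code 0).
    """
    if code not in (0, 1, 2):
        raise ValueError("code must be 0, 1 or 2")
    return {
        f"high_{name}": int(code == 2),
        f"low_{name}": int(code == 0),
    }


print(meta_factors("education", 2))  # {'high_education': 1, 'low_education': 0}
print(meta_factors("education", 1))  # {'high_education': 0, 'low_education': 0}
```

A medium-tertile neighborhood scores zero on both meta-factors, so the pair distinguishes "not high" from "low".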
Factor Selection:
To reduce our data and focus our analysis, we implemented a configurational approach to factor selection described in detail elsewhere (26–30). Briefly, we began by using the “minimally sufficient conditions” routine (i.e., the msc function within the R package “cna”) to look across all 43 neighborhoods and all 8 factors at once, comprehensively scanning the entire dataset to identify specific configurations of conditions with strong connections to the outcome of interest (i.e., persistently low COVID-19 vaccination rates). This process exhaustively considered all one-, two- and three-condition configurations instantiated in the dataset, assessed each configuration against a prespecified consistency threshold, retained all configurations that satisfied this criterion, and then generated a “condition table” to list and organize the Boolean output. In a condition table, each row is a configuration of conditions that meets the specified consistency level, and the columns report the outcome, the conditions, consistency and coverage. We first ran the msc routine with a consistency threshold of 100%; if no configurations met this threshold, we iteratively lowered the specified consistency level by 5 percentage points (e.g., from 100% to 95%) and repeated the process to generate a new condition table. We continued lowering the consistency threshold until at least two potential configurations of neighborhood-level conditions met the specified consistency level. Using this approach, we inductively analyzed the entire dataset and used the condition table output to identify a subset of candidate factors to use in model development in the next steps of the configurational analysis.
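The threshold-lowering search described above can be sketched in plain Python. This is a simplified re-implementation of the logic for illustration, not the “cna” package's msc routine and not the authors' code. All names and the toy data are hypothetical. Consistency is taken as the share of cases showing the configuration that also show the outcome, and coverage as the share of outcome cases that show the configuration. The sketch also assumes that a configuration is dropped when one of its sub-configurations already meets the threshold, reflecting the "minimally sufficient" idea.

```python
from itertools import combinations


def condition_table(rows, factors, outcome, threshold_pct, max_size=3):
    """List configurations of up to `max_size` conditions meeting the threshold.

    rows: list of dicts (one per case); outcome: key holding 0 or 1.
    Returns a list of (configuration, consistency, coverage) tuples.
    """
    n_outcome = sum(r[outcome] for r in rows)
    table = []
    for size in range(1, max_size + 1):
        for combo in combinations(factors, size):
            # Only configurations that actually occur in the data.
            for values in sorted({tuple(r[f] for f in combo) for r in rows}):
                config = dict(zip(combo, values))
                # Assumed minimality rule: skip if a smaller kept
                # configuration is contained in this one.
                if any(kept.items() < config.items() for kept, _, _ in table):
                    continue
                matches = [r for r in rows
                           if all(r[f] == v for f, v in config.items())]
                n_both = sum(r[outcome] for r in matches)
                # Integer comparison avoids floating-point threshold issues.
                if n_both * 100 >= threshold_pct * len(matches):
                    consistency = n_both / len(matches)
                    coverage = n_both / n_outcome if n_outcome else 0.0
                    table.append((config, consistency, coverage))
    return table


def search(rows, factors, outcome, min_configs=2):
    """Lower the threshold from 100% in 5-point steps until enough rows appear."""
    for pct in range(100, -1, -5):
        table = condition_table(rows, factors, outcome, pct)
        if len(table) >= min_configs:
            break
    return pct, table


# Toy data with two binary factors (made up, not study data).
toy = [
    {"a": 1, "b": 0, "y": 1},
    {"a": 1, "b": 1, "y": 1},
    {"a": 0, "b": 1, "y": 0},
    {"a": 0, "b": 0, "y": 0},
]
threshold, table = search(toy, ["a", "b"], "y")
print(threshold)
for config, consistency, coverage in table:
    print(config, consistency, coverage)
```

In the toy data only `a = 1` is perfectly consistent with the outcome, so the search keeps lowering the threshold until a second configuration qualifies.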
Model Development:
We next developed models by iteratively using model-building functions within the R “cna” software package. We assessed candidate models based on their overall consistency and coverage, as well as potential model ambiguity (when competing models satisfy the specified consistency and coverage thresholds and explain the outcome of interest equally well, as reflected by similar consistency and coverage scores). We selected a final model based on the same criteria of overall consistency and coverage, with no model ambiguity. The Coincidence Analysis package (“cna”) in R (32), R (version 3.5.0), and Microsoft Excel were used to support the analyses. Maps were created with ArcGIS Pro 2.9.1 (ESRI, Redlands CA) using 2010 ZIP Code Tabulation Area (ZCTA) boundaries from the US Census.