Study Population, Study Setting and Clinical Outcome. The San Francisco Health Network (SFHN), including Zuckerberg San Francisco General Hospital, is an integrated safety-net healthcare system serving publicly insured and underinsured patients. We used SFHN EHR data from Jan 1, 2013 to Dec 31, 2017 to identify our study population and obtain individual-level patient characteristics, including sociodemographic (e.g., race/ethnicity, insurance type) and clinical information (e.g., diabetes control). Our study population included SFHN patients who had an outpatient visit in 2016-2017 and at least one additional outpatient visit within the prior two years, an ICD-9-CM or ICD-10-CM diagnosis of diabetes, at least one HbA1c lab result subsequent to diagnosis, and a residential address in San Francisco.[28]
We defined patients as having poor diabetes control if they had a glycosylated hemoglobin (HbA1c) level greater than 9% at their most recent lab test during the study period. We considered only HbA1c values recorded after a diabetes diagnosis, so that the lab result reflected control of already-diagnosed diabetes rather than a result that may have led to the diagnosis, before clinical treatment began.
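The outcome definition above can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual code: the function name `poor_control`, the input layout (a list of date and HbA1c pairs), and the choice to treat "after diagnosis" as strictly later than the diagnosis date are all assumptions made for the example.

```python
from datetime import date

def poor_control(diagnosis_date, labs, study_end=date(2017, 12, 31), threshold=9.0):
    """Return True/False for poor control, or None if no eligible lab exists.

    labs is a list of (lab_date, hba1c_percent) pairs (hypothetical layout).
    Assumption: a lab counts only if it is strictly after the diagnosis date
    and no later than the end of the study period.
    """
    eligible = [(d, v) for d, v in labs if diagnosis_date < d <= study_end]
    if not eligible:
        return None
    # Take the most recent eligible lab and compare it with the 9% cut-off.
    _, latest_value = max(eligible, key=lambda pair: pair[0])
    return latest_value > threshold
```

A patient whose only HbA1c results precede the recorded diagnosis returns `None` here, which mirrors the inclusion criterion of at least one result subsequent to diagnosis.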
Ethics and consent to participate. This study was approved by the University of California, San Francisco Institutional Review Board. IRB approval allows for use of clinical patient health data for analysis. Human subjects were not involved in this study and therefore written or verbal informed consent was not required.
Neighborhood characteristics. Structural and social determinants of health across several domains (racial/ethnic and language composition, socioeconomic context including poverty and unemployment, food environment and access, housing) were compared across patient cluster groupings. Data for neighborhood-level characteristics were downloaded from the UCSF Health Atlas, an interactive map with a catalog of characteristics that illustrate social, economic, and built environments in California.[29] Specific data sources for each characteristic are described below.
1. Racial/ethnic and language composition of neighborhoods.[3] At the census tract level, we extracted percent White, percent Black, percent Asian, percent Native Hawaiian or Pacific Islander, percent Native American (alone or in combination with other races), and percent Latinx residents, sourced from 2013-2017 American Community Survey (ACS) data.[30] We also measured limited English proficiency, defined as the percent of the population at the census tract level that speaks English less than "very well," and percent of the population that speaks a language other than English at home, also sourced from ACS.[30]
2. Socioeconomic Context and Neighborhood Built Environment. To measure several additional indicators of structural and social determinants of health at the neighborhood level, we examined:
a. Poverty and unemployment. Percent poverty from ACS data was defined as the percent of the population with income below 100% of the federal poverty level in the past 12 months,[30] which we included along with the Housing and Urban Development’s Extremely Low-Income (ELI) measure, defined as below 30% of the area median income (relevant for high-income locations, such as San Francisco).[31] We also included percent unemployment from ACS.[30]
b. Socioeconomic Indices. We used a composite measure of the Healthy Places Index (combining economic, education, housing, healthcare access, neighborhood, clean environment, transportation, and social factors), where a higher percentile indicates less healthy neighborhood conditions.[32]
c. Food environment and access. Percent low-income, low-food access tracts was obtained from the US Department of Agriculture Food Access Research Atlas,[33] defined as low-income tracts where at least 500 people or 1/3 of the population lives more than half a mile away from the nearest supermarket. Percent of the population with Supplemental Nutrition Assistance Program (SNAP) benefits in the past 12 months was sourced from ACS.[30] Finally, food insecurity measures were census-tract level modeled estimates of percent food insecurity sourced from Feeding America.[29]
d. Housing. Housing data were sourced from HUD Comprehensive Housing Affordability Strategy Data.[31] Renter-occupied households were defined as the percent of housing units within a census tract that are occupied by a renter. Severe rent burden was defined as the percentage of renter-occupied households in a census tract for whom housing costs are over 50% of household income.
Statistical Analyses. We conducted descriptive analyses of patient characteristics by uncontrolled diabetes, overall and by sociodemographic characteristics.
Geospatial Analyses. We geocoded residential addresses of patients in our study population, using patients’ most recently recorded address in the EHR as of June 13, 2019. We used ArcGIS Pro Version 2.6 (Environmental Systems Research Institute, Inc., Redlands, CA, USA) for all geospatial analysis.
Census Tract Prevalence. We calculated the prevalence of poor glycemic control among diabetic patients in our study population by census tract and categorized census tracts into tertiles of high (18% - 47.1%), medium (11.9% - 17.9%), or low prevalence (less than 11.8%). Rates based on small counts are less reliable; therefore, census tracts with fewer than 10 patients were excluded from the rate calculation.
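The tract-level calculation can be illustrated as follows. This is a hedged sketch: the function names and input layout are hypothetical, and the rank-based tertile split (ties broken by sort order) is an assumption, since the text does not state how tertile boundaries were computed.

```python
from collections import defaultdict

def tract_prevalence(patients, min_patients=10):
    """Percent of patients with poor control per tract.

    patients is an iterable of (tract_id, poor_control_bool) pairs.
    Tracts with fewer than min_patients patients are dropped.
    """
    counts = defaultdict(lambda: [0, 0])  # tract -> [n_poor, n_total]
    for tract, poor in patients:
        counts[tract][0] += int(poor)
        counts[tract][1] += 1
    return {t: 100.0 * p / n for t, (p, n) in counts.items() if n >= min_patients}

def tertile_labels(prevalence):
    """Label tracts low/medium/high by rank (assumed tertile method)."""
    ranked = sorted(prevalence, key=prevalence.get)
    n = len(ranked)
    names = ("low", "medium", "high")
    return {t: names[3 * i // n] for i, t in enumerate(ranked)}
```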
Hot Spot Analysis. We then conducted a hot spot analysis to identify hot and cold spots of poor diabetes control in San Francisco. We used the Getis-Ord Gi* statistic to assess randomness of the spatial distribution of high (poor glycemic control) and low (good glycemic control) values using the “Hot Spot Analysis (Getis-Ord Gi*)” tool in ArcGIS Pro 2.6.
The tool defines a “neighborhood” for each patient as the set of patients within a fixed distance band. The fixed distance band is determined using an incremental spatial autocorrelation test to assess the likelihood that spatial distribution patterns of high and low values are random. Using the Global Moran’s I statistic and z-scores generated from the test for a range of fixed distances, we identified the distance band at which spatial patterns of diabetes control were most likely to be clustered, that is, least likely to be randomly distributed. For the incremental autocorrelation test, we used the Euclidean distance method to examine 15 distance bands, each 20 meters (0.012 miles) apart, over a range of 500-800 meters (0.31 - 0.50 mi). We identified the distance band at 620 meters (0.385 miles) as having the maximum spatial autocorrelation, with a z-score of 5.83 and p-value <0.001.
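The idea behind the incremental test can be sketched outside ArcGIS. The example below computes Global Moran's I with binary fixed-distance weights and a z-score under the normality assumption, then scans a series of bands for the largest z-score. It is an illustration only: the function names `morans_i_z` and `peak_band` are hypothetical, and the weighting and variance choices are assumptions that may differ from what the ArcGIS tool reports.

```python
import math

def morans_i_z(points, values, band):
    """Global Moran's I and z-score for binary weights within `band`.

    Assumptions: weight is 1 for distinct points within the band, else 0;
    the variance uses the normality-assumption formula.
    """
    n = len(points)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev)
    s0, cross, degree = 0.0, 0.0, [0] * n
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= band:
                s0 += 1.0
                cross += dev[i] * dev[j]
                degree[i] += 1
    if s0 == 0 or m2 == 0:
        return None  # no neighbor pairs, or no variation in the values
    i_stat = (n / s0) * cross / m2
    expected = -1.0 / (n - 1)
    s1 = 2.0 * s0                               # symmetric binary weights
    s2 = 4.0 * sum(d * d for d in degree)
    var = (n * n * s1 - n * s2 + 3 * s0 * s0) / (s0 * s0 * (n * n - 1)) - expected ** 2
    z = (i_stat - expected) / math.sqrt(var) if var > 1e-12 else None
    return i_stat, z

def peak_band(points, values, start, step, count):
    """Return (band, z) with the largest z-score among the bands scanned."""
    best = None
    for k in range(count):
        band = start + k * step
        result = morans_i_z(points, values, band)
        if result and result[1] is not None and (best is None or result[1] > best[1]):
            best = (band, result[1])
    return best
```

With coordinates in meters, a call such as `peak_band(points, values, 500, 20, 15)` would scan 15 bands spaced 20 meters apart starting at 500 meters; whether the first band sits at 500 or 520 meters is not specified in the text.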
The Hot Spot Analysis (Getis-Ord Gi*) tool compares the prevalence of poor diabetes control within a patient’s “neighborhood” with the prevalence expected from all patients in the study population, and calculates a z-score and p-value for each patient. A patient is classified as a hot spot if there are statistically significantly more high values in the patient’s “neighborhood” than in the full study area (San Francisco), and as a cold spot if there are statistically significantly more low values in the patient’s “neighborhood” than in the full study area. A high positive z-score indicates clustering of higher levels of poor diabetes control and a low negative z-score indicates clustering of lower levels of poor diabetes control; the larger the magnitude of the z-score, the more intense the clustering. We categorized z-scores into 3 categories (hot spots, cold spots, and not statistically significant) based on a 90% confidence level.
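For readers who want to see the statistic itself, the following sketch computes a Gi* z-score for each patient and applies a 90% classification. It assumes binary weights within the fixed distance band (including the patient itself), a two-sided critical value of 1.645, and no correction for multiple testing; the function names are hypothetical and ArcGIS output may differ in detail.

```python
import math

Z_90 = 1.645  # two-sided critical value for a 90% confidence level

def gi_star(points, values, band):
    """Getis-Ord Gi* z-score per point, binary weights within `band`.

    Each point counts as its own neighbor, which is what the star denotes.
    """
    n = len(points)
    mean = sum(values) / n
    s = math.sqrt(max(sum(v * v for v in values) / n - mean * mean, 0.0))
    scores = []
    for i in range(n):
        neigh = [values[j] for j in range(n)
                 if math.dist(points[i], points[j]) <= band]
        w = len(neigh)  # sum of weights; equals sum of squared weights here
        denom = s * math.sqrt((n * w - w * w) / (n - 1))
        scores.append((sum(neigh) - mean * w) / denom if denom > 0 else 0.0)
    return scores

def classify(z, critical=Z_90):
    """Map a z-score to hot spot, cold spot, or not significant."""
    if z >= critical:
        return "hot spot"
    if z <= -critical:
        return "cold spot"
    return "not significant"
```

Here `values` would be 1 for a patient with poor control and 0 otherwise, so a hot spot marks a patient surrounded by more poorly controlled neighbors than the citywide prevalence would predict.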
To mask point locations of patient residences and protect patient privacy, we used inverse distance weighting interpolation to visualize geographic areas of hot and cold spots.
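Inverse distance weighting estimates the value at an unsampled location as a weighted average of nearby sample values, with weights that shrink as distance grows. The sketch below is illustrative only: the function name `idw` is hypothetical, and the power of 2 and the use of all sample points (rather than a limited search radius) are assumptions, as the text does not report the interpolation settings.

```python
import math

def idw(sample_points, sample_values, target, power=2.0):
    """Inverse distance weighted estimate at `target`.

    Assumption: every sample contributes, with weight 1 / distance**power.
    """
    num = den = 0.0
    for point, value in zip(sample_points, sample_values):
        d = math.dist(point, target)
        if d == 0:
            return float(value)  # target coincides with a sample point
        weight = 1.0 / d ** power
        num += weight * value
        den += weight
    return num / den
```

Evaluating such a function over a regular grid yields a continuous surface, so the map shows areas of clustering rather than individual residences.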
Associations between patient diabetes control clustering and census tract characteristics. Finally, we summarized structural and social determinants of health indicators by patient cluster groupings (hot spots, cold spots, and not significant) to compare values and describe observed differences across these cluster groups. We assigned the census tract values for all characteristics to each patient and then averaged the census tract values of each characteristic for patients within each cluster classification (hot spot, cold spot, and not significant).
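The summary step amounts to a patient-weighted average: each patient inherits the value of their census tract, and values are then averaged within each cluster class. A minimal sketch, with a hypothetical function name and input layout:

```python
from collections import defaultdict

def mean_by_cluster(patients, tract_values):
    """Average a tract-level characteristic over patients in each cluster class.

    patients is an iterable of (cluster_label, tract_id) pairs;
    tract_values maps tract_id to the characteristic's value.
    Assumption: patients whose tract has no value are skipped.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for label, tract in patients:
        if tract in tract_values:
            sums[label] += tract_values[tract]
            counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}
```

Because the average is taken over patients rather than tracts, a tract with many patients in a given cluster class contributes more to that class's mean than a tract with few.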