2.1 Study area
We analyzed the Twin Cities Region of Minnesota (abbreviated as Twin Cities Region), an area of nearly three million people living in 186 communities across the seven counties of Anoka, Carver, Dakota, Hennepin, Ramsey, Scott, and Washington. The Twin Cities Region has developed several distinctive types of neighborhoods (e.g., active downtown, vibrant urban) [15]. In addition, from 1985 to 2010, the neighborhood environment in the Twin Cities Region became increasingly diverse in social composition and physical form [15]. Therefore, we expected that the Twin Cities Region would be an ideal case in which to observe temporal differences in, and changes to, the distribution of neighborhood food resources. Our study area included 2,083 census block groups in the Twin Cities Region, as defined in 2010 by the U.S. Census Bureau, with diverse built environment and sociodemographic characteristics [17]. We used census block groups to operationalize neighborhoods. The census block group (approximate population of 1,500) is the smallest unit for which data are available on built environment and sociodemographic measures. We excluded only seven census block groups because of missing data.
2.2 Relative availability of sit-down restaurants and supermarkets
We obtained food resource data from the D&B Duns Market Identifiers File (restaurant and food store Standard Industrial Classification categories; Dun & Bradstreet, Inc., Short Hills, NJ), a secondary commercial data source widely available in the U.S. We then classified the food resources according to their primary eight-digit Standard Industrial Classification (SIC) codes for the years 1993, 2001, and 2011 (see Table S1 in Additional File 1). We had intended to compare business types for 1990, 2000, and 2010; however, 1993, 2001, and 2011 were the only years for which Dun & Bradstreet business data were available.
Recent reports suggest that relative availability, i.e., the proportions of various types of retail food outlets, may be more important to diet-related behaviors than the total number of outlets because relative availability offers residents competing options [18–20]. We chose to study the relative availability of sit-down restaurants and supermarkets. Sit-down restaurants, such as ethnic food restaurants and seafood restaurants, provide seating for dining rather than only food to go (whether ordered inside or at a drive-through). See Table S1 for the SIC codes we used to identify restaurants and food stores. Although fast food restaurants have been blamed for poor U.S. diet quality, evidence indicates that neither fast food nor sit-down restaurants were consistently more healthful [21–23]. In the current study, supermarkets were large food stores that included chain or independent hypermarkets (greater than 100,000 square feet), supermarkets (66,000-99,000 square feet), and superstores (55,000-65,000 square feet). In the U.S. context, evidence shows that supermarkets have more or cheaper healthy food options than grocery stores and convenience stores, which are ubiquitous, smaller in size, and stocked with fewer or more expensive fresh and healthier food items [21–23]. We defined the relative availability of sit-down restaurants as the percent of sit-down restaurants relative to total sit-down and fast food restaurants in a neighborhood (abbreviated below as percent of sit-down restaurants). We defined the relative availability of supermarkets as the percent of supermarkets relative to total supermarkets, grocery stores, and convenience stores in a neighborhood (abbreviated below as percent of supermarkets). We used a container-based approach to measure the relative availability of sit-down restaurants and supermarkets and defined the census block group as the neighborhood.
Therefore, our measure of relative availability was based on the evidence [24] that the types and distribution of food outlets in the neighborhood are associated with diet-related behavior. We used ArcGIS 10.3 to count each type of food resource within each neighborhood in each observational year, and then we used the counts to calculate the percent of sit-down restaurants and the percent of supermarkets in Stata 14.0. When a neighborhood had neither sit-down nor fast food restaurants, a constant of one was added to that case so that it remained in the analysis [13]. A previous study validated the D&B food resource data and showed that the match rate of fast food restaurants may differ by neighborhood characteristics such as income, race, and location (urbanized area, urban cluster, and non-urban area as defined by the US Census Bureau) [25]. For example, if sit-down restaurants had a higher match rate than fast food restaurants in low-income versus high-income neighborhoods in the D&B data, we risked exaggerating the gap between low-income and high-income neighborhoods in the number of sit-down restaurants relative to total sit-down and fast food restaurants. By using multiple dimensions to characterize neighborhoods, we may partly address the varying match rates because a lower match rate associated with, for example, income is partly offset by jointly characterizing neighborhoods with mixed land use or population density.
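The percent calculation described above can be illustrated with a minimal sketch. The function name, the example counts, and in particular the choice to add the constant of one to the denominator are assumptions made for illustration; the text states only that a constant of one was added so that neighborhoods without restaurants remained in the analysis.

```python
def relative_availability(target_count, comparison_counts):
    """Percent of the target outlet type among all outlets in its group.

    target_count: count of the outlet type of interest in one neighborhood
    comparison_counts: counts of the other outlet types in the same group
    """
    total = target_count + sum(comparison_counts)
    if total == 0:
        # Assumption: the constant of one is added to the denominator,
        # so a neighborhood with no outlets receives 0 percent rather
        # than being dropped because of division by zero.
        total = 1
    return 100.0 * target_count / total


# Hypothetical counts for three census block groups
sit_down_pct = relative_availability(6, [2])        # 6 sit-down, 2 fast food
supermarket_pct = relative_availability(1, [3, 4])  # 1 supermarket, 3 grocery, 4 convenience
empty_pct = relative_availability(0, [0])           # no restaurants at all
```

The same function serves both outcomes because each is a share of one outlet type within its own group of competing outlet types.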
2.3 Neighborhood type
To classify neighborhood type, we used a cluster analysis that included 13 built environment and sociodemographic characteristics measured in 1990. Because we did not have data for the same characteristics in 1993, we assumed that the 1990 built environment and sociodemographic data were a valid substitute for the 1993 data. In sections 2.3.1 and 2.3.2, we describe the built environment and sociodemographic characteristics that we chose to generate the six types of neighborhoods. In section 2.3.3, we describe the cluster analysis we employed to generate neighborhood type and the techniques we used to examine the robustness of the classification. We did not generate neighborhood type for 2001 and 2011 because our focus was to examine the change in neighborhood food availability over time based on the neighborhood type identified in the baseline year (1990).
2.3.1 Neighborhood built environment characteristics
Neighborhood built environment characteristics included residential population density, employment population density, mix of land use, and percent of single-family housing in the neighborhood. These characteristics have been widely used to characterize Western built environments [26–29]. We obtained census population and land area data for 1990, 2000, and 2006-2009 from the 1990 Census, the 2000 Census, and the 2006-2009 American Community Survey. We drew these data from the US Census Longitudinal Tract Database, which normalized the 1990, 2000, and 2006-2009 census data to the boundaries of census tracts in 2010. We interpolated the normalized census population density data from the census tract level to the census block-group level for the years 1990, 2000, and 2010. We then measured residential population density as the total residential population divided by the total land area of the block group [30,31], and we measured employment population density as the total employed civilian labor force aged 16 years and above divided by the total land area of the block group. These measures of total land area excluded large bodies of water and parks but included other land uses such as commercial lands and roadways. To create the land use mix and percent of single-family housing measures, we obtained data on the categories and areas of different types of land use from the GIS-based current land-use maps for 1990, 2000, and 2010 from the Minneapolis Metropolitan Council. We measured the mix of land use with the 3-tier land use entropy equation (with the denominator fixed at three land use types in the block group), which used three land use categories (residential, employment, and retail) to calculate the mix of land use in the block group [32]. Land use entropy ranges from zero (total homogeneity, with all land use in one category) to 1 (maximum heterogeneity, with an even mixture of land uses).
We defined the percent of single-family housing as the number of single-family housing units divided by the total number of single-family and multi-family housing units.
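The land use entropy measure described above can be sketched as follows. This is a minimal illustration that assumes the common normalized Shannon entropy form, in which the entropy of the three land use shares is divided by ln(3); the exact equation in [32] may differ in detail, and the function name and example areas are hypothetical.

```python
import math


def land_use_entropy(residential, employment, retail):
    """Normalized entropy of three land use areas in one block group.

    Returns 0 when all land is in one category and 1 when the three
    categories cover equal areas.
    """
    areas = [residential, employment, retail]
    total = sum(areas)
    if total == 0:
        # Assumption: a block group with none of the three land uses
        # is treated as having no mix.
        return 0.0
    shares = [area / total for area in areas]
    # Categories with a zero share contribute nothing (0 * ln 0 is taken as 0).
    raw = sum(-p * math.log(p) for p in shares if p > 0)
    # The denominator is fixed at ln(3) regardless of how many
    # categories are actually present in the block group.
    return raw / math.log(3)


# Hypothetical areas (any consistent unit, e.g., acres)
homogeneous = land_use_entropy(40.0, 0.0, 0.0)
even_mix = land_use_entropy(10.0, 10.0, 10.0)
partial_mix = land_use_entropy(30.0, 10.0, 0.0)
```

Fixing the denominator at ln(3) means a block group with only two of the three land uses can never reach a value of 1, which keeps values comparable across block groups.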
2.3.2 Neighborhood sociodemographic characteristics
Neighborhood sociodemographic characteristics included the percent of population in each of five age groups defined according to working age (under 14, 15-29, 30-44, 45-64, and 65 or above), percent of population with a college education or above, percent of white race, percent of black race, and median household income. We retrieved all census sociodemographic characteristics for 1990, 2000, and 2006-2009 (the last from the American Community Survey of the U.S. Census Bureau) from the US Census Longitudinal Tract Database. We then interpolated the normalized census sociodemographic data from the census tract level to the census block-group level.
2.3.3 Cluster analyses
Previous work used data reduction techniques such as Principal Component Analysis and factor analysis [10,33] to group variables and generate a composite index, and then used quantile values of that index to classify neighborhoods into different types. In contrast, we used K-means cluster analysis, a data-mining technique that groups observations (i.e., neighborhoods) rather than variables according to the similarity of their characteristics, measured by Euclidean distance. We first transformed each 1990 built environment and sociodemographic variable into a z-score to achieve more comparable scales and ranges; otherwise, variables with large ranges might have weighed more heavily in the analysis than variables with small ranges [34]. We then used the transformed data to perform partition cluster analyses on the 13 built environment and sociodemographic characteristics, using K-means in Stata 14.0. Because misjudging the number of clusters can lead to suboptimal allocation of resources, we used three statistical approaches, the Gap Statistic Method, the Average Silhouette Method, and the Elbow Method [34], to assess the appropriateness of the number of clusters we chose. These three methods recommended six, seven, and six or seven clusters, respectively (Figures S1-S3 in Additional File 1). We finally chose a six-cluster solution based on the associated cluster statistics and the interpretability of the results.
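The standardize-then-cluster procedure described above can be sketched in a few lines. This is a minimal standard-library illustration of z-score transformation followed by K-means with Euclidean distance, not the Stata routine used in the study; the function names, the random initialization, and the two-variable example data are assumptions.

```python
import math
import random


def z_scores(rows):
    """Standardize each column to mean 0 and sample standard deviation 1.

    Assumes every column varies (a constant column would divide by zero).
    """
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    sds = [math.sqrt(sum((v - m) ** 2 for v in col) / (len(col) - 1))
           for col, m in zip(columns, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def k_means(rows, k, seed=0, max_iter=100):
    """Assign each row to one of k clusters by iterating two steps:
    assign rows to the nearest center, then move each center to the
    mean of its members, until the assignments stop changing."""
    rng = random.Random(seed)
    centers = rng.sample(rows, k)  # assumption: random rows as starting centers
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(row, centers[c]))
                      for row in rows]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(col) for col in zip(*members)]
    return labels, centers


# Hypothetical data: [population density, median household income]
neighborhoods = [[100, 30000], [120, 32000], [5000, 90000], [5200, 95000]]
standardized = z_scores(neighborhoods)
labels, centers = k_means(standardized, k=2)
```

Without the z-score step, income would dominate the distance calculation simply because its values are larger, which is the scaling concern the text raises.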
2.4 Covariates
Previous studies suggested that sit-down restaurants tend to be located in high-density neighborhoods because of walkability and the cozy atmosphere offered by urban environments [35,36]. Owners of sit-down restaurants and supermarkets, which are basic amenities, may be disinclined to locate in Black or poor neighborhoods [36–40] because of uncertainty in investing in such neighborhoods. In addition, highly restrictive land use such as single-family housing may limit the introduction of sit-down restaurants nearby because sit-down restaurants may attract traffic, generate noise, and promote unlawful behavior [41,42]. On the basis of such reports, we incorporated four variables (residential/employment population density, median household income, percent of white race, and percent of single-family housing) as covariates in the models. To represent the changes in neighborhood characteristics over the study period, we added to our models four time-varying variables, which were the changes from 1990 in residential/employment population density, median household income, percent of white race, and percent of single-family housing. For example, we set the change in employment population density in 1990 to zero. We then calculated the change in employment population density in 2000 as the employment population density in 2000 minus the employment population density in 1990. We calculated the change in employment population density in 2006-2009 as the employment population density in 2006-2009 minus the employment population density in 1990. We calculated changes in residential population density, median household income, percent of white race, and percent of single-family housing by the same method. We used the changes in residential population density and employment population density in the sit-down restaurant and supermarket models, respectively.
Adding these change variables was necessary because we measured neighborhood type only at baseline (1993), and a type fixed at baseline could not by itself explain the change in the percent of sit-down restaurants and supermarkets between 1993 and 2011.
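The change-from-baseline covariates described above follow a single rule, sketched here for one neighborhood. The function name and the density values are hypothetical; only the rule itself (value in a period minus the 1990 value, with the 1990 change equal to zero) comes from the text.

```python
def change_from_baseline(values_by_period, baseline="1990"):
    """Subtract the baseline value from the value in every period."""
    base = values_by_period[baseline]
    return {period: value - base for period, value in values_by_period.items()}


# Hypothetical employment population density for one block group
employment_density = {"1990": 820.0, "2000": 905.0, "2006-2009": 870.0}
changes = change_from_baseline(employment_density)
```

The same function applies unchanged to residential population density, median household income, percent of white race, and percent of single-family housing.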
Sit-down restaurant purveyors may prefer to locate their restaurants in neighborhoods that already have a large number of restaurants in order to draw customers who seek variety [43,44]. Therefore, we added the total number of sit-down restaurants and fast food restaurants as a covariate in the sit-down restaurant model. However, supermarket purveyors may prefer not to locate in neighborhoods that already have a large number of different types of food stores because competition may reduce the likelihood that customers shop consistently at a specific outlet [45]. Therefore, we added the total number of supermarkets, grocery stores, and convenience stores as a covariate in the supermarket model.
2.5 Statistical analyses
All descriptive analyses and multivariable models were performed using Stata 14.0 (StataCorp, College Station, TX).
2.5.1 Descriptive statistics
We calculated means and standard deviations (for continuous variables) of neighborhood built environment characteristics, neighborhood sociodemographic characteristics, and the relative availability of sit-down restaurants and supermarkets in the neighborhood in 1990/1993, 2001, and 2011. We used the one-tailed Student’s t-test and the Kruskal-Wallis H test to test for statistically significant differences in means and medians, respectively, for continuous variables.
2.5.2 Relationship between neighborhood type and relative availability of sit-down restaurants and supermarkets
We used multivariable linear mixed effects regression models to estimate the associations between neighborhood type in 1993 and the percent of sit-down restaurants and percent of supermarkets in 1993, 2001, and 2011 (n=2,083). These models accounted for the clustered data structure of repeated measurements over time within each neighborhood. Specifically, a neighborhood in 1993 shared many characteristics with the same neighborhood in 2001 and 2011, which may have violated the assumption of independent and identically distributed observations. To address this repeated-measurement feature of the data, we fitted separate mixed effects regression models for the percent of sit-down restaurants and the percent of supermarkets. We modeled the percent of sit-down restaurants or supermarkets in each neighborhood as a function of neighborhood type in 1993, the time elapsed in years since 1993, the interaction of neighborhood type in 1993 with elapsed time, and the time-varying covariates; we refer to these as baseline-change models [46]. We performed the baseline-change analysis to assess how neighborhood characteristics at the baseline year (as measured by neighborhood type) modified the effect of time on the relative availability of sit-down restaurants and supermarkets. If neighborhood type at the baseline year did not modify the effect of time, then the rates of increase in the relative availability of sit-down restaurants and supermarkets should be the same across baseline-year neighborhood types. Although the results of the baseline-change models showed which types of neighborhoods experienced greater increases in the relative availability of sit-down restaurants and supermarkets, we did not stop at that point.
Instead, we next employed post-estimation linear contrasts based on the same models, which enabled us to compare the relative availability of sit-down restaurants and supermarkets across neighborhood types in each observational year. Approximately 68% of neighborhoods did not change neighborhood type between 1993 and 2011, and the models failed to converge after we incorporated a variable for change in neighborhood type over time. Thus, we added neighborhood-level time-varying variables to the models to address the issue that neighborhood type at the baseline year could not account for changes in the relative availability of sit-down restaurants and supermarkets over time. We included a random intercept for each neighborhood in the sit-down restaurant and supermarket models so that the baseline level of the response could vary across neighborhoods. Because census block groups cover small areas in dense parts of the region, we tested whether our results were sensitive to alternative measures of the relative availability of sit-down restaurants and supermarkets based on the census tract and the census place (i.e., city or town).
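The baseline-change specification described above can be written compactly as a random-intercept model. The notation below is an illustrative assumption rather than the authors' own: all symbols are introduced here, the first neighborhood type is arbitrarily taken as the reference category, and normally distributed random effects and errors are assumed as in a standard linear mixed model.

```latex
% Hypothetical notation; none of these symbols appear in the original text.
% i indexes neighborhoods (census block groups), t indexes observation years.
% Y_{it}: percent of sit-down restaurants (or supermarkets)
% Type_{ik}: indicator that neighborhood i is baseline type k (type 1 = reference)
% Time_t: years elapsed since 1993 (0, 8, 18 for 1993, 2001, 2011)
% X_{it}: time-varying covariates; u_i: neighborhood random intercept
Y_{it} = \beta_0
       + \sum_{k=2}^{6} \beta_k \, \mathrm{Type}_{ik}
       + \gamma \, \mathrm{Time}_{t}
       + \sum_{k=2}^{6} \delta_k \, (\mathrm{Type}_{ik} \times \mathrm{Time}_{t})
       + \boldsymbol{\theta}^{\top} \mathbf{X}_{it}
       + u_i + \varepsilon_{it},
\qquad u_i \sim N(0, \sigma_u^2), \quad
       \varepsilon_{it} \sim N(0, \sigma_\varepsilon^2)

% Post-estimation contrast: expected difference between type k and the
% reference type in a given year, holding the covariates fixed.
\Delta_k(t) = \beta_k + \delta_k \, \mathrm{Time}_{t}
```

In this form, the interaction coefficients \delta_k capture whether the rate of change differs by baseline neighborhood type, and \Delta_k(t) evaluated at 0, 8, and 18 years gives the between-type comparison in each observational year.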