Hedonic pricing model
The hedonic pricing method (HPM) provides a basis for explaining housing prices as a function of the levels of characteristics embedded in each house, including the environmental quality associated with the housing unit’s location. The HPM is often used to estimate the value of non-market goods, especially environmental amenities and disamenities (such as open space and air quality) that are not directly traded in the market (Atreya et al., 2016; Chen and Jin, 2019; Kim et al., 2003; Li et al., 2016). Like environmental amenities, the food environment is part of the built environment surrounding a house: it influences households’ bid rents and thereby affects housing prices. Hence, the HPM is a suitable tool for estimating the value of unhealthy food environments in this study. A few previous studies have included proximity to superstores as a control variable in their HPM specifications when exploring the determinants of property values. For instance, Tyvimaa et al. (2015) and Heyman and Sommervoll (2019) find that housing prices increase with the distance to supermarkets in Helsinki, Finland, and Oslo, Norway, respectively. However, to our knowledge, no research has used the HPM to estimate the value of unhealthy food environments.
An HPM mainly comprises three types of attributes: structural variables, locational attributes, and neighborhood characteristics (Anselin and Gallo, 2006; Kim et al., 2003; Li et al., 2019; Schläpfer et al., 2015). Locational attributes (for example, distances to employment centers and parks) are commonly included in HPMs because these sites bring value to people living near them (Cao et al., 2021; Kim et al., 2003; Li et al., 2019; Schläpfer et al., 2015). Neighborhood socioeconomic characteristics are also closely associated with housing prices, since they often represent a bundle of local public services and amenities (Anselin and Gallo, 2006; Bark et al., 2011; Cao et al., 2021; Lin et al., 2014). Given that our main objective is to examine the impacts of food swamps on housing prices, we add the food swamp variable to the general HPM. In sum, our HPM can be expressed in the following matrix form:
\(P=\alpha \iota_n+X\beta+\varepsilon,\quad \varepsilon \sim N(0,\sigma^2 I_n)\)   (1)
where \(P\) is an \(n\times 1\) vector of housing prices, and \(\iota_n\) is an \(n\times 1\) vector of ones associated with the constant term parameter \(\alpha\). \(X\) is an \(n\times k\) matrix of all explanatory variables, including the house’s unhealthy food environment, structural variables, locational attributes, neighborhood socioeconomic characteristics, and control variables. Specifically, structural variables contain information such as living area, lot size, house age, number of bedrooms and bathrooms, and house condition. Locational attributes measure accessibility to Downtown, the University of Alberta, the river, hospitals, and parks. Neighborhood socioeconomic characteristics mainly consist of neighborhood-level census data. \(\beta\) is a \(k\times 1\) vector of parameters for the explanatory variables, and \(\varepsilon\) is an \(n\times 1\) vector of independent and identically distributed normal error terms.
To select a suitable functional form for the HPM, researchers generally rely on goodness-of-fit criteria (Kim et al., 2003; Saphores and Li, 2012). This study estimates four functional forms of the HPM: linear, log-log, log-linear, and semi-log. The log-log form yields the lowest AIC and BIC values and is therefore selected for the subsequent empirical analysis.
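The functional-form comparison can be illustrated with a minimal sketch that assumes a single regressor (living area), hypothetical toy data, and Gaussian OLS likelihoods; the helper names are illustrative only. Because AIC and BIC are only comparable across forms when they refer to the same dependent variable, the sketch subtracts the Jacobian term \(\sum_i \ln P_i\) from the log-likelihood of the log-price model. The text does not state whether or how such an adjustment was made in this study.

```python
import math

# Sketch under assumptions: one regressor, toy data, Jacobian-adjusted log model.


def ols_simple(x, y):
    """Fit y = a + b*x by ordinary least squares; return (a, b, residuals)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return a, b, resid


def gaussian_loglik(resid):
    """Maximized Gaussian log-likelihood of an OLS fit (ML variance = SSR / n)."""
    n = len(resid)
    sigma2 = sum(e * e for e in resid) / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


def aic_bic(loglik, k, n):
    """AIC and BIC; k counts all estimated parameters, including the variance."""
    return 2 * k - 2 * loglik, k * math.log(n) - 2 * loglik


def compare_forms(area, price):
    """Return {form: (AIC, BIC)} for a linear and a log-log one-regressor HPM."""
    n = len(price)
    k = 3  # intercept, slope, error variance
    results = {}
    # Linear form: P = a + b * area
    _, _, r = ols_simple(area, price)
    results["linear"] = aic_bic(gaussian_loglik(r), k, n)
    # Log-log form: ln P = a + b * ln(area)
    _, _, r = ols_simple([math.log(v) for v in area], [math.log(p) for p in price])
    # Jacobian term turns the density of ln P into a density of P,
    # so both likelihoods refer to the same dependent variable.
    loglik = gaussian_loglik(r) - sum(math.log(p) for p in price)
    results["log-log"] = aic_bic(loglik, k, n)
    return results


if __name__ == "__main__":
    # Hypothetical toy data: living area (sq ft) and sale prices
    area = [900, 1200, 1500, 1800, 2400, 3000, 3600]
    noise = [0.05, -0.04, 0.02, -0.03, 0.04, -0.02, 0.01]
    price = [20000 * a ** 0.5 * math.exp(e) for a, e in zip(area, noise)]
    for form, (aic, bic) in compare_forms(area, price).items():
        print(f"{form:8s} AIC={aic:.2f} BIC={bic:.2f}")
```

With real data the comparison would use the full regressor matrix \(X\); the one-regressor version only keeps the sketch short.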
Spatial hedonic pricing model
Since property attributes are inherently spatially dependent (e.g., high and low property values tend to cluster in certain neighborhoods), estimates of the HPM in Eq. (1) are likely to be biased or inefficient if spatial autocorrelation is ignored. To deal with spatial dependence, we employ three spatial regression models following prior studies (D’Elia et al., 2020; Muller and Loomis, 2008; Osseni et al., 2021). Final model selection depends on specification tests and model selection criteria. First, we consider the spatial lag (SAR) model, which allows for direct spatial interactions in the dependent variable:
\(P=\alpha \iota_n+\rho WP+X\beta+\varepsilon,\quad \varepsilon \sim N(0,\sigma^2 I_n)\)   (2)
where W is an n × n spatial weights matrix, the term WP represents the spatially weighted neighborhood housing prices, and ρ is a spatial autoregressive parameter for the term WP. Then, we consider the spatial error model (SEM), which can be expressed in matrix form as:
\(P=\alpha \iota_n+X\beta+u,\quad u=\lambda Wu+\varepsilon,\quad \varepsilon \sim N(0,\sigma^2 I_n)\)   (3)
where the term \(Wu\) represents the spatially weighted average of the disturbances, and \(\lambda\) is the spatial autocorrelation coefficient for the spatially lagged error term \(Wu\). Finally, we consider the spatial autoregressive combined (SAC) model[1], which combines the SAR and SEM models. The SAC model can be expressed as follows:
\(P=\alpha \iota_n+\rho WP+X\beta+u,\quad u=\lambda Wu+\varepsilon,\quad \varepsilon \sim N(0,\sigma^2 I_n)\)   (4)
In the SAC model, when \(\rho =0\), the model becomes SEM, and when \(\lambda =0\), it becomes SAR. If both parameters (\(\rho ,\lambda\)) are zero, then the model becomes the non-spatial standard linear regression model. We conduct the following tests to find the most suitable model to describe our data. First, we conduct a series of Moran’s I tests, Lagrange multiplier (LM) tests, and robust LM tests to check the existence of spatial effects. Then, a likelihood ratio (LR) test is used to test whether the SAC model can be simplified to a SAR or an SEM.
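A minimal sketch of two of these diagnostics follows, assuming a dense weights matrix and residuals that have already been computed. Global Moran's I is \(I=(n/S_0)\,z'Wz/z'z\), where \(z\) holds deviations from the mean and \(S_0\) is the sum of all weights. The LR statistic \(2(\ln L_{\text{SAC}}-\ln L_{\text{restricted}})\) is referred to a \(\chi^2(1)\) distribution because one parameter (\(\lambda\) or \(\rho\)) is restricted to zero. The LM and robust LM tests, which also require the regressor matrix, are not shown; the weights and log-likelihood values in the usage example are made up.

```python
import math


def morans_i(W, values):
    """Global Moran's I of `values` (e.g. OLS residuals) for a dense weights matrix W."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]                      # deviations from the mean
    s0 = sum(sum(row) for row in W)                     # sum of all weights
    num = sum(W[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * num / den


def lr_test(loglik_full, loglik_restricted):
    """Likelihood ratio test of one restriction, e.g. SAC vs SAR (lambda = 0).

    Returns (statistic, p-value) against a chi-square(1) distribution,
    using P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    stat = 2.0 * (loglik_full - loglik_restricted)
    return stat, math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


if __name__ == "__main__":
    # Four houses on a line with binary contiguity weights (hypothetical)
    W = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
    print(morans_i(W, [1.0, 2.0, 3.0, 4.0]))   # smooth trend: positive I
    print(morans_i(W, [1.0, 4.0, 1.0, 4.0]))   # alternating values: negative I
    print(lr_test(-1000.0, -1004.5))           # made-up log-likelihoods
```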
Regarding the weights matrix, we consider the k-nearest neighbor criterion and the contiguity-based queen criterion. For the former, we try k = 5, 10, and 20 to check the sensitivity of the results to neighbor specifications. For the latter, we first create Thiessen polygons for each house location and then choose the queen criterion to define neighbors.
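The k-nearest-neighbor criterion can be sketched as below. Row standardization and Euclidean distances on projected coordinates are assumptions, since the text does not say how the weights are normalized, and the coordinates are hypothetical. The queen-contiguity matrix built from Thiessen polygons requires a GIS or geometry library and is not reproduced here.

```python
import math


def knn_weights(coords, k):
    """Row-standardized k-nearest-neighbor weights matrix as a dense list of rows.

    coords: list of (x, y) in projected units (assumed metres). Distance ties are
    broken by point index. The resulting matrix is generally not symmetric.
    """
    n = len(coords)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    W = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(coords):
        # All other points sorted by Euclidean distance to point i
        others = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        for _, j in others[:k]:
            W[i][j] = 1.0 / k          # row standardization: each row sums to 1
    return W


if __name__ == "__main__":
    # Hypothetical house coordinates; the study uses k = 5, 10, 20 on 8,241 houses
    houses = [(0, 0), (120, 0), (300, 40), (310, 400), (900, 900), (950, 870)]
    for k in (1, 2, 3):
        W = knn_weights(houses, k)
        print(k, [round(sum(row), 6) for row in W])
```

A dense matrix is used only for readability; with thousands of houses a sparse representation would be the practical choice.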
Estimation of marginal effects
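The coefficients of the non-spatial linear model and the SEM can be interpreted directly as marginal effects, that is, as the partial derivative of \(P_i\) with respect to \(x_{ir}\) for any explanatory variable \(r\). However, for models that contain the spatially lagged dependent variable \(WP\) (e.g., the SAR and SAC models), the marginal effects of the explanatory variables are more complicated. This is because a change in an explanatory variable for a given observation directly affects the dependent variable at the same location and indirectly affects the dependent variable at all other locations. Solving Eq. (2) or Eq. (4) for \(P\) gives the reduced form \(P=(I_n-\rho W)^{-1}(\alpha \iota_n+X\beta+u)\), with \(u=\varepsilon\) in the SAR model. To illustrate the direct and indirect impacts in the SAR and SAC models, we examine the marginal effects matrix \(M_r(W)\) for a specific exogenous variable \(x_r\) with coefficient \(\beta_r\):
\(M_r(W)=\begin{bmatrix}\partial P_1/\partial x_{1r} & \cdots & \partial P_1/\partial x_{nr}\\ \vdots & \ddots & \vdots\\ \partial P_n/\partial x_{1r} & \cdots & \partial P_n/\partial x_{nr}\end{bmatrix}=(I_n-\rho W)^{-1}\beta_r\)   (5)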
In Eq. (5), element \(\partial {P}_{i}/\partial {x}_{ir}\) on the diagonal of \({M}_{r}\left(\text{W}\right)\) measures the direct effect on the dependent variable Pi from a change in xir, and the off-diagonal element \(\partial {P}_{j}/\partial {x}_{ir}\) of \({M}_{r}\left(\text{W}\right)\) measures the indirect effect on the dependent variable Pj from a change in xir. LeSage and Pace (2009) suggest using the average direct, the average indirect, and the average total effects to summarize the marginal effects. Specifically, the average direct effect (ADE) is the average of the diagonal terms in \({M}_{r}\left(W\right)\). The average indirect effect (AIE) is the average of the column sums of the off-diagonal elements in \({M}_{r}\left(W\right)\). The average total effect (ATE) is the summation of ADE and AIE, which is obtained by averaging all the column sums of \({M}_{r}\left(W\right)\). Furthermore, we estimate households’ marginal WTP for residing in food swamps based on the estimated marginal effects from the spatial HPMs (Bockstael and McConnell, 2007). Given the log-log form of a SAR/SAC model, the total, direct, and indirect marginal WTP for dummy variables (e.g., whether living in food swamp neighborhoods) can be expressed as:
where \(\bar {P}\) represents the average value of properties in Edmonton.
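The summary measures of LeSage and Pace (2009) can be computed directly from \(M_r(W)=(I_n-\rho W)^{-1}\beta_r\): ADE is the mean of the diagonal, ATE is the mean column sum, and AIE is their difference. The sketch below is a minimal dense-matrix illustration with hypothetical values of \(\rho\) and \(\beta_r\); a sample of 8,241 houses would call for sparse or approximate methods. Because the WTP expressions for dummy variables are not reproduced in this text, the hypothetical `dummy_wtp` assumes one common convention for 0/1 regressors in log-price models, \((e^{\text{effect}}-1)\times\bar{P}\), which may differ from the formula the study actually uses.

```python
import math


def invert(A):
    """Invert a small dense matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def impacts(W, rho, beta_r):
    """Average direct, indirect and total effects of x_r in a SAR/SAC model.

    Uses the marginal effects matrix M_r(W) = (I - rho W)^{-1} * beta_r.
    """
    n = len(W)
    A = [[float(i == j) - rho * W[i][j] for j in range(n)] for i in range(n)]
    M = [[beta_r * v for v in row] for row in invert(A)]
    ade = sum(M[i][i] for i in range(n)) / n          # mean of the diagonal
    ate = sum(sum(row) for row in M) / n              # mean column sum
    return ade, ate - ade, ate                        # AIE = ATE - ADE


def dummy_wtp(effect, mean_price):
    """Hypothetical WTP for a 0/1 regressor in a log-price model.

    Assumes the convention (exp(effect) - 1) * mean price; this is not
    necessarily the expression used in the study.
    """
    return (math.exp(effect) - 1.0) * mean_price


if __name__ == "__main__":
    W = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
    ade, aie, ate = impacts(W, rho=0.3, beta_r=-0.02)   # made-up rho and beta
    P_BAR = 460_794.40                                   # mean price in Table 1
    for name, eff in (("direct", ade), ("indirect", aie), ("total", ate)):
        print(name, round(eff, 5), round(dummy_wtp(eff, P_BAR), 2))
```

For a row-standardized \(W\), the total effect reduces to \(\beta_r/(1-\rho)\), which gives a quick check on the computation.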
Study Area and Data
Study Area
This study is conducted in Edmonton, Alberta, Canada (see Fig. 2). Edmonton is Alberta's capital city and a major economic center, with a population of 972,223 in 2019 (City of Edmonton, 2019). According to a report by the Health Quality Council of Alberta (2015), 24.1% of adults aged over 18 in Edmonton were classified as obese in 2014, higher than the national average of 20.2%. The municipal government has made considerable efforts to improve residents’ eating behavior through various strategies, including creating a healthier food environment. The city’s Food and Urban Agriculture Strategy, Fresh, was launched in 2012 to make Edmonton a better place to live and work (City of Edmonton, 2012). One of Fresh’s main goals is to build a healthier and more food-secure community by increasing access to sufficient nutritious food and encouraging families and communities to grow, preserve, and purchase local food (City of Edmonton, 2012).
[Figure 2 is about here]
Fresh contributes to the development of a healthier and more nutritious food environment in Edmonton. Under this initiative, many strategies and programs promote healthy eating (for example, increasing the intake of healthy foods and reducing the consumption of unhealthy foods) and emphasize changing perspectives to build better lifelong eating habits. Our research on people’s preferences regarding unhealthy food environments can provide useful information to help Fresh develop tailor-made strategies for building a healthier food environment and promoting healthy eating.
Housing price data
Two main types of housing price data have been used in the HPM literature: assessment data, usually provided by the local government, and arm’s-length transaction data, provided by private companies (Li et al., 2019). Compared with transaction data, assessment data can provide more complete coverage of housing prices. However, assessment data may lack essential information on structural characteristics, and the assessed values may not adequately represent market values because of inappropriate assessment methods (Li et al., 2019). Transaction data are therefore usually recommended for HPM analyses (Freeman et al., 2014, p. 317).
This study collects transaction data on single-family residential properties sold during 2015–2017 from Real Property Solutions. A total of 8,241 sales transaction records remain after excluding records with missing or mistyped values in the structural variables. Using the Alberta Consumer Price Index (CPI) provided by Statistics Canada (2017), sales transaction prices are adjusted to 2016 dollars. The average sale price of the properties in our sample is C$460,794.40 in 2016 dollars. These transaction data also include detailed information on house structural characteristics and house locations. The distribution of property values in Edmonton is presented in Fig. 3. The property values exhibit clear spatial autocorrelation: relatively high-priced houses are located next to each other, mainly in the southwest of the city, while relatively low-priced houses are clustered in the north and southeast of the city.
[Figure 3 is around here]
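The CPI adjustment is a simple rescaling: a price observed in year \(t\) is converted to 2016 dollars as \(P_{2016}=P_t\times \mathrm{CPI}_{2016}/\mathrm{CPI}_t\). The sketch below applies it under the assumption of annual index values; the numbers in `CPI` are hypothetical placeholders, not the Statistics Canada figures used in the study.

```python
def to_2016_dollars(price, sale_year, cpi):
    """Convert a nominal sale price to 2016 dollars: P * CPI_2016 / CPI_year."""
    return price * cpi[2016] / cpi[sale_year]


# Hypothetical annual index values for illustration only (not Statistics Canada data)
CPI = {2015: 100.0, 2016: 101.0, 2017: 102.5}

if __name__ == "__main__":
    print(round(to_2016_dollars(450_000, 2017, CPI), 2))   # deflated: later year
    print(round(to_2016_dollars(450_000, 2015, CPI), 2))   # inflated: earlier year
```

If monthly indices were used instead, the same function would apply with (year, month) keys.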
Food outlets data and identifying food swamps
The concept of food swamps was first introduced by Rose et al. (2009); however, there is no uniform definition of a food swamp. Some researchers define food swamps as communities where unhealthy food choices inundate healthy food options (Cooksey-Stowers et al., 2017; Luan et al., 2015). Others also include income as a criterion when defining food swamps, since unhealthy food stores are disproportionately located in low-income neighborhoods (Rose et al., 2009). Overall, three measures have been adopted to define food swamps in the existing literature: (1) neighborhoods with high availability of unhealthy food outlets (Cooksey-Stowers et al., 2017; Hager et al., 2016; Luan et al., 2015), (2) neighborhoods with low availability of healthy food and high coverage of unhealthy food (often measured by the healthy food ratio, i.e., the number of healthy food outlets divided by the total number of food outlets) (Cooksey-Stowers et al., 2017; Luan et al., 2015), and (3) low-income neighborhoods with low availability of healthy food and high coverage of unhealthy food (Rose et al., 2009).
Following these practices, we use three definitions to identify food swamps in this study. Definition 1 considers only the high availability of unhealthy food outlets: we use the top quantile of the number of service areas as the indicator of high availability. Definition 2 adds the condition of a low healthy food ratio (a healthy-to-unhealthy food ratio below the median). Building on Definition 2, Definition 3 further adds a low-income criterion (a low-income rate above the city median).
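The three nested definitions can be expressed as a small classification rule. This is a minimal sketch under stated assumptions: "top quantile" is read as the top group of a configurable number of quantile groups (quartiles by default, an assumption), medians are taken across the neighborhoods passed in, and the field names are hypothetical. Houses would then inherit the flags of the neighborhood in which they are located.

```python
from statistics import median, quantiles


def classify_food_swamps(hoods, n_groups=4):
    """Flag neighborhoods under food swamp Definitions 1-3 (nested).

    hoods: {name: {"service_areas": int, "healthy_ratio": float, "low_income": float}}
           (hypothetical field names)
    n_groups: number of quantile groups; the top group marks "high availability"
              of unhealthy outlets (4 = quartiles is an assumption).
    Returns {name: (def1, def2, def3)}.
    """
    rows = list(hoods.values())
    high_cut = quantiles([h["service_areas"] for h in rows], n=n_groups)[-1]
    ratio_med = median(h["healthy_ratio"] for h in rows)
    income_med = median(h["low_income"] for h in rows)
    flags = {}
    for name, h in hoods.items():
        d1 = h["service_areas"] >= high_cut              # many unhealthy outlets
        d2 = d1 and h["healthy_ratio"] < ratio_med       # + low healthy food ratio
        d3 = d2 and h["low_income"] > income_med         # + low-income area
        flags[name] = (d1, d2, d3)
    return flags


if __name__ == "__main__":
    # Hypothetical neighborhoods with made-up values
    demo = {
        "A": {"service_areas": 20, "healthy_ratio": 0.1, "low_income": 0.30},
        "B": {"service_areas": 7, "healthy_ratio": 0.9, "low_income": 0.05},
        "C": {"service_areas": 1, "healthy_ratio": 0.1, "low_income": 0.30},
        "D": {"service_areas": 2, "healthy_ratio": 0.5, "low_income": 0.10},
        "E": {"service_areas": 3, "healthy_ratio": 0.6, "low_income": 0.12},
        "F": {"service_areas": 4, "healthy_ratio": 0.4, "low_income": 0.20},
        "G": {"service_areas": 5, "healthy_ratio": 0.3, "low_income": 0.15},
        "H": {"service_areas": 6, "healthy_ratio": 0.7, "low_income": 0.08},
    }
    for name, f in classify_food_swamps(demo).items():
        print(name, f)
```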
To identify food swamps, we first collect the locations of unhealthy food stores (fast food restaurants and convenience stores) and healthy food stores (supermarkets and grocery stores) from the City of Edmonton business licenses database (2018). Fast food restaurants are defined as quick-service food outlets that offer consistent, popular, high-calorie, and expedited food such as sandwiches, hamburgers, fried chicken, and pizza (Block et al., 2004; Jekanowski et al., 2001). Because of the standardized menus and pre-cooked foods, customers need to spend only minimal time obtaining product information and receiving their meals (Jekanowski et al., 2001). Convenience stores are also considered unhealthy food outlets because they predominantly stock non-perishable items, including snacks, sweets, and junk foods (Lee, 2012; Li and Ashuri, 2018). Supermarkets and grocery stores are considered healthy food stores because they consistently stock a wide range of products, including fresh produce, dairy items, and meat products (Li and Ashuri, 2018).
Labeling franchised fast-food restaurants and convenience stores as unhealthy food stores, and supermarket chains and local grocery stores as healthy food stores, is admittedly a strong simplification. In practice, however, it is difficult to distinguish completely between healthy and unhealthy food stores, and the relevant data are usually unavailable. Therefore, in the food environment literature (see, for example, Cooksey-Stowers et al., 2017; Kolak et al., 2018; Wang et al., 2014), it is common practice to label fast-food restaurants and convenience stores as unhealthy food stores and to treat supermarket chains and local grocery stores as healthy food retailers. The main criterion the literature adopts to distinguish healthy from unhealthy food stores is whether the store has the potential to provide a wide range of healthy and fresh foods.
After cross-validating all the food stores’ information against their official websites and Google Maps locations, we identify a total of 822 fast food restaurants, 232 convenience stores, 91 supermarkets, and 87 grocery stores in Edmonton. Then, following recent studies in Edmonton (Wang and Qiu, 2016; Yang et al., 2020), we choose 1,000 meters as the threshold to create a service area around each unhealthy food outlet. Finally, we count the total number of service areas within each neighborhood and use this count as the baseline criterion to identify food swamps.
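The counting step can be sketched as follows, under simplifying assumptions not stated in the text: each 1,000-meter service area is approximated by a Euclidean circle (a network-based service area would follow the road network instead), neighborhoods are approximated by hypothetical axis-aligned rectangles in a projected metric coordinate system, and a service area counts toward a neighborhood when it overlaps it.

```python
def circle_intersects_rect(cx, cy, r, rect):
    """True if a circle of radius r centred at (cx, cy) overlaps an axis-aligned rect.

    rect = (xmin, ymin, xmax, ymax), in the same projected units as the centre.
    """
    xmin, ymin, xmax, ymax = rect
    # Closest point of the rectangle to the circle centre
    nx = min(max(cx, xmin), xmax)
    ny = min(max(cy, ymin), ymax)
    return (cx - nx) ** 2 + (cy - ny) ** 2 <= r * r


def count_service_areas(outlets, neighborhoods, radius=1000.0):
    """Number of outlet service areas (approximated as circles) overlapping each neighborhood."""
    return {
        name: sum(circle_intersects_rect(x, y, radius, rect) for x, y in outlets)
        for name, rect in neighborhoods.items()
    }


if __name__ == "__main__":
    # Hypothetical neighborhood extents and unhealthy-outlet locations (metres)
    hoods = {"N1": (0.0, 0.0, 1000.0, 1000.0), "N2": (5000.0, 5000.0, 6000.0, 6000.0)}
    outlets = [(500, 500), (1900, 500), (2500, 500), (1800, 1800)]
    print(count_service_areas(outlets, hoods))
```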
Locational attributes data
We collect the locations of the river, Downtown, the University of Alberta, hospitals, and parks from various sources. The North Saskatchewan River flows through Edmonton (see Fig. 2) and provides Edmontonians with various recreational activities, including canoeing, kayaking, jet-skiing, and fishing (City of Edmonton, 2020). We obtain the North Saskatchewan River shapefile from the Alberta Government (2018). Downtown Edmonton is the central business district of Edmonton and is home to more than 200 eateries and hundreds of shops (Downtown Business Association, 2020). The University of Alberta is one of Canada’s top universities and Alberta’s fourth-largest employer, with almost 15,000 employees (University of Alberta, 2020). We extract the locations of Downtown, the University of Alberta, and hospitals from the City of Edmonton Open Data Catalogue (2016). To generate locational attributes for each property, we calculate road network distances to the North Saskatchewan River, the centroid of Downtown, the University of Alberta, and the nearest hospital.
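Road network distance to the nearest facility is a shortest-path problem. The sketch below uses Dijkstra's algorithm with several target nodes, so the first target reached is the nearest one. It is a minimal illustration that assumes houses and facilities have already been snapped to nodes of a road graph; the graph and node names are hypothetical, and the text does not state which GIS tool performed this step.

```python
import heapq
import math


def network_distance(graph, source, targets):
    """Shortest road-network distance from `source` to the nearest node in `targets`.

    graph: {node: [(neighbor, segment_length_m), ...]}; two-way roads are listed
    in both directions. Returns math.inf if no target is reachable.
    """
    targets = set(targets)
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > best.get(node, math.inf):
            continue                      # stale entry superseded by a shorter path
        if node in targets:
            return d                      # first target settled is the nearest one
        for nbr, length in graph.get(node, ()):
            nd = d + length
            if nd < best.get(nbr, math.inf):
                best[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return math.inf


if __name__ == "__main__":
    # Hypothetical road graph with segment lengths in metres
    roads = {
        "house": [("a", 300.0), ("b", 500.0)],
        "a": [("house", 300.0), ("hospital_1", 900.0)],
        "b": [("house", 500.0), ("hospital_2", 400.0)],
        "hospital_1": [("a", 900.0)],
        "hospital_2": [("b", 400.0)],
    }
    print(network_distance(roads, "house", {"hospital_1", "hospital_2"}))
```

For the river, one plausible (assumed) treatment is to use its road-accessible points as the target set.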
Along the banks of the North Saskatchewan River is a chain of city parks collectively known as the North Saskatchewan River Valley Parks System. This system is Canada’s largest stretch of urban parkland and comprises over 20 major parks. In addition, there are over 500 neighborhood and city parks across the city. We obtain all park location information from the City of Edmonton Open Data Catalogue (2016). To measure access to parks, we first create a 200-meter buffer around each property and then calculate the area of parks (in square meters) within each buffer.
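The buffer-overlay measure can be approximated without a GIS library by sampling a fine grid inside each 200-meter buffer and testing which sample points fall in a park polygon. This is a minimal raster approximation, not necessarily the overlay method used in the study; it assumes park polygons in a projected metric coordinate system, and the coordinates are hypothetical. Dividing the result by 100 gives the units of the Park variable in Table 1 (100 m²).

```python
import math


def point_in_polygon(x, y, poly):
    """Ray-casting test; poly is a list of (x, y) vertices without a repeated closing vertex."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):                       # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def park_area_in_buffer(house, parks, radius=200.0, cell=5.0):
    """Approximate park area (m^2) inside a circular buffer by sampling cell centres.

    Each cell centre lying in the buffer and in any park polygon contributes
    cell**2 square metres, so overlapping parks are counted only once.
    """
    hx, hy = house
    steps = int(round(2 * radius / cell))
    area = 0.0
    for i in range(steps):
        x = hx - radius + (i + 0.5) * cell
        for j in range(steps):
            y = hy - radius + (j + 0.5) * cell
            in_buffer = (x - hx) ** 2 + (y - hy) ** 2 <= radius ** 2
            if in_buffer and any(point_in_polygon(x, y, p) for p in parks):
                area += cell * cell
    return area


if __name__ == "__main__":
    # Hypothetical park east of a house located at the origin (metres)
    park = [(50.0, -300.0), (400.0, -300.0), (400.0, 300.0), (50.0, 300.0)]
    m2 = park_area_in_buffer((0.0, 0.0), [park])
    print(round(m2), round(m2 / 100, 2))   # square metres, and units of 100 m^2
```

A smaller `cell` improves accuracy at the cost of more point-in-polygon tests.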
Neighborhood socioeconomic data
Neighborhood socioeconomic data for 2016 are extracted from the City of Edmonton Open Data Catalogue (2018). After excluding industrial neighborhoods, we identify 247 residential neighborhoods and focus on them in the empirical investigation. Following previous studies (Anselin and Gallo, 2006; Lin et al., 2014; Tian et al., 2017), we include neighborhood-level population density (Population density), the proportion of children aged under 14 (Children), the proportion of seniors aged over 60 (Senior), the proportion of residents with a postsecondary certificate (High Education), and the proportion of unemployed residents (Unemployment). In addition to these explanatory variables, we include a seasonal dummy and year dummies, which may also influence housing prices (Cao et al., 2021). Table 1 provides descriptive statistics for all the dependent and independent variables.
Table 1

| Variables | Definition | Mean | Std. Dev. |
|---|---|---|---|
| Dependent Variable | | | |
| Price\(^{a}\) | Sale price of the property (2016 C$) | 460,794.40 | 203,808.10 |
| Food Environment Types | | | |
| Food swamp definition 1 | 1 if the house is located in a food swamp neighborhood (an area with access to large amounts of energy-dense foods); 0 otherwise | 0.21 | 0.41 |
| Food swamp definition 2 | 1 if the house is located in a food swamp neighborhood (an area with access to large amounts of energy-dense foods and limited access to healthy food options); 0 otherwise | 0.10 | 0.29 |
| Food swamp definition 3 | 1 if the house is located in a food swamp neighborhood (a low-income area with access to large amounts of energy-dense foods and limited access to healthy food options); 0 otherwise | 0.07 | 0.26 |
| Structural Variables | | | |
| Living area\(^{a}\) | Square feet of living space | 1,559.64 | 620.80 |
| Lot size\(^{a}\) | Lot area in square feet | 5,873.90 | 4,338.71 |
| Bedroom | Number of bedrooms | 2.92 | 0.65 |
| Bathroom | Number of bathrooms | 1.64 | 0.66 |
| House condition d1 | 1 if the house condition is average; 0 otherwise | 0.34 | 0.48 |
| House condition d2 | 1 if the house condition is good; 0 otherwise | 0.31 | 0.46 |
| House condition d3 | 1 if the house condition is excellent; 0 otherwise | 0.34 | 0.47 |
| Basement condition d1 | 1 if the basement is partially finished; 0 otherwise | 0.11 | 0.32 |
| Basement condition d2 | 1 if the basement is finished; 0 otherwise | 0.67 | 0.47 |
| Garage | Garage capacity (single or double) | 1.83 | 0.47 |
| House age | Age of the house (years) | 29.20 | 23.21 |
| Locational Variables | | | |
| River\(^{a}\) | Distance to the North Saskatchewan River | 4,385.17 | 3,284.15 |
| Downtown\(^{a}\) | Distance to Downtown | 10,566.17 | 4,305.53 |
| University\(^{a}\) | Distance to the University of Alberta | 11,443.18 | 3,959.27 |
| Hospital\(^{a}\) | Distance to the nearest hospital | 5,050.04 | 2,352.48 |
| Park\(^{a}\) | Park area within a 200-meter buffer (100 m²) | 40.61 | 93.50 |
| Neighborhood Socioeconomic Status | | | |
| Population density\(^{a}\) | Neighborhood-level population density (persons/km²) | 3,071.33 | 1,036.51 |
| Children | Proportion of children aged under 14 | 0.18 | 0.05 |
| Senior | Proportion of seniors aged over 65 | 0.14 | 0.08 |
| High Education | Proportion of residents with a postsecondary degree/certificate | 0.63 | 0.12 |
| Unemployment | Proportion of residents who are unemployed | 0.09 | 0.04 |
| Control Variables | | | |
| Season | 1 if the house is sold between April and September; 0 otherwise | 0.58 | 0.49 |
| Year 2016 | 1 if the house is sold in 2016; 0 otherwise | 0.40 | 0.49 |
| Year 2017 | 1 if the house is sold in 2017; 0 otherwise | 0.14 | 0.34 |

Note: \(^{a}\) These variables are log-transformed in the methods and results sections.
[Table 1 is around here]
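The control dummies defined in Table 1 can be derived from each sale date as sketched below. 2015 is treated as the omitted reference year, since the sample covers 2015–2017 and only 2016 and 2017 dummies are listed; the function name is hypothetical.

```python
from datetime import date


def control_dummies(sale_date):
    """Season and year dummies as defined in Table 1 (2015 is the omitted year)."""
    return {
        "Season": int(4 <= sale_date.month <= 9),   # sold April through September
        "Year 2016": int(sale_date.year == 2016),
        "Year 2017": int(sale_date.year == 2017),
    }


if __name__ == "__main__":
    for d in (date(2015, 2, 14), date(2016, 7, 1), date(2017, 9, 30)):
        print(d, control_dummies(d))
```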
[1] The SAC model is also referred to as SARAR (spatial autoregressive model with autoregressive residuals) or the Cliff-Ord model by Kelejian and Prucha (1998), and as the Kelejian-Prucha model by Elhorst (2010).