Revealing U.S. Retail Industries’ Functional Hierarchy Through Demand Thresholds

doi:10.21203/rs.3.rs-2094198/v1

Research Article

This work is licensed under a CC BY 4.0 License

We explore the structure of the U.S. retail sector by estimating county-level demand thresholds for 11 retail industries using establishment-level data from the U.S. Census Bureau’s Longitudinal Business Database and Integrated Longitudinal Business Database. In addition to providing accurate and precise demand threshold estimates at a highly disaggregated industry level, we also explore how location outcomes differ across employers and non-employers using Poisson and negative binomial estimations as well as their zero inflated counterparts. Findings provide insight into the industrial organization of retail industries and suggest that important retail location determinants include retail leakages, sector interdependencies, social capital, natural assets, and other place-based factors.

JEL Codes: D0, R1, L1, L8

Keywords: Retail, Demand Threshold, Regional Economics, Economic Development, Zero Inflation, Microdata

Recent reports indicate that 37 million working-age people reside in rural America (Hammock, 2019). Research shows that over the past thirty years rural residents have been consistently more likely than urban residents to be non-farm entrepreneurs (Thiede et al., 2017). Further, these rural businesses report slightly higher revenue and profitability than their urban and suburban counterparts (Small Business Credit Survey, 2017). Indeed, small businesses play a key role in the economic vitality of rural communities by providing employment, generating income, and improving the quality of life for residents (Memili et al., 2015). However, among small rural businesses, retailers face many unique challenges that can impact their survival and the ability of communities to attract and retain local businesses. These include proximity to consumers and suppliers, small market size, lack of skilled labor and capital, technology integration, and escalating competition from discount retailers, franchises, and regional shopping centers (He et al., 2017; Ring et al., 2010). Together, these findings imply that a multitude of factors may converge to create a best location for a given type of rural retailer and thereby suggest a need to understand retail location criteria and methods. Guided by central place theory, the present paper uses unsuppressed administrative data to advance the demand threshold approach and model county locations for rural retail establishments.

Demand threshold models are frequently used within this context to estimate the minimum population required to sustain a particular type of establishment. However, one practical challenge that limited the demand threshold literature is the nondisclosure of establishment data for rural areas. This is a particularly vexing issue since many of these rural communities struggle with surplus leakages and population retention (ERS, 2017). Data limitations made estimating demand thresholds for more finely disaggregated industries in the North American Industry Classification System (NAICS) difficult, the inclusion of other explanatory variables limited, and the estimation of an industry’s population threshold across all counties nearly impossible.

We seek to address gaps in the demand threshold literature by aggregating establishment-level data to estimate demand thresholds for retail industries across the contiguous United States in 2014. Specifically, we address the question: how do counties’ relative locations, economic conditions, and place-based factors influence the minimum population necessary to support specific retail industries? In addition to this, we complete multiple robustness checks across employers and non-employers using the Poisson and negative binomial distributions as well as their zero inflated counterparts to accurately model the business location decision within multiple industries of the retail sector. Specific variables of interest that are particularly germane to rural economic development in the United States include broadband internet access, social capital, and labor leakages.

Previous studies primarily explored the population thresholds necessary to support a particular business, the interdependencies between central places and neighbors (e.g. Chakraborty, 2012; Mushinski & Weiler, 2002; Wensley & Stabler, 1998a), as well as interdependencies between other related industries (Shonkwiler & Harris, 1996). However, the nondisclosure of establishment counts, let alone the availability of employment data as an alternative mentioned by Shonkwiler and Harris (1996), limited the geographical and temporal scope of the literature. Furthermore, this data nondisclosure, which disproportionately affects rural areas, limits researchers’ ability to test a larger set of economic and community variables that may explain differences between predicted and actual establishment counts. While previous literature characterized these differences as a local community’s strengths and weaknesses (e.g. Chakraborty, 2012), larger datasets may allow for a more nuanced investigation into potential opportunities or barriers for retail development.

This research builds on previous studies by using county aggregated establishment-level data to explore the influence of multiple community capital factors and industry interdependencies on retail demand thresholds for 11 retail industries across the contiguous United States for 2014.[2] In addition to providing detailed demand threshold estimates for the number of retail establishments, we also recognize the possibility that different establishment types (employers and non-employers) likely serve different sized places. In other words, the population threshold necessary to support a non-employer establishment in a specific retail industry may be much smaller than the population threshold for an employer establishment in the same industry.[3] While the literature finds that industries in the hinterlands have lower population thresholds than larger central places, we hypothesize that the more austere market conditions of the hinterlands may also produce more non-employer establishments that may have lower population thresholds. To explore the potential differentiation of non-employers in Christaller’s (1966) functional hierarchy, we estimate demand threshold models for employers and non-employer establishments separately for each retail industry, and regress a third set of models on employment as a robustness check.

The paper proceeds by first reviewing the literature on central place theory, demand thresholds, and community capitals. Following a description of the data, we consider the factors influencing each retail industry’s business location decisions to develop realistic functional forms, and elaborate on relevant community capitals that may influence retail population thresholds. Given the scope of this article, we only present and discuss a subset of results in the results section.[4] Findings provide insight into the industrial organization of various retail industries and suggest that important retail location determinants include retail leakages, sector interdependencies, social capital, natural assets, and other place-based factors. The conclusion highlights key findings while also detailing where there is a need for future research.

[2] This article follows the NAICS naming conventions in our description of industries and sectors: the first two NAICS digits designate the economic sector, the third digit designates the subsector, the fourth digit designates the industry group, the fifth digit designates the NAICS industry, and the sixth digit designates the national industry.

[3] The Census defines non-employer businesses as businesses earning at least $1,000 in annual revenue while maintaining no formal employees. An example of non-employer businesses with informal employees is a family owned and operated supermarket/grocery.

[4] The additional and complete results are available from the authors upon request.

Central place theory (CPT) (Christaller, 1933; Lösch, 1940) builds on the interaction between consumer choice and firm agglomeration to develop a functional hierarchy that describes the emergence of central places (Mulligan et al., 2012). In essence, consumers seek to minimize time and travel costs spent on searching for and acquiring goods and services, and this, along with exogenously determined industry specific cost information, determines the spatial radius, or range, of a particular industry’s good (Chakraborty, 2012; Mulligan et al., 2012; Pennerstorfer & Pennerstorfer, 2019). In addition to the external ranges of goods and services, each place also has an internal range defined by the population size of the place. A central hypothesis of CPT is that each industry has its own population threshold, which a place must reach to sustain an establishment in that industry.

Demand threshold analyses seek to identify the minimum population required for an establishment within a particular industry to earn a sufficient rate of return to stay in business (Berry & Garrison, 1958a, 1958b; Harris & Shonkwiler, 1997; Parr & Denike, 2016; Shaffer et al., 2004). Derivatives of demand threshold analysis, such as Bresnahan and Reiss's (1991) “entry thresholds” for retail firms or Cleary et al.'s (2019) “breakeven market sizes” for food hubs, use ordered probits to identify local conditions that influence the number of firms in a market. While many sectors rely on the local market size to support their businesses, demand threshold analyses are most commonly conducted on retail and service industries due to their relatively greater dependence on demand-side determinants. Unfortunately, data nondisclosure, particularly in rural areas where establishment counts are censored for privacy reasons, greatly reduces the accuracy of retail demand threshold estimates. As a result, there is little reliable empirical evidence testing CPT in sparsely settled places. Obtaining unsuppressed data is critical not only to investigating the factors that determine location decisions in rural areas, but also to fleshing out Christaller’s CPT. For example, while several studies have noted the lower population thresholds for some retail industries in remote rural areas (e.g. Wensley & Stabler, 1998), it may be that while employer establishments in these hinterlands do have different thresholds, the hinterlands may also have a different mix of establishment types (i.e. non-employer establishments) that are more adaptable to smaller isolated markets.

[Approximate Position of Figure 1]

Following Bresnahan and Reiss (1991) and Cleary et al. (2019), figure 1 illustrates the simple microeconomic mechanisms through which CPT shapes the distribution of retail industries. As Bresnahan and Reiss (1991) and Cleary et al. (2019) demonstrate, recognizing how these regional concepts operate through the theory of the firm is useful both for conceptual understanding and for interpreting results. More populous places will experience greater demand due to a larger market size, S₂, creating incentives for more establishments to enter the market. A central place’s market size may also increase from agglomeration economies incentivizing consumers from neighboring counties to commute for lower search costs. Similarly, establishments may realize lower average costs, AC, due to pooled marketing and shared resources, or higher AC due to labor shortages, higher square footage, or tax costs. These microeconomic foundations also show how non-employers may successfully provide retail services in a small market due to lower AC.

Non-employers are defined by the Census Bureau and Internal Revenue Service as businesses with no formally paid employees and at least $1,000 in annual sales. This definition includes many potential business types including single proprietors working out of their home, market booth, or storefront, family businesses where working family members are not formally paid, and businesses that may not be the owner’s primary source of income. According to the Survey of Business Owners (U.S. Census Bureau, 2012), non-employer retail firms earn an average payroll of $30,160 annually and have higher shares of minority (26% vs. 21%) and female (24% vs. 20%) ownership compared to employer businesses. Given non-employers’ economic activity, prevalence, and cost advantages, non-employers likely play an important role in the provision of goods and services, particularly in lower tiered places. Non-employers may also represent other aspects of local economic development such as entrepreneurship, economic opportunity, creativity, or future employer businesses.

Demand thresholds vary significantly across industries even within the retail sector (Shonkwiler & Harris, 1996), but due to data limitations, most estimated demand thresholds are only available at a highly aggregated industry level. For example, Chakraborty (2012) seems to present the most comprehensive retail demand threshold estimates to date by considering twelve retail industries, aggregated at the three-digit NAICS level, across 2,201 rural United States counties. Yet, this aggregation still assumes the demand thresholds are the same for sporting goods stores and book stores or clothing stores and jewelry stores. Furthermore, we are not aware of any demand threshold analyses that include non-employer establishments, which are likely vital in offering basic delivery of higher-ordered goods and services than the place’s hierarchical tier might otherwise allow. At a time when employment in services such as retail is concentrating in areas with high aggregate employment (Desmet & Fafchamps, 2005), better understanding the location factors of disaggregated employer and non-employer retail industries may be critical to preventing retail leakage from the hinterlands.

In recent decades, Krugman's (1991, 2010) new economic geography (NEG) has received relatively more attention than demand threshold analyses in the academic literature, but the complementarity between the two makes CPT and demand threshold analysis ripe for a reemergence (Mulligan et al., 2012). Extant research on retail demand thresholds primarily focuses on identifying how socio-economic characteristics influence the minimum population thresholds for various retail industries (Chakraborty, 2012; Deller & Harris, 1993; Wensley & Stabler, 1998), spatial interdependencies (Mushinski & Weiler, 2002; Thilmany et al., 2005), and economies of agglomeration (Henderson et al., 2000; Shonkwiler & Harris, 1996). While each of these veins of research is individually presented as integral in estimating accurate and realistic demand thresholds, many studies seem to omit one set of variables in favor of testing another set of determinants. Perhaps this is necessary due to data limitations or is justified given the restricted regional scope of some studies, but differences between fitted and actual values may be a result of omitted variables and not simply instances of local industries under- or over-retailing. However, one should not discount these contributions based on the means available to them, as they each reveal important and innovative insights in the retail demand threshold literature. We review some particularly relevant articles in the remainder of this section.

Mushinski & Weiler (2002) explore the importance of allowing for spatial interdependencies between central places and their surrounding hinterlands when estimating demand thresholds for retail industries in the Intermountain West. While many lower-ordered retailing establishments may exist in both central places as well as their hinterlands, the prevalence of some higher-ordered retailing establishments in the hinterlands may decrease the need for them in the central place, or vice versa. Using a simultaneous Tobit model, Mushinski and Weiler find that retail industries such as merchandise stores, apparel stores, auto dealers, and furniture stores display these supply-side interdependencies. In addition to these supply-side interdependencies, gas stations and eating and drinking establishments also display demand-side interdependencies. Demand-side interdependencies arise from increases in the hinterlands’ population leading to more of these establishments locating in the central place due to their characteristically mobile clientele. Still, other industries, such as building supply stores, food stores, and drug stores displayed no spatial interdependencies. Mushinski and Weiler's article is particularly germane to the current context, because it begins to reveal how and why some establishment counts may face different data-generation processes depending on the nature of the good or service they provide and their sensitivity to similar establishments in adjoining places. While Partridge et al. (2008) find rural areas near central places observe greater population growth from urban agglomeration spillovers, greater proximity to central places may also lead to greater retail leakages and increasing retail demand thresholds.

In addition to interdependencies across space, there also appear to be interdependencies across industries, or evidence of economies of agglomeration. Shonkwiler & Harris (1996) apply a demand threshold analysis to three retail industries in rural markets and find significant evidence of between-industry agglomeration economies in complementary industries. Similarly, Henderson et al. (2000) estimate demand thresholds for hospital services in Texas and find significant evidence of within-industry agglomeration economies. Theoretically, rural areas should have lower demand thresholds within a particular industry due to the significant implied travel costs if a rural resident were to purchase the good in a higher-tiered place. Empirical evidence supports this theory (e.g. Wensley & Stabler, 1998a) but Henderson et al.'s (2000) conclusions show how the benefits from agglomeration economies within higher-ordered industries may offset the demand benefits from locating in a more remote region.

Chakraborty (2012) addresses the issue of zero-inflation in retail establishment counts by estimating retail demand thresholds across United States counties with populations less than 50,000 people using Hurdle Poisson (HP) and Zero Inflated Poisson (ZIP) models. Chakraborty finds that in ten out of the twelve retail industries analyzed, Vuong’s test favored the ZIP model over the HP. However, Wilson (2015) notes the misuse of Vuong’s test in testing for zero inflation due to an unknown distribution for the test statistic.[5] Regardless, Chakraborty makes important contributions to the literature by acknowledging the importance of zero-inflated data generation processes in the retail sector, and by estimating demand thresholds for twelve retail industries with a relatively rich county-level dataset.

[5] Chakraborty (2012) is among the studies that misuse Vuong’s test statistic. While the issue is detailed in the following section, Wilson (2015) explains that the confusion comes from a misunderstanding of what it means for a model to be “nested,” a requirement for Vuong’s test.

This section first details the model selection process and count data methods used to estimate retail demand thresholds for non-employer establishments, employer establishments, and employment for 11 finely disaggregated retail industries across the contiguous United States. The second part of this section details the data used to develop unique industry specific location determinants, including time invariant place-based factors and restricted-access establishment and employment data.

The count data nature of establishments and employers suggests using a count data estimator over continuous linear alternatives. Count data estimators are better suited to count data because they respect the nonnegative, integer-valued nature of the outcome and do not require the conditional mean, E(y|x), to be linear in x. While continuous data models have been successfully used, as with Mushinski and Weiler's (2002) simultaneous Tobit model, the choice to deviate from count data estimators seems to be driven by other econometric needs, such as the inherent endogeneity in Mushinski and Weiler.
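
To make the link between a count model's conditional mean and a population threshold concrete, the sketch below assumes a log-linear mean of the form E(y|x) = exp(b0 + b1 ln(population)) and defines the threshold for the n-th establishment as the population at which the expected count equals n. Both the functional form and the coefficient values are illustrative assumptions, not the specification or estimates used in this paper.

```python
import math

def expected_count(population, intercept, pop_elasticity):
    """Conditional mean of a log-linear count model: exp(b0 + b1 * ln(pop))."""
    return math.exp(intercept + pop_elasticity * math.log(population))

def population_threshold(n_establishments, intercept, pop_elasticity):
    """Population at which the expected count equals n_establishments."""
    if pop_elasticity <= 0:
        raise ValueError("threshold is undefined unless counts rise with population")
    return math.exp((math.log(n_establishments) - intercept) / pop_elasticity)

# Hypothetical coefficients, chosen only for illustration.
b0, b1 = -8.0, 0.9
first = population_threshold(1, b0, b1)   # population supporting one establishment
second = population_threshold(2, b0, b1)  # population supporting two establishments
print(round(first), round(second))
```

Other covariates would enter the intercept term once they are fixed at chosen values, which is how place-based factors could shift the implied threshold up or down.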

In selecting a count data estimator, it is helpful to consider how the industry landscape reflects a potential business owner’s location decision. For lower-ordered goods, we would expect to observe establishments located in most counties, with large central places containing more lower-ordered goods vendors to serve a larger population. The distribution for these lower-ordered goods may consequently resemble a standard Poisson distribution. Higher-ordered goods, however, may be more likely to benefit from economies of agglomeration (Henderson et al., 2000), and thus may have large clusters of establishments in large central places. The existence of economies of agglomeration in an industry could cause the distribution to appear skewed right, adding overdispersion into the distribution of establishments.

Overdispersion occurs when the dependent variable’s variance is larger than its mean, which violates the assumption of equidispersion in Poisson models and leads to a higher probability of committing a Type I error in significance testing (Perumean-Chaney et al., 2013). In such cases the negative binomial distribution is more efficient as it allows for overdispersion by introducing unobserved individual heterogeneity into the Poisson’s conditional mean (Greene, 2012). This distinction between ordered goods and their distributions is important not just for efficient estimation, but also for accurately modeling the business location decision. Because agglomeration generates overdispersion in establishment counts, lower-ordered industries are more likely to be Poisson distributed while higher-ordered industries are more likely to be negative binomial distributed.
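
As a rough illustration of the equidispersion idea, the sketch below computes a raw variance-to-mean ratio for a vector of counts and the variance implied by a negative binomial model. It assumes the common NB2 parameterization, Var(y) = mu + alpha * mu^2, which the text does not specify, and the county counts are made up for the example.

```python
from statistics import mean, variance

def dispersion_ratio(counts):
    """Sample variance divided by sample mean; values near 1 suggest equidispersion."""
    m = mean(counts)
    if m == 0:
        raise ValueError("ratio is undefined when every count is zero")
    return variance(counts) / m

def negbin_variance(mu, alpha):
    """NB2 variance (assumed parameterization): collapses to the Poisson case at alpha = 0."""
    return mu + alpha * mu ** 2

# Hypothetical establishment counts across six counties, one with a large cluster.
clustered = [0, 0, 1, 1, 2, 14]
print(dispersion_ratio(clustered))   # well above 1, so overdispersed
print(negbin_variance(3.0, 0.0))     # equals the mean when alpha = 0
```

The raw ratio ignores covariates, so it is only a first look; the formal check described later is a test on alpha in the fitted model.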

While some retail types such as gas stations are present in nearly all United States counties, other types of retailers such as art dealers may only be present in a fraction of counties, leading to an excess of zeros. There are likely two different regimes within the zero generation processes for a particular industry: 1) structural zeros – places that lack some essential characteristic to support the industry, and 2) sampling zeros – places that meet minimum essential requirements but do not meet some other set of economic factors to support the industry. For example, a boat dealer would presumably locate somewhere near bodies of water, so we would expect to observe structural zeros in establishment count data in counties with no bodies of water. However, we may still observe sampling zeros in counties where other economic factors, such as population or income combined with random chance, are the preventative factors.
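
The two zero regimes can be illustrated with a small simulation of the boat dealer example. In this sketch a county without water is always a structural zero, while a county with water draws from a Poisson distribution and may still produce a sampling zero. Treating water access as a deterministic switch, the `has_water` flag, and the mean of 0.5 are simplifying assumptions made for illustration only.

```python
import math
import random

def draw_poisson(rng, mu):
    """Knuth's multiplication method for a Poisson draw; adequate for small mu."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_county(rng, has_water, mu):
    """Return (count, regime label) for one hypothetical county."""
    if not has_water:
        return 0, "structural zero"      # lacks the essential characteristic
    count = draw_poisson(rng, mu)
    label = "sampling zero" if count == 0 else "positive"
    return count, label

rng = random.Random(0)
for has_water in (False, True, True, True):
    print(has_water, simulate_county(rng, has_water, mu=0.5))
```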

In the extant count data literature, this zero-generation process is captured by two mechanisms: hurdles and zero-inflation. Hurdle models break the location decision into two stages, where the first choice is whether to locate in a county, and the second choice is how many establishments to locate there. A hurdle model such as the HP, however, does not have two zero-generating regimes because its second stage is a count distribution truncated at zero. Zero-inflated models, such as the ZIP and the Zero Inflated Negative Binomial (ZINB), are therefore more flexible than hurdle models as they account for both structural zeros and sampling zeros. Within the ZIP model, the data generation process is defined first by a binary distribution that identifies if the outcome is a structural zero, followed by a Poisson (or, for the ZINB, negative binomial) process where zero is still a possible outcome. Following Henderson et al. (2000), the log-likelihood function may be written as:
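
One standard way to write the ZIP log-likelihood, consistent with the symbols defined next and using γ and β (labels assumed here) for the coefficient vectors on Z and X, is:

```latex
\ln L = \sum_{i \in S} \ln\!\left[ F(Z_i'\gamma) + \bigl(1 - F(Z_i'\gamma)\bigr)\exp\!\bigl(-\exp(X_i'\beta)\bigr) \right]
      + \sum_{i \notin S} \left[ \ln\bigl(1 - F(Z_i'\gamma)\bigr) - \exp(X_i'\beta) + y_i X_i'\beta - \ln(y_i!) \right]
```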

where S is the set of observations for which y_i = 0, F is the logit link function, Z is a vector containing covariates in the participation decision, and X is a vector containing covariates in the amount decision. As a note, the last part of the second term is simply the standard Poisson model. By employing the ZIP and ZINB to model county-level retail establishment counts and comparing them to their conventional, non-zero-inflated counterparts, we are able to test and account for overdispersion from the two zero generating data regimes as well as from an industry’s tendency to agglomerate.
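
A minimal sketch of how a ZIP log-likelihood of this kind can be evaluated is given below, assuming a logistic F and an exponential Poisson mean. The coefficient names `gamma` and `beta` and the toy data are hypothetical; this is not the authors' estimation code.

```python
import math

def logistic(v):
    """Logistic function used for the structural-zero probability."""
    return 1.0 / (1.0 + math.exp(-v))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def poisson_loglik(y, X, beta):
    """Standard Poisson log-likelihood with mean exp(X_i'beta)."""
    total = 0.0
    for yi, xi in zip(y, X):
        eta = dot(xi, beta)
        total += -math.exp(eta) + yi * eta - math.lgamma(yi + 1)
    return total

def zip_loglik(y, Z, X, gamma, beta):
    """ZIP log-likelihood: zeros mix structural and sampling regimes."""
    total = 0.0
    for yi, zi, xi in zip(y, Z, X):
        p_structural = logistic(dot(zi, gamma))
        eta = dot(xi, beta)
        mu = math.exp(eta)
        if yi == 0:
            # Zero from either regime: structural, or a Poisson draw of zero.
            total += math.log(p_structural + (1.0 - p_structural) * math.exp(-mu))
        else:
            # Positive count: not structural, then the usual Poisson term.
            total += math.log(1.0 - p_structural) - mu + yi * eta - math.lgamma(yi + 1)
    return total

# Toy data: intercept-only participation and amount equations.
y = [0, 0, 0, 1, 3]
Z = [[1.0]] * 5
X = [[1.0]] * 5
print(zip_loglik(y, Z, X, gamma=[0.0], beta=[0.0]))
print(poisson_loglik(y, X, beta=[0.0]))
```

As the structural-zero probability goes to zero (a very negative `gamma`), the ZIP value approaches the Poisson value, which is the collapse to the simpler model discussed in the model selection step.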

Our method for choosing among the four identified estimators (Poisson, negative binomial, ZIP, and ZINB) follows Perumean-Chaney et al. (2013) by first testing for overdispersion (Poisson versus negative binomial), followed by testing for zero inflation in the resulting count model. While the test for overdispersion consists of a simple likelihood ratio test on the alpha overdispersion parameter, testing for zero inflation is more involved. Previous studies (e.g. Chakraborty, 2012; Perumean-Chaney et al., 2013) have used Vuong’s statistic to test for zero inflation, but Wilson (2015) demonstrates that this method is incorrect. When Vuong (1989) presented the test for “non-nested” models, he presented six assumptions, one of which was that “nesting must not occur at a boundary of the parameter space of the larger model” (Wilson, 2015). While zero inflation models easily collapse down to their simpler count data counterparts when the zero inflation parameter equals zero, this outcome is on the boundary of the parameter space, leading to an unknown (non-normal) distribution for the test statistic (Wilson, 2015).[6] As a result, we identify zero inflation through visual inspections of dependent variable histograms as well as the Akaike and Bayesian information criteria (AIC and BIC respectively) in post-estimation (Greene, 1994). The AIC and BIC are relatively attractive measures for testing for zero inflation because they are not restricted to nested models. If an industry follows a zero inflated data generation process, we retest the overdispersion parameter to ensure the overdispersion was not solely a product of the zero inflation process.
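
The selection steps above can be sketched with a few helper functions. The log-likelihoods and parameter counts below are hypothetical placeholders, and halving the likelihood ratio p-value (because alpha = 0 lies on the boundary of the parameter space) is a common convention that is assumed here rather than taken from the text.

```python
import math

def aic(loglik, n_params):
    """Akaike information criterion; smaller is better."""
    return 2 * n_params - 2 * loglik

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion; penalizes each parameter by ln(n_obs)."""
    return n_params * math.log(n_obs) - 2 * loglik

def lr_pvalue_alpha(loglik_poisson, loglik_negbin):
    """LR test of alpha = 0 (one restriction), with the assumed boundary halving."""
    stat = max(0.0, 2.0 * (loglik_negbin - loglik_poisson))
    # Chi-square(1) survival function: P(X > stat) = erfc(sqrt(stat / 2)).
    return 0.5 * math.erfc(math.sqrt(stat / 2.0))

def best_by(criterion_values):
    """Name of the model with the smallest criterion value."""
    return min(criterion_values, key=criterion_values.get)

# Hypothetical fits for one industry: name -> (log-likelihood, number of parameters).
n_counties = 3000
fits = {"NB": (-1250.0, 6), "ZINB": (-1235.0, 9)}
aic_values = {name: aic(ll, k) for name, (ll, k) in fits.items()}
bic_values = {name: bic(ll, k, n_counties) for name, (ll, k) in fits.items()}
print(best_by(aic_values), best_by(bic_values))
```

Because the BIC penalty grows with the number of counties, the two criteria can disagree when the zero-inflation stage adds many parameters for a small gain in fit.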

The data generation processes described above are likely to be dependent not only on the specific retail sector, but also on the industry size measure in the model. For example, non-employer and employer establishments may experience different benefits from agglomerating, leading one establishment count to resemble a Poisson distribution while the other might follow a negative binomial distribution. Regarding zero inflation, one could either view a non-employer establishment as a predecessor to an employer establishment, or as a more efficient means of delivering higher-ordered goods within smaller rural economies. Therefore, we may expect to find more excess zeros within employer establishment counts compared to non-employer establishment counts due to dissimilar economic opportunities across space and the establishment type’s role in Christaller’s (1966) functional hierarchy for a particular industry.

While the data generation process for employment likely also differs among retailers, we include this third measure primarily as a robustness check for employer establishment counts. Previous retail demand threshold studies focus on establishment counts, arguing that they represent a degree of consumer choice and availability in an area (Shonkwiler & Harris, 1996), but there is value in providing a measure of economic intensity (i.e. employment) that can be compared to establishment counts. Establishments of differing sizes are likely to provide differing consumer choices (e.g., seasonal ice cream stand versus full-service restaurant). Alternatively, the three measures of industry size in this analysis may also be thought of as portraying and improving the understanding of the industrial organization of different stages in a specific retail industry’s development, by 1) modeling the decision process for smaller (non-employer) establishments to locate in a place, 2) modeling what factors cause a larger (employer) establishment to locate in a place, and 3) modeling what factors cause employer establishments to grow (add employees) within a place.

A primary objective of this paper is to explore how the data generation processes of retail establishments may inform their hierarchical order. The zero-inflated models integral to this objective limit the use of spatial autocorrelation terms and spatially lagged covariates, so we do not include them. As zero-inflated count-data spatial regression models become commonplace, this is an opportunity for future analysis. Instead, we address the spatial element through covariates that may identify these spatial relationships: urban influence code (UIC) indicators, the share of residents who work out of county, and, to some extent, location quotients. The UIC indicators recognize when a micropolitan county is neighboring a metropolitan county (both defined by population) while the share of commuting residents accounts for how economic dependence may be influenced by geographical barriers. These elements would not be captured through a simple spatial lag on neighboring population. Furthermore, these time invariant factors would not be captured by a panel model with fixed effects.
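
For readers unfamiliar with location quotients, the sketch below uses the conventional definition: an industry's share of local activity divided by its share of national activity. Whether the quotients in this paper are built from employment or establishment counts is not stated here, so the inputs are generic and the numbers are hypothetical.

```python
def location_quotient(local_industry, local_total, national_industry, national_total):
    """Local industry share relative to the national industry share."""
    if local_total <= 0 or national_total <= 0 or national_industry <= 0:
        raise ValueError("totals and the national industry count must be positive")
    local_share = local_industry / local_total
    national_share = national_industry / national_total
    return local_share / national_share

# Hypothetical county: 50 of 1,000 local jobs versus 5,000 of 100,000 national jobs.
print(location_quotient(50, 1000, 5000, 100000))   # same share locally and nationally
print(location_quotient(100, 1000, 5000, 100000))  # industry is concentrated locally
```

A value above one indicates that the industry is more concentrated in the county than in the nation as a whole, and a value below one indicates the opposite.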

Data

Most retail demand threshold studies use the publicly available County Business Patterns (CBP) and Non-employer Statistics (NS) datasets for their analyses. However, despite noise infusion in the CBP and NS, data for numerous counties are suppressed or binned (e.g. 1-9 employees), both of which can lead to relatively large distortions in measuring smaller rural economies. These issues become more prevalent as an industry is disaggregated into its smaller component industries. While some industry count estimates are available from private vendors, the vendors do not disclose their estimation methods, and the estimates are of unknown accuracy. Anecdotal testimony from local economic development practitioners indicates some vendor-provided local employment estimates diverge widely from actual numbers.

We use the restricted-access establishment-level LBD and ILBD to circumvent these issues and provide unbiased demand threshold estimates for 11 retail industries, including eight at the most refined six-digit industry level (NAICS 44-45). While the LBD focuses on employer establishments and the ILBD focuses on non-employer establishments, the Census Bureau bases both annual data series on the Business Register and Internal Revenue Service tax records (Jarmin & Miranda, 2002). In addition to the data being more complete in scope than in prior works, the data also allow us to estimate the demand threshold for the number of employees within an industry. This alternative metric provides an intensity measure to compare with the simple existence of an establishment within a particular industry.

Census data privacy policy requires us to aggregate the data to the county level. However, a county-level analysis also allows us to merge other important county-level data sources as well as make comparisons across the demand threshold literature, which also tends to be at the county level. It should be reiterated that the county aggregated LBD is still superior to the CBP due to the LBD’s completeness, retiming of establishments, and nonsuppression of employment counts.

Our choice of variables was informed not only by the literature, but also by virtual facilitated discussions with rural retail service providers (see Loveridge, Nawyn, & Szmecko (2013) for a description of the method). Table I provides descriptive statistics for variables from secondary data sources as well as for the publicly available versions (CBP and NS) of these data. To avoid losing variation in the data from splitting the sample, rurality is addressed in the models through the inclusion of urban influence codes as dummy variables, the inclusion of population and population density, and the zero-inflation stage of the models when appropriate.

[Approximate position of Table I]

For ease of discussion, we organize the county-level covariates into three general categories: demographics and labor force, infrastructure and institutions, and the restricted-access establishment and employment data. Most of the demographic data are common in retail demand threshold models; however, the two other data categories are relatively novel additions to the literature and warrant more discussion. As the set of relevant variables varies from industry to industry, the discussion here will be limited to general descriptions of the variable categories and how they relate to demand threshold theory.

Demographics and Labor Force: Population, race/ethnicity, age, unemployment and income measures are common in demand threshold models; however, our inclusion of social capital (Rupasingha et al., 2006), health insurance, and opiate overdoses is innovative. Support from community social networks enhances the likelihood of retail success and rural community sustainability (Frazier & Niehm, 2004; Korsching & Allen, 2004), as it allows for network development beyond physical boundaries of the community market setting. For instance, to become competitive, rural businesses exploit social networks to access important information pertaining to their local consumer market (Frazier & Niehm, 2004). Additionally, community and business development activities can only succeed if supported by a community with strong social networks that involve participation from local professionals, business owners, and community members (Sharp et al., 2002).

Observing Cleary et al.'s (2019) finding of lower demand thresholds for food hubs in areas with higher social capital, we expect social capital to lower demand thresholds via lower average costs, while labor inhibitors such as opiate overdoses increase demand thresholds due to higher labor costs. The opiate epidemic was a growing issue in 2014 and was mentioned several times throughout focus groups (2018) with retail stakeholders in the context of labor supply issues. Opiate prescription rate and health insurance act as controls for opiate deaths and other variables of interest.

Finally, we hypothesize the coefficient on the percent of workers who work outside their county of residence to be negative, as it captures the retail leakages and spatial interdependencies found in Mushinski & Weiler (2002). Referring to figure 1, retail leakages effectively shrink the market size, leading to lower demand for local retail. If a significant portion of workers commute to another county for work, this will likely lead to retail leakage for their county of residence.

Infrastructure and institutions: Median home value represents multiple place-based amenities and often increases in higher ordered places, reflecting the higher retail demand in amenity rich places or central places. Similarly, we expect to observe more internet service providers (ISP) in central places; however, this measure may increase or decrease retail thresholds due to the countervailing effects of greater efficiency (lower AC) and market access (higher S) with competition from non-local ecommerce businesses (lower global price).[7] Still, evidence suggests that greater internet access may introduce businesses to other determinants of growth, such as greater social capital (Kharisma, 2022).

The combined state and average local sales tax rate is likely to increase demand thresholds due to higher costs of production for retailers, while we hypothesize average effective property tax rates to lead to lower demand thresholds. This hypothesis comes from evidence that manufacturers’ decisions to locate in a place are either not affected or are positively related to higher property tax rates because low property taxes frequently imply low-quality public services (Gabe & Bell, 2004; Reum & Harris, 2006). Glaeser, Kolko and Saiz (2001) argue the importance of multiple amenities for attracting and retaining workers in central places and similar arguments can likely be made for the hinterlands. Higher quality provision of public goods and services such as Main Street beautification projects, parking infrastructure, parks, and general city maintenance are likely to be drivers for retail sector establishments and employment.

The effects of other variables in this category are likely industry specific depending on how related that industry is to outdoor recreation, its reliance on mobile clientele (e.g. gas stations), and their relationships with large institutions, such as universities.

Restricted-access establishment and employment data: While we are unable to present the summary statistics for much of the data due to Census Bureau disclosure limitations, we used the LBD and ILBD to create employment location quotients for 11 two-digit NAICS sectors to account for Jacobs' (1969) between-sector economies of agglomeration. Although we do not directly test for the mechanisms through which Jacobs' between-sector economies of agglomeration occur, a positive, significant coefficient will indicate evidence of between-industry economies of agglomeration. We also include an establishment location quotient for retail industries outside of the industry being modeled to account for Marshall-Arrow-Romer’s within-sector agglomeration economies (E. L. Glaeser et al., 1992), both of which may lead to lower costs.[8] While future studies might show how different measures of agglomeration (e.g. the Ellison-Glaeser Index [Ellison et al., 2010]) and its sources influence retail demand thresholds, in providing the first attempt to account for the phenomenon, we opt for the simpler and well-known location quotient.
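As a rough illustration of the location quotient described above, the following sketch computes a county's sector share of employment relative to the national sector share. It assumes the conventional definition of a location quotient; the function name and the toy numbers are hypothetical, and the text does not show how the LBD and ILBD quotients were actually constructed.

```python
def location_quotient(county_sector, county_total, national_sector, national_total):
    """County sector share of employment divided by the national sector share.

    Assumes the conventional location quotient definition; inputs are
    employment (or establishment) counts.
    """
    if county_total <= 0 or national_sector <= 0 or national_total <= 0:
        raise ValueError("totals and national sector count must be positive")
    county_share = county_sector / county_total
    national_share = national_sector / national_total
    return county_share / national_share


# Hypothetical county: 300 of 2,000 jobs are in the sector,
# while nationally the sector holds 10 of every 100 jobs.
lq = location_quotient(300, 2_000, 10_000_000, 100_000_000)
print(round(lq, 2))  # a value above 1 indicates relative concentration
```

A quotient above one means the sector is more concentrated in the county than in the nation as a whole, which is the sense in which a positive coefficient on the quotient would point toward agglomeration economies.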

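[6] The probability distribution for the ZIP can be written as: $P(Y = 0) = \gamma + (1 - \gamma)e^{-\lambda}$ and $P(Y = y) = (1 - \gamma)\frac{e^{-\lambda}\lambda^{y}}{y!}$ for $y = 1, 2, \ldots$, where $\lambda$ is the Poisson mean and γ is termed the “zero inflation parameter” and is bound between zero and one, 0 ≤ γ ≤ 1.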

[7] We were unable to include ecommerce sales information from the Census’ Annual Retail Trade Survey due to incompatible units of analysis, however, we also tested maximum advertised download speed as an alternative measure. While some more detailed measures of ecommerce and broadband accessibility exist (e.g. Gallardo & Beaulieu, 2019), none are available for all counties in the contiguous U.S.

[8] Establishments were used for the other-retail location quotient because the number of establishments indicate a degree of consumer choice in the retail sector (Shonkwiler & Harris, 1996).

Table II presents highlights from the employer and non-employer establishment demand thresholds. The employment demand threshold models are available in the appendix. The primary goal of this analysis is to identify methods for finding opportunities (and, possibly, downsizing risks) for rural economies, thus we pay less attention to ensuring that all models contain the same covariates, and instead focus on ensuring that the models capture the economic and community attributes most important to the retail industry. For example, some industry models, such as supermarkets, convenience stores, and liquor stores, contain three additional policy variables regarding the unique alcohol sales laws in the county. While we retain an essential set of covariates regardless of their significance to control for important aspects identified in the literature and focus groups, we employ individual and group chi-square tests to inform the decision to include many of the place-based factors and institution variables. The industries presented in this section display the variety of data generation processes and importance of different covariates across different industries and establishment types.

Distribution and industrial organization

Negative binomial data generation processes were most common, occurring slightly more often amongst non-employer establishment types (table II). Overdispersion is the distinguishing feature of the negative binomial count data distribution, and in this context overdispersion is most common in industries whose establishments may increase proportionally with demand (perhaps due to a small spatial radius or the inability of non-employers to scale up) or benefit from clustering. Retail industries that follow this data generation process for both establishment types include automobile dealers, pharmacies, convenience stores, and gas stations. These industries, and particularly the latter three, all provide relatively essential goods to consumers, can operate with few employees, and are presumably not as scalable as some other industries, leading to overdispersion in the distribution of establishments. Hardware stores may be essential to local economies, but may also experience economies of scale as they grow in size/employment, limiting the number of individual establishments in any one county and resulting in Poisson-distributed employer hardware stores. These findings support the hypothesis that lower-ordered goods tend to not follow zero inflated data generation processes due to their essential nature.
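The overdispersion idea above can be sketched with a simple moment check. Under a Poisson process the variance of the counts equals their mean, whereas a negative binomial process has variance above the mean. The sketch assumes the common NB2 parameterization, Var(y) = μ + αμ², and uses made-up county counts; it is a descriptive diagnostic, not the model-based tests used in the paper.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Sample variance divided by sample mean (about 1 under a Poisson process)."""
    return variance(counts) / mean(counts)


def moment_alpha(counts):
    """Method-of-moments alpha from Var = mu + alpha * mu**2 (NB2 assumption).

    Returns 0.0 when the sample shows no overdispersion.
    """
    mu = mean(counts)
    return max(0.0, (variance(counts) - mu) / mu**2)


# Hypothetical establishment counts for eight counties.
counts = [0, 1, 1, 2, 3, 5, 8, 20]
ratio = dispersion_ratio(counts)
alpha = moment_alpha(counts)
print(f"variance/mean = {ratio:.2f}, implied alpha = {alpha:.2f}")
```

A ratio well above one, as in this toy sample, is the pattern that would favor a negative binomial over a Poisson specification.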

Zero inflation was present nearly equally across the two establishment types. Liquor stores, sporting goods stores, and art dealers follow a zero inflated distribution regardless of establishment type, presumably due to their nonessential nature or laws that restrict their viability in a place. Bookstores and clothing stores are NB distributed for non-employers but ZINB distributed for employer establishments, indicating that they may play a role in most local economies but a smaller set of markets can support employer establishments. Supermarkets follow the opposite pattern where most counties have an employer establishment supermarket, but a smaller set of counties have non-employer supermarkets.[9] This indicates that the hierarchical level of non-employer establishments may differ depending on the industry. Bookstore non-employers may be more efficient than employers in lower level places, while supermarket non-employers may represent specialty grocers in higher level places.

[Approximate position of Table II]

Local and regional factors

One long held critique of demand threshold models is endogeneity. This issue is particularly difficult to address without readily accessible panel data estimators that can handle zero inflated count data distributions. Thus, it is more correct to interpret the covariate marginal effects as an association between community capitals and the establishment type rather than factors that directly lead to the occurrence of a particular establishment type. We attempt to control for endogeneity by taking multiple year averages of some variables (e.g. unemployment) and mitigate biases by observing model coefficients with and without potentially endogenous variables, though we emphasize the non-causal interpretation of the results. All models make use of White’s robust standard errors, and variance inflation factors did not indicate any potential concern over loss of efficiency from multicollinearity.

Coefficients in the inflation stage of the zero-inflated models represent the increase in probability of a county being in the certainly zero category, so a negative coefficient for population indicates that, the higher the population, the less likely the county will have zero establishments of the relevant type. Population is the most often significant variable in the inflation stage followed by tax rates and urbanicity. Amongst significant results, more population always leads to a decrease in the probability of the county being in the certainly zero category. While significant sales tax rate coefficients display mixed signs, higher property tax rates always lead to a lower probability of zero establishments, supporting previous findings in the literature that higher public amenities lead to higher retail demand. Retail industries that are more likely to have zero establishments due to locating in a metro county include sporting goods stores, employer bookstores, and employer clothing stores—model outputs that must be considered in tandem with population coefficients of the opposite sign. Nonetheless, the finding is somewhat consistent with Schuetz (2015) who found that big box retailers prefer areas with lower population density.
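To make the sign interpretation above concrete, the sketch below assumes a logit link for the inflation stage, which is the usual choice but is not stated in the text. The intercept, the population coefficient, and the count-stage mean are hypothetical values chosen only for illustration.

```python
import math


def certain_zero_probability(intercept, beta_pop, population_thousands):
    """Probability that a county is in the 'certainly zero' category.

    Assumes a logit inflation stage: gamma = 1 / (1 + exp(-z)).
    """
    z = intercept + beta_pop * population_thousands
    return 1.0 / (1.0 + math.exp(-z))


def expected_count(gamma, count_stage_mean):
    """Unconditional mean of a zero-inflated count: (1 - gamma) * mu."""
    return (1.0 - gamma) * count_stage_mean


# Hypothetical coefficients: a negative population coefficient.
small_county = certain_zero_probability(1.0, -0.2, 5)   # 5,000 residents
large_county = certain_zero_probability(1.0, -0.2, 50)  # 50,000 residents
print(small_county, large_county)
print(expected_count(small_county, 4.0))
```

With a negative population coefficient the larger county has the smaller probability of being a certain zero, which matches the reading of the inflation-stage coefficients given above.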

Amongst the labor force and taxes variables, the social capital index has the most ubiquitous positive effect on establishment counts, particularly amongst employer establishments. Employer clothing stores, supermarkets, and automobile dealers benefit the most from high social capital levels lowering the costs of business. Local ISP counts are less significant in the models than expected but seemed to have the greatest positive influence on employer automobile dealers and non-employer bookstores, both of which may rely heavily on online shopping or advertising. This muddled result is likely due to the countervailing effects of greater market access and increased competition. When interpreting the percent of opiate deaths in a county, it is important to note that the models control for opiate prescription rates. Industries whose establishment counts are negatively associated with opiate deaths, perhaps due to higher labor costs, include sporting goods stores and non-employer bookstores. The employment thresholds in the appendix also suggest significantly lower clothing and sporting goods employment in counties with higher fatal overdose rates. Industries in the establishment demand threshold models with positive opiate death share coefficients likely play an active role in the issue – such as pharmacies and supermarkets (which often contain pharmacies in the U.S.) – or are simply a common establishment type in particularly affected areas, which might, for example, have higher rates of tobacco or alcohol use. Instrumental variables or panel data analysis should be used to untangle the causality issue concerning the demand threshold impact of the opiate epidemic.

It is more sensible to develop causal arguments for relatively more time invariant place-based factors. For example, employer clothing store retailers are most affected by retail leakages, on average losing nearly four and a half establishments for every 10% increase in residents who work out of county. It appears that retail leakages either weaken employer establishments to a greater degree, or perhaps that retail leakages prevent non-employer establishments from ever growing into employer businesses due to limited demand for local retail. Looking at urban influence, clothing retailers suffer the most from retail leakages in metro adjacent or metro areas followed by employer auto dealers, supermarkets, and gas stations.

Sporting goods retailers display the strongest relationships with natural assets, and increase in number with larger areas of public land and water cover where demand is relatively higher. The positive coefficient for the travel time to a National Park Service (NPS) asset may seem unexpected for sporting goods stores, so it bears some discussion. The NPS usually imposes strong limitations on hunting, fishing, and off-roading within NPS assets, perhaps representing a substitute to other types of recreation supported by sporting goods stores. Supermarkets and hardware stores also increase with water cover, perhaps due to larger market sizes resulting from tourism and residential attractiveness of destinations with bodies of water, higher concentration of residential housing, and the higher maintenance needed for buildings near large bodies of water.

Evidence of economies of agglomeration with the other 10 sectors is present across each of the retail industries, but for brevity we only focus on the location quotient for other retailers, or retailers outside of the relevant industry. Clothing stores and automobile dealers benefit the most from locating in a retail cluster, with clothing stores observing an increase of ten stores with every one unit increase in the location quotient.

Eight of the eleven retail industries exhibited smaller population thresholds for non-employer establishments compared to employer establishments. This along with the different data distributions and other coefficient estimates provides evidence that non-employer establishments are fundamentally different from employer establishments. Areas with more austere market conditions may still be conducive to non-employer establishments, allowing a lower tiered place to have a higher-order good or service, or even a high tiered place to have a unique retailer offering highly specialized goods or services. Thus, these results expand on Christaller’s (1966) functional hierarchy by not only defining places based on the types of goods and services offered, but also the types of establishments offering those goods and services.

Regarding the policy variables, liquor stores have both fewer establishments and lower employment in counties and states allowing the sale of beer or liquor in grocery stores, due to greater market competition. While convenience stores locate in greater numbers where grocery stores are legally able to sell beer, supermarkets curiously decrease in number. However, the employment models presented in the appendix suggest that, although there are fewer supermarkets in freer markets, they tend to be larger. Future research should determine and explain a causal relationship between alcohol retail policies and their influence on supermarket frequency and size.

Oregon and New Jersey prohibit consumers from pumping their own gas, and it appears that while this policy might generate more employment, it decreases the number of employer gas stations in these states by increasing average costs. Non-employer gas stations in these states may be family operated establishments that are open for limited hours, or perhaps fleet or contract gas stations that may be exempted from the law.

[9] Non-employer supermarkets include family grocers and other supermarket types that may not formally pay their employees.

Figure 2 highlights the differences in demand threshold contours when using the coefficients estimated from the external public CBP data and the internal restricted-access FSRDC data. We first re-estimated the demand threshold models using the CBP data, resulting in two sets of coefficients: those containing bias from the suppressed CBP data, and those without bias from the unsuppressed LBD data (i.e. the results presented in table II). We then predicted the out of sample expected establishment levels for six different population levels for a rural county (reference group) while holding all other covariates at their median values.[10] This estimation of demand threshold contours differs from previous studies’ simpler and more restrictive specifications, which tend to only use the marginal effect of population as an approximation of the demand threshold contour.
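The prediction step described above can be sketched as follows. The sketch assumes a log-link count model with log population as a regressor and ignores any zero-inflation stage; the coefficients, the median-covariate term, and the six population levels are all hypothetical and are not the paper's estimates.

```python
import math

# Hypothetical coefficients from a log-link count model (illustration only).
INTERCEPT = -6.0
BETA_LOG_POP = 0.8
OTHER_AT_MEDIANS = 0.5  # sum of beta_k * median(x_k) over the other covariates


def expected_establishments(population):
    """Expected establishment count at a given population, other covariates at medians."""
    linear = INTERCEPT + BETA_LOG_POP * math.log(population) + OTHER_AT_MEDIANS
    return math.exp(linear)


def population_threshold(n_establishments):
    """Population at which the expected count reaches n_establishments."""
    target = math.log(n_establishments) - INTERCEPT - OTHER_AT_MEDIANS
    return math.exp(target / BETA_LOG_POP)


# Six hypothetical population levels for a rural county.
populations = [1_000, 2_500, 5_000, 10_000, 25_000, 50_000]
contour = [(p, expected_establishments(p)) for p in populations]
for pop, count in contour:
    print(f"{pop:>6}: {count:.2f}")
print(f"threshold for one establishment: {population_threshold(1):.0f}")
```

Inverting the fitted mean, rather than relying only on the marginal effect of population, lets the contour reflect the levels of the other covariates. In this sketch a population coefficient below one produces a concave contour.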

[Approximate Position of Figure 2]

Figure 2 illustrates the functional hierarchy of retail businesses. Out of the three industries graphed, gas stations arguably represent the lowest ordered service and exhibit their fundamental role to rural economies through a relatively greater number of establishments, or a lower demand threshold compared to clothing stores and bookstores. While clothing is also a necessity, it is purchased with less frequency, leading to relatively higher demand thresholds for clothing stores. Bookstores are higher-order retailers and likely cluster in higher-tiered places, thus these establishments represent the highest demand threshold for rural counties.

While it may appear that publicly suppressed establishment counts from the CBP data generally lead to overestimating the number of retail establishments, or underestimating the demand threshold (i.e. necessary population) for a particular number of establishments, the bias behaves differently depending on the industry. The direction of the bias is not consistent. Although the percent difference between public and restricted model establishment predictions tends to decrease as population increases, the percent difference stays constant for supermarkets, and even increases for pharmacies. These nuances in percent difference biases likely stem from lower-tiered places (i.e. rural counties) being largely undisclosed in the public data, causing the demand thresholds for lower-ordered services to be overstated and higher-ordered services to be understated.
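The comparison above can be expressed as a percent difference between the two sets of predictions. The sketch assumes the restricted-data prediction is the denominator, which the text does not specify, and the predicted counts are hypothetical.

```python
def percent_difference(public_pred, restricted_pred):
    """Percent by which the public-data prediction exceeds the restricted-data one."""
    if restricted_pred == 0:
        raise ValueError("restricted-data prediction must be nonzero")
    return 100.0 * (public_pred - restricted_pred) / restricted_pred


# Hypothetical predicted establishment counts at three rising population levels.
public = [2.4, 4.5, 8.8]
restricted = [2.0, 4.0, 8.0]
gaps = [percent_difference(p, r) for p, r in zip(public, restricted)]
print([round(g, 1) for g in gaps])
```

A positive value means the public data overpredict establishments, which corresponds to an understated demand threshold. In this toy example the gap narrows as population rises.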

The expected frequencies of gas station and bookstore establishments are slightly concave over population, as hypothesized, although visually, all of the contours appear linear. This visual linearity may be due to the scale of the graph; however, the expected count of clothing stores appears to be slightly convex. This suggests that the growth functions across retail establishments may be more nuanced than the previously held idea that demand threshold contours are concave. Some rural retail stores, such as clothing stores, may experience a population range where the benefits of agglomeration economies outweigh the costs of increased competition, causing an increasing establishment growth rate as population increases.

[10] Holding all other covariates at their medians ensures that dummy variables take on whole values. Since the focus of demand threshold estimates is often on rural economic development, we calculated our predictions using the median values for rural counties only.

The retail sector is vitally important to the United States’ economy, as well as to local economies. Retail has become particularly germane to rural communities where it is now the third largest employer, partially due to the consolidation of agricultural businesses (Laughlin, 2016). Demand threshold models and CPT are commonly used to reveal which locational determinants communities may leverage to attract new retail industries or establishments, but data nondisclosure issues limit the insights from these models. By using microdata to accurately estimate the location decisions of retail establishments we not only provide greater understanding of specific retail industries’ location determinants, but we also flesh out CPT by recognizing the role of non-employers in the functional hierarchy of places.

We explore the location decision implications of retail industries through observing employer and non-employer establishment distributions and influential economic and locational factors. We estimate demand threshold models for 11 retail industries using county aggregated restricted-access data from the LBD and ILBD. These rich datasets allow us to observe every establishment in the contiguous United States, generating new insights into demand thresholds across ruralities, industries, and employer/non-employer establishment types. To account for overdispersion and zero-inflation within the data we test across several count data models for each of the industries and establishment types.

Three data generation processes were identified amongst the 11 retail industries presented here, with negative binomial and zero-inflated negative binomial being the most common. Industries and establishment types displaying overdispersion either increase somewhat proportionally with population or benefit from clustering together, and may have a small spatial radius or an inability to scale up individual establishments. Examples of these industries include gas stations, pharmacies, and convenience stores. Industries and establishment types exhibiting zero-inflation, such as sporting goods stores and art dealers, likely face some barriers that prevent them from locating in every county. Such barriers may include large spatial radii, a certain population size, or other industry-specific requirements.

Several interesting narratives surrounding retail leakages, industry interdependencies, and influential policy variables or locational factors emerge, and future research should seek to identify causal relationships between the covariates and establishment counts. As we noted in-text, causal arguments become more tenable for relatively more time-invariant place-based factors.

Future research may seek to use natural experiments surrounding national park (or land, monument, etc.) declarations or infrastructure investments to dig into those relatively long-term causal effects on demand thresholds. Given the industry specific data generation processes revealed here, future research could benefit most from developing panel-compatible versions of zero-inflation models and exploring how the data generation processes vary across time within industries in different sectors of the economy. Although a fixed effects panel analysis could not include time invariant place-based factors, it could be used to explore lower levels of geography where the data on place-based factors may not be as readily available. Other potential improvements to the models include different measures of interindustry linkages and ecommerce, a detailed analysis into the measurement error bias of the CBP compared to the LBD and ILBD, as well as accounting for spatial interdependencies between counties.

LBD

Longitudinal Business Database

ILBD

Integrated Longitudinal Business Database

CBP

County Business Patterns

CPT

Central Place Theory

FSRDC

Federal Statistical Research Data Center

NPS

National Park Service

U.S.

United States

ISP

Internet Service Provider

NB

Negative Binomial

ZINB

Zero Inflated Negative Binomial

NAICS

North American Industry Classification System

NS

Non-employer Statistics

ZIP

Zero Inflated Poisson

UIC

Urban Influence Code

AIC

Akaike Information Criterion

BIC

Bayesian Information Criterion

HP

Hurdle Poisson

AC

Average Costs

Availability of data and materials

While the data are restricted-access and not publicly available, we are able to share the data and statistical code with anyone who holds Special Sworn Status with the U.S. Census Bureau.

Competing interests

The authors have no competing interests to disclose.

Funding

This project was supported by the Agricultural and Food Research Initiative Competitive Program of the USDA National Institute of Food and Agriculture (NIFA), award number 2017-67023-26242, and by the USDA National Institute of Food and Agriculture, Hatch project 1014691. Any opinions and conclusions expressed herein are those of the authors and do not necessarily represent the views of the U.S. Census Bureau. All results have been reviewed to ensure that no confidential information is disclosed.

Authors' contributions

Anders Van Sandt: methodology, formal analysis, data curation, writing – original draft preparation, writing – review and editing

Craig Carpenter: methodology, data curation, writing – review and editing, secured funding

Rebekka Dudensing: methodology, writing – review and editing

Scott Loveridge: methodology, formal analysis, writing – review and editing

Linda Niehm: methodology, writing – review and editing

Acknowledgements

We have no acknowledgements to make.

Berry, B. J. L., & Garrison, W. L. (1958a). Recent Developments of Central Place Theory. Papers in Regional Science, 4(1), 107–120. https://doi.org/10.1111/j.1435-5597.1958.tb01625.x
Berry, B. J. L., & Garrison, W. L. (1958b). A Note on Central Place Theory and the Range of a Good. Economic Geography, 34(4), 304. https://doi.org/10.2307/142348
Bresnahan, T. F., & Reiss, P. C. (1991). Entry and Competition in Concentrated Markets. Journal of Political Economy, 99(5), 977–1009. https://doi.org/10.1086/261786
Chakraborty, K. (2012). Estimation of Minimum Market Threshold for Retail Commercial Sectors. International Advances in Economic Research, 18(3), 271–286. https://doi.org/10.1007/s11294-012-9354-3
Christaller, W. (1933). Central Places in southern Germany. In Die Zentralen Orte in Süddeutschland. Prentice Hall.
Cleary, R., Goetz, S. J., Mcfadden, D. T., & Ge, H. (2019). Excess Competition among Food Hubs. Journal of Agricultural and Resource Economics, 44(1), 141–163.
Deller, S. C., & Harris, T. R. (1993). Estimation of minimum market thresholds using stochastic frontier estimators. Regional Science Perspectives, 23(1), 3–17.
Desmet, K., & Fafchamps, M. (2005). Changes in the Spatial Concentration of Employment across U.S. Counties: A Sectoral Analysis 1972-2000. Journal of Economic Geography, 5(3), 261–284.
Ellison, G., Glaeser, E. L., & Kerr, W. R. (2010). What Causes Industry Agglomeration? Evidence from Coagglomeration Patterns. The American Economic Review, 100(6), 1195–1213.
ERS. (2017). Rural America at a Glance. https://www.ers.usda.gov/webdocs/publications/85740/eib-182.pdf?v=0
Frazier, B. J., & Niehm, L. S. (2004). Exploring Business Information Networks of Small Retailers in Rural Communities. Journal of Developmental Entrepreneurship, 9(1), 23–42.
Gabe, T. M., & Bell, K. P. (2004). Tradeoffs between local taxes and government spending as determinants of business location. Journal of Regional Science, 44(1), 21–41.
Gallardo, R., & Beaulieu, L. J. (2019). Broadband Data Validation and Demand Aggregation in Indiana.
Glaeser, E., Kolko, J., & Saiz, A. (2001). Consumer City. Journal of Economic Geography, 1(1), 27–50.
Glaeser, E. L., Kallal, H. D., Scheinkman, J. A., & Shleifer, A. (1992). Growth in Cities. Journal of Political Economy, 100(6), 1126–1152.
Greene, W. H. (1994). Accounting for Excess Zeros and Sample Selection in Poisson and Negative Binomial Regression Models. NYU Working Paper No. EC-94-10.
Greene, W. H. (2012). Econometric Analysis (7th ed.). Prentice Hall.
Hammock, R. (2019). U.S. Chamber report focuses on rural business potential. https://smallbusiness.com/rural/small-business/
Harris, T. R., & Shonkwiler, J. S. (1997). Interdependence of Retail Businesses. Growth and Change, 28(4), 520–533. https://doi.org/10.1111/1468-2257.00070
He, W., Wang, F., Chen, Y., & Zha, S. (2017). An exploratory investigation of social media adoption by small businesses. Information Technology and Management, 18(2), 149–160.
Henderson, J. W., Kelly, T. M., & Taylor, B. A. (2000). The Impact of Agglomeration Economies on Estimated Demand Thresholds: An Extension of Wensley and Stabler. Journal of Regional Science, 40(4), 719–733. https://doi.org/10.1111/0022-4146.00195
Jacobs, J. (1969). The Economy of Cities. Random House.
Jarmin, R. S., & Miranda, J. (2002). The Longitudinal Business Database (No. 02–17; Center for Economic Studies Working Paper Series).
Kharisma, B. (2022). Surfing alone? The Internet and social capital: evidence from Indonesia. Journal of Economic Structures, 11(8). https://doi.org/10.1186/s40008-022-00267-7
Korsching, P. F., & Allen, J. C. (2004). Locality Based Entrepreneurship: A Strategy for Community Economic Vitality. Community Development Journal, 39(4), 385–400.
Krugman, P. (1991). Increasing Returns and Economic Geography. Journal of Political Economy, 99(3), 483–499. http://www.journals.uchicago.edu/t-and-c
Krugman, P. (2010). The new economic geography, now middle aged. http://www.princeton.edu/~pkrugman/aag.pdf.
Laughlin, L. (2016). Beyond the farm: Rural industry workers in America. In Census Blog Posts. https://www.census.gov/newsroom/blogs/random-samplings/2016/12/beyond_the_farm_rur.html
Lösch, A. (1940). The economics of location (W. H. Woglom & W. F. Stolper (Eds.)). Yale University Press.
Loveridge, S., Nawyn, S., & Szmecko, L. (2013). Conducting Virtual Facilitated Discussions. Community Development Practice, 19. http://www.comm-dev.org/images/pdf/Conducting-virtual-facilitated-dicussions template-new 1.pdf
Memili, E., Fang, H., Chrisman, J., & De Massis, A. (2015). The impact of small-and medium-sized family firms on economic growth. Small Business Economics, 45(4), 771–785.
Mulligan, G. F., Partridge, M. D., & Carruthers, J. I. (2012). Central place theory and its reemergence in regional science. The Annals of Regional Science, 48(2), 405–431. https://doi.org/10.1007/s00168-011-0496-7
Mushinski, D., & Weiler, S. S. (2002). A Note on the Geographic Interdependencies of Retail Market Areas. Journal of Regional Science, 42(1), 75–86. https://doi.org/10.1111/1467-9787.00250
Parr, J. B., & Denike, K. G. (2016). Theoretical Problems in Central Place Analysis. Economic Geography, 46(4), 568–586.
Partridge, M. D., Rickman, D. S., Ali, K., & Olfert, M. R. (2008). Lost In Space: Population Growth In the American Hinterlands and Small Cities. Journal of Economic Geography, 8(6), 727–757.
Pennerstorfer, A., & Pennerstorfer, D. (2019). How small are small markets? Local market size for child care services. Regional Science and Urban Economics, 77, 340–355. https://doi.org/10.1016/j.regsciurbeco.2019.06.006
Perumean-Chaney, S. E., Morgan, C., McDowall, D., & Aban, I. (2013). Zero-inflated and overdispersed: what’s one to do? Journal of Statistical Computation and Simulation, 83(9), 1671–1683.
Reum, A. D., & Harris, T. R. (2006). Exploring Firm Location Beyond Simple Growth Models: A Double Hurdle Application. 36(1), 45–67.
Ring, J. K., Peredo, A. M., & Chrisman, J. J. (2010). Business Networks and Economic Development in Rural Communities in the United States. Entrepreneurship Theory and Practice, 34(1), 171–195. https://doi.org/10.1111/j.1540-6520.2009.00307.x
Rupasingha, A., Goetz, S. J., & Freshwater, D. (2006). The Production of Social Capital in US Counties. Journal of Socio-Economics, 35, 83–101. https://doi.org/10.1016/j.socec.2005.11.001
Schuetz, J. (2015). Why are Walmart and Target Next-Door Neighbors? Regional Science and Urban Economics, 54, 38–48.
Shaffer, R., Deller, S., & Marcouiller, D. (2004). Community economic development: Linking theory and practice (2nd ed.). Blackwell Publishing Ltd.
Sharp, J. S., Agnitsch, K., Ryan, V., & Flora, J. (2002). Social Infrastructure and Community Economic Development Strategies. Journal of Rural Studies, 18(4), 405–417.
Shonkwiler, J. S., & Harris, T. R. (1996). Rural retail business thresholds and interdependencies. Journal of Regional Science, 36(4), 617–630. https://doi.org/10.1111/j.1467-9787.1996.tb01121.x
Small Business Credit Survey: Report on Rural Employer Firms. (2017).
Thiede, B., Greiman, L., Weiler, S., Beda, S. C., & Conroy, T. (2017). Six charts that illustrate the divide between rural and urban America. In The Conversation. http://ruraljobscoalition.com/clientuploads/toolkit/Six charts that illustrate the divide between rural and urban America.pdf
Thilmany, D., Mckenney, N., Mushinski, D., & Weiler, S. (2005). Beggar-thy-neighbor economic development: A note on the effect of geographic interdependencies in rural retail markets. Annals of Regional Science, 39, 593–605. https://doi.org/10.1007/s00168-005-0229-x
U.S. Census Bureau. (2012). Survey of Business Owners [dataset]. https://www.census.gov/library/publications/2012/econ/2012-sbo.html
Vuong, Q. H. (1989). Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses. Econometrica, 57(2), 307–333. https://doi.org/10.2307/1912557
Wensley, M. R. D., & Stabler, J. C. (1998). Demand-Threshold Estimation for Business Activities in Rural Saskatchewan. Journal of Regional Science, 38(1), 155–177. https://doi.org/10.1111/0022-4146.00086
Wilson, P. (2015). The misuse of the Vuong test for non-nested models to test for zero-inflation. Economics Letters, 127, 51–53. https://doi.org/10.1016/J.ECONLET.2014.12.029

Table I. Secondary Data Descriptive Statistics
Variable	Mean	Std. Dev.	Source
Demographics
Population	101,931.30	327,468.00	ACS
Population Density	0.27	1.80	ACS
Median Age	40.85	5.18	ACS
Percent of Seniors (Age ≥ 65)	0.18	0.44	ACS
Per Capita Income	39.59	11.64	BEA
Percent of Residents in Poverty	16.84	6.55	ACS
Unemployment Rate (5 year avg.)	7.89	2.68	BLS
Social Capital Index	0.01	1.26	NERCRD – Penn. State University
Percent with Health Insurance	0.79	0.08	ACS
Opiates Prescribed / 100 People	85.72	49.37	CDC
Percent of Opiate Related Deaths	0.26	0.85	CDC
Percent Work in Another County^†	30.10	17.69	ACS
Percent Work from Home^†	4.75	3.24	ACS
Percent White	0.84	0.16	ACS
Percent Black	0.09	0.15	ACS
Percent Asian	0.01	0.02	ACS
Percent Hispanic	0.09	0.14	ACS
Percent - Below High School	14.92	6.81	ACS
Percent - Only High School	34.78	7.10	ACS
Percent - Bachelors or Above	13.24	5.48	ACS
Infrastructure and Institutions
Median Home Value ($1,000’s)	135.77	79.20	ACS
Avg. Combined Sales Tax Rate^§	7.01	1.68	Tax Foundation
Avg. Effective Property Tax Rate	1.06	0.51	SmartAsset.com
Internet Service Providers (ISPs)	5.19	1.12	FCC
Metro - Urban Influence Code	0.37	0.48	ERS, USDA
Micropolitan Metro Adjacent - UIC	0.33	0.47	ERS, USDA
Micropolitan Non-metro Adjacent - UIC	0.30	0.46	ERS, USDA
Interstate Density^±	1.79	3.07	ArcGIS – Census Shapefiles
Highway Density^±	0.37	0.24	ArcGIS – Census Shapefiles
Natural Amenities Index	3.49	1.04	ERS, USDA
Hours to National Park Service Asset	1.64	0.93	GIS team at ERS, USDA
Percent Covered by Public Land	9.82	20.64	FS, USDA
Percent Covered by Water	4.50	11.15	ERS, USDA
Percent Covered by Native American Reservation	1.30	8.31	US Census Bureau
Community Colleges	0.33	0.84	NCES
Universities or Colleges	0.72	2.40	NCES
Military Bases	0.04	0.22	US Census Bureau
Grocery Beer	0.76	0.43	N/A
Grocery Liquor	0.37	0.48	N/A
Restricted Alcohol Sales	0.23	0.42	N/A
No Gas Pumping Law	0.02	0.13	N/A
Census Region Fixed Effects	N/A	N/A	US Census Bureau
† Percent of all residents age 16 and older
§ State-level variable
± Miles of road per hundred square miles
ACS: American Community Survey; BLS: Bureau of Labor Statistics; BEA: Bureau of Economic Analysis; CDC: Centers for Disease Control and Prevention; FCC: Federal Communications Commission; ERS: Economic Research Service; USDA: US Department of Agriculture; FS: Forest Service; NCES: National Center for Education Statistics
Not shown: Rural leakage (interaction between Percent Work in Another County and Micropolitan Non-Metro Adjacent); Rural Military Bases (interaction between Military Bases and Micropolitan Non-Metro Adjacent)
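The two interaction terms listed as "not shown" in the note to Table I can be built directly from variables in that table. The following is a minimal sketch, assuming each county is held as a dictionary; the key names (`pct_work_other_county`, `micro_nonmetro_adjacent`, `military_bases`) and the function name are hypothetical and are not taken from the authors' code.

```python
def add_interactions(county):
    """Return a copy of a county record with the two interaction terms added.

    Assumes `micro_nonmetro_adjacent` is a 0/1 indicator, so each interaction
    equals the underlying variable in those counties and zero elsewhere.
    """
    out = dict(county)
    rural = county["micro_nonmetro_adjacent"]
    # Rural leakage: share of residents working in another county, rural counties only
    out["rural_leakage"] = county["pct_work_other_county"] * rural
    # Rural military bases: number of bases, rural counties only
    out["rural_military_bases"] = county["military_bases"] * rural
    return out


# Illustrative values only, not data from the paper
rural_county = {"pct_work_other_county": 30.1, "micro_nonmetro_adjacent": 1, "military_bases": 2}
metro_county = {"pct_work_other_county": 45.0, "micro_nonmetro_adjacent": 0, "military_bases": 1}
```

Because the indicator is binary, the interaction lets the effect of out-commuting or of a military base differ in non-metro-adjacent micropolitan counties without changing the baseline effect elsewhere.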

Table II. Selected Marginal Effects of Retail Demand Threshold Models – Establishments
Retail Industry	Automobile Dealers		Hardware Stores		Pharmacies
Distribution	NB	NB	Poisson	NB	NB	NB
Variable	Employers	Non-emp’	Employers	Non-emp’	Employers	Non-emp’
Ln(population)	15.79***	39.38***	4.03***	1.326***	13.05***	2.549***
Population density	-44.44***	2.956	8.766***	1.501	16.61***	4.266*
Per capita income	-0.3116	15.99***	0.5651	0.0772	4.295***	0.0155***
Poverty (%)	-0.0815*	0.5633***	0.0051	0.0102	0.1791***	0.0358**
Median home value	68.76*	-275.3***	-6.9	8.665	15.44	-1995*
High school (%)	-0.0037	-0.1332***	-0.0147**	-0.0043	-0.0307***	-0.0109*
Bachelors (%)	-0.0561	-0.0776	-0.0071	-0.027**	0.061	0.0503**
Metro adjacent (UIC)	-1.726***	-2.258*	0.1885	-0.2565*	-0.6737**	-0.0808
Metro (UIC)	-2.814***	-3.776***	-0.3901**	-0.2795**	-1.096***	-0.3793*
Work out of county (%)	-0.0376***	0.0942***	-0.0242***	0.0021	0.0035	0.0049
Work from home (%)	0.026	0.4618***	0.0608**	0.0693***	-0.1156**	0.0557**
LQ – Other retail	1.814***	4.018***	-0.0646	0.0674	0.3226**	0.2571***
Sales tax rate	-0.2141***	0.4504**	-0.0182	0.0791***	0.1818***	0.0797**
Property tax rate	0.0447	-3.138***	0.057	-0.1426	-0.1247	0.0169
ISP count	0.2528*	-0.0187	-0.0649	-0.0677*	0.0266	-0.0941*
Social capital index	1.261***	0.7409*	0.4042***	-0.0023	0.5894**	0.032
Opiate Fatalities (%)	-0.2746	0.3312	0.0394	0.0999***	0.8628***	0.2482***
Community colleges	0.2666**	0.4082
Highway Density	-1.822	7.131**
(Highway Density)²	0.682	-2.96**
Water cover (%)			0.0158***	-0.0002
Seniors (%)					3.977***	0.5774
Significance levels: *** <1%, ** <5%, * <10%

Table II. (cont.) Selected Marginal Effects of Retail Demand Threshold Models – Establishments
Retail Industry	Supermarkets		Convenience Stores		Liquor Stores
Distribution	NB	ZINB	NB	NB	ZINB	ZINB
Variable	Employers	Non-emp’	Employers	Non-emp’	Employers	Non-emp’
Ln(population)	17.59***	0.6729***	9.869***	1.349***	9.464***	3.718***
Population density	49.03**	6.315	-10.23	-0.1013	-20.63	-5.118*
Per capita income	-3.956*	-0.0003	-2.392	-0.1813	1.473	-1.204
Poverty (%)	0.1806***	0.0184**	-0.0223	0.0067	-0.0938*	-0.0136
Median home value	116.9***	11.64*	179.5***	6.752	212.4***	46.18***
High school (%)	-0.0106	0.0048	-0.0235	-0.005	0.0019	0.0073
Bachelors (%)	0.0609	0.0075	-0.2004***	0.0052	0.1245***	0.0795***
Metro adjacent (UIC)	-1.426***	0.0065	1.961***	-0.0954	-0.6459	-0.2981
Metro (UIC)	-2.27***	-0.2354	1.414**	-0.0102	-0.1415	-0.94
Work out of county (%)	-0.0663***	-0.0018	0.0138	0.0061**	-0.0309***	-0.0033
Work from home (%)	0.4657***	0.0161	-0.1036	-0.0191	0.0219	0.0115
LQ – Other retail	-0.6722***	-0.0419	-0.9918***	-0.1271**	-0.4626*	0.1249
Sales tax rate	0.4104***	-0.0072	-0.8057***	-0.0339	-0.0355	0.1313**
Property tax rate	1.519***	0.0257	3.004***	0.1409*	-0.5342*	-0.3721**
ISP count	-0.0143	0.0301	-0.2834	0.009	0.0735	-0.0465
Social capital index	1.541**	-0.0396	0.4589	0.0193	1.232**	0.0356
Opiate Fatalities (%)	1.366***	0.051**	0.5489***	0.0661**	0.5598*	0.184**
Community colleges	-0.4973**	-0.0738***	-0.0741	-0.0681**
Universities/colleges	0.0527	0.02***	-0.0657	0.0095	0.0164	0.0494**
Highway Density					4.088***	0.891
(Highway Density)²					-1.569***	-0.1215
Water cover (%)	0.0455***	0.0021
Grocery Beer	-2.106***	-0.0779	1.451***	-0.1092	-3.978***	-1.938***
Grocery Liquor	0.665	-0.322***	-0.1531	-0.0617	-4.161***	-0.6341**
Restricted Sales	-2.422***	-0.2466	-1.643***	0.0109	2.453***	-1.049***
Inflation Stage
Ln(population)		0.0474			0.0023	-0.0098
Population Density		-33.01*			-1.307	-0.9013
Metro adjacent (UIC)		0.0904			-0.0241	0.017
Metro (UIC)		-0.1114			0.0192*	0.0042
Sales tax rate		0.0911***			-0.0059***	-0.0038
Property tax rate		-0.0446			-0.0903	-0.0152
Per capita income		-0.0551			-0.0977	-0.3067***
Median home value		-25.29			-8.036***	0.3668
Grocery Beer					-0.0322***	-0.0039
Grocery Liquor					0.0112**	-0.1699*
Restricted Sales					0.0173
Significance levels: *** <1%, ** <5%, * <10%
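The "Inflation Stage" rows in the ZINB columns can be read through the usual two-part structure of a zero-inflated count model. The following is a minimal sketch, assuming a log link for the count stage and a logit link for the inflation stage (the link functions are not stated in these tables). Under those assumptions the expected establishment count is (1 - pi) * mu, where pi is the probability that a county is a structural zero and mu is the count-stage mean. All function names, argument names, and the coefficients in the example are hypothetical and are not estimates from the paper.

```python
import math


def zinb_expected_count(x, beta, z, gamma):
    """Expected count under a zero-inflated count model: (1 - pi) * mu.

    x, beta  : count-stage regressors and coefficients (log link, assumed)
    z, gamma : inflation-stage regressors and coefficients (logit link, assumed)
    The negative binomial dispersion parameter does not affect the mean,
    so it does not appear here.
    """
    mu = math.exp(sum(b * v for b, v in zip(beta, x)))
    pi = 1.0 / (1.0 + math.exp(-sum(g * v for g, v in zip(gamma, z))))
    return (1.0 - pi) * mu


def marginal_effect(x, beta, z, gamma, k, h=1e-6):
    """Central finite-difference effect of count-stage regressor k on E[y].

    Only x[k] is perturbed; if the same variable also enters the inflation
    stage, the corresponding element of z would need to be perturbed as well.
    """
    up = list(x)
    dn = list(x)
    up[k] += h
    dn[k] -= h
    diff = zinb_expected_count(up, beta, z, gamma) - zinb_expected_count(dn, beta, z, gamma)
    return diff / (2 * h)


# Hypothetical county: intercept plus one regressor in the count stage,
# intercept only in the inflation stage
expected = zinb_expected_count([1.0, 0.0], [0.0, 1.0], [1.0], [0.0])
effect = marginal_effect([1.0, 0.0], [0.0, 1.0], [1.0], [0.0], k=1)
```

Under this parameterization a negative inflation-stage effect means the variable lowers the probability that a county has no establishments of that type at all.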

Table II. (cont.) Selected Marginal Effects of Retail Demand Threshold Models – Establishments
Retail Industry	Gas Stations		Clothing Stores		Sporting Goods Stores
Distribution	NB	NB	ZINB	NB	ZINB	ZINB
Variable	Employers	Non-emp’	Employers	Non-emp’	Employers	Non-emp’
Ln(population)	28.93***	3.297***	48.8***	41.93***	7.407***	6.567***
Population density	-95.69***	0.6	-47.51*	57.32***	43.36	270.1*
Per capita income	6.742***	0.0183**	9.084	0.1166***	-1.685	0.0046
Poverty (%)	0.103*	0.0498**	0.1424	-0.0556	-0.0585*	-0.0494**
Median home value	-133.8**	-4205***	1047***	16080**	82.35***	-4146**
High school (%)	-0.0135	-0.0113	-0.2658***	-0.0761**	0.0083	0.0134
Bachelors (%)	-0.0258	0.032	2.305***	0.8879***	0.3209***	-0.0185
Metro adjacent (UIC)	-1.004*	-0.2019	-7.151***	-4.032***	-0.7424**	-0.2964
Metro (UIC)	-2.577***	-0.7934***	-8.727***	-5.193***	-0.9824***	-1.193***
Work out of county (%)	-0.1215***	-0.0068	-0.4414***	0.0227	-0.0667***	-0.0181***
Work from home (%)	0.0006	0.0575	-0.287	0.333*	-0.0564	0.1268**
LQ – Other retail	0.5514*	0.2016**	9.879***	2.631***
Sales tax rate	0.436***	0.0918	-0.1595	0.692***	-0.1051**	-0.1047**
Property tax rate	-0.0238	-0.3516*	0.6899	-0.6422	0.4318**	0.1105
ISP count	-0.2072	-0.0208	-0.1663	0.4051	-0.0135	-0.1503**
Social capital index	1.134*	0.0125	2.759**	0.0975	0.5402*	0.2661**
Opiate Fatalities (%)	0.587	0.376***	-1.063	1.951***	-0.2952***	-0.1788**
Community colleges	0.5223**	-0.0499	0.1942	-0.5222	0.1068	0.2353***
Interstate Density	0.3976***	-0.0743**	0.6142**	0.1604
(Interstate Density)²	-0.004	0.0015*	-0.0013	0.0016
Public land (%)			-0.0774*	0.0399*	0.0187***	-0.0002
Hrs. to National Park					0.3332***	0.2145**
Water cover (%)					0.0167***	-0.0025
No pumping law	-3.093**	1.008**
Inflation Stage
Ln(population)			-0.0868***		-0.0521**	-0.0243*
Population Density			-15.21		-91.82	-241.9*
Metro adjacent (UIC)			0.007		0.0058	-0.0246
Metro (UIC)			0.0652**		0.0909**	0.0656***
Sales tax			0.006*		0.0376***	0.0029
Property tax			-0.0514*		-0.0156	-0.0331*
Per capita income			-0.3592*		-0.0504	0.0002
Median home value			-7.632*		-7.501	-372.6
Significance levels: *** <1%, ** <5%, * <10%

Table II. (cont.) Selected Marginal Effects of Retail Demand Threshold Models – Establishments
Retail Industry	Bookstores		Art Dealers
Distribution	ZINB	NB	ZINB	ZINB
Variable	Employers	Non-emp’	Employers	Non-emp’
Ln(population)	3.058***	5.707***	1.46***	5.581***
Population density	93.79***	1.84	8.248	8.591
Per capita income	-0.8246*	-0.017**	-0.1181	-0.0196**
Poverty (%)	-0.0235	0.0156	-0.0197	-0.0514*
Median home value	49.49***	-678.3	35.17***	5948**
High school (%)	-0.0351***	-0.0155*	-0.0133*	-0.0202*
Bachelors (%)	0.0567***	0.0364	0.0811***	0.1942***
Metro adjacent (UIC)	-0.6798***	0.2068	0.5072*	0.4135
Metro (UIC)	-0.802***	0.0517	0.132	0.048
Work out of county (%)	-0.0295***	-0.0016	-0.0187***	-0.016**
Work from home (%)	-0.0135	0.093**	0.0438	0.2543***
LQ – Other retail	0.3357***	0.3959***	-0.2284**	-0.2184
Sales tax rate	0.0307	0.0207	-0.0245	0.0632
Property tax rate	0.0764	0.2066	-0.2375	0.247
ISP count	0.0598	0.1064*	0.0672	0.1466
Social capital index	0.3705***	0.4115***	0.3911***	0.4551**
Opiate Fatalities (%)	-0.0445	-0.227***	0.0838	-0.1106
Community colleges			0.001	-0.0455
Universities/colleges	0.0485***	0.0356***	0.0232	0.1062***
Public land (%)			0.0021	0.011*
Hrs. to National Park	-0.1532**	-0.0291	-0.1598**	-0.287***
Water cover (%)	0.0069**	0.0145***	0.0201***	0.0325***
Inflation Stage
Ln(population)	-0.0673***		-0.0655	-0.1072***
Population Density	-196.3***		-22.07	-13.27
Metro adjacent (UIC)	0.0084		0.0918	0.0545
Metro (UIC)	0.1956***		0.0602	0.0579
Sales tax	-0.0071		-0.0364*	-0.0074
Property tax	-0.0213		-0.1033*	-0.0768*
Per capita income	-0.0926		0.0809	-0.0005
Median home value	-37.68***		-52.68***	-2653***
Significance levels: *** <1%, ** <5%, * <10%

Appendix.docx

Reviewer #1 agreed at journal: 07 Mar, 2023
Reviewers agreed at journal: 29 Nov, 2022
Reviewers invited by journal: 29 Nov, 2022
Editor assigned by journal: 24 Sep, 2022
Submission checks completed at journal: 23 Sep, 2022
Editor invited by journal: 23 Sep, 2022
First submitted to journal: 23 Sep, 2022
