Categorizing rural and urban communities
Indexes must categorize both rural and urban areas to study cancer treatment and outcomes across the rural-urban continuum. UACE and CBSA indexes only categorize urban areas, and the FAR index only categorizes rural areas, making them unsuitable for research that spans the spectrum of rurality. RUCC, UIC, NCHS, IRR, RUCA, and RUCA(z) categorize areas across metropolitan, micropolitan, and rural designations.
Comparability of research based on different indexes
RUCC, UIC, and NCHS are county-level indexes based on OMB metropolitan and non-metropolitan definitions,30-32,38 making them identical as binary variables and research based on them as binary variables comparable in terms of rurality (Supporting Table 1). UIC and NCHS further follow OMB guidelines to divide non-metropolitan counties into micropolitan and rural counties, making them identical as ternary variables, too. These 3 indexes employ different methodologies to subdivide counties within metropolitan, micropolitan, and rural categories, making them unique at individual code-levels. They also emphasize different subsets of counties; RUCC identifies 3 metropolitan levels of counties, 4 urban levels, and 2 rural levels30; UIC prioritizes rural counties by designating 7 of 12 codes as rural31; and NCHS prioritizes metropolitan counties by designating 4 of 6 codes as metropolitan.32
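The collapsing of codes described above can be sketched in Python. This is a minimal illustration and not the procedure used to build Supporting Table 1. The UIC micropolitan and rural code sets are those listed in this article; the remaining groupings (UIC codes 1-2, NCHS codes 1-4, and RUCC codes 1-3 as metropolitan, NCHS code 5 as micropolitan and code 6 as rural) are assumptions that should be checked against each index's documentation. All function names are hypothetical.

```python
# Hypothetical helpers that collapse county-level codes into OMB-style groups.
# Code groupings are assumptions to verify against each index's documentation.

UIC_MICRO = {3, 5, 8}
UIC_RURAL = {4, 6, 7, 9, 10, 11, 12}


def uic_ternary(code: int) -> str:
    """Collapse a UIC code (1-12) into metropolitan/micropolitan/rural."""
    if code in (1, 2):  # assumed metropolitan codes
        return "metropolitan"
    if code in UIC_MICRO:
        return "micropolitan"
    if code in UIC_RURAL:
        return "rural"
    raise ValueError(f"unknown UIC code: {code}")


def nchs_ternary(code: int) -> str:
    """Collapse an NCHS code (1-6) into metropolitan/micropolitan/rural."""
    if 1 <= code <= 4:  # assumed metropolitan codes
        return "metropolitan"
    if code == 5:
        return "micropolitan"
    if code == 6:
        return "rural"
    raise ValueError(f"unknown NCHS code: {code}")


def rucc_binary(code: int) -> str:
    """Collapse a RUCC code (1-9) into metropolitan/non-metropolitan."""
    if 1 <= code <= 3:  # assumed metropolitan codes
        return "metropolitan"
    if 4 <= code <= 9:
        return "non-metropolitan"
    raise ValueError(f"unknown RUCC code: {code}")


def to_binary(ternary: str) -> str:
    """Merge micropolitan and rural into a single non-metropolitan group."""
    return "metropolitan" if ternary == "metropolitan" else "non-metropolitan"
```

Under these assumptions, a county coded UIC 9, NCHS 6, and RUCC 9 is non-metropolitan in all three binary variables, even though the individual codes are not interchangeable.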
RUCA and RUCA(z) also stem from OMB metropolitan and non-metropolitan categories.27,29,38 They are subdivided into 2, 3, or 4 categories across 10 primary codes and further divided into 21 secondary codes (2010 index). Some researchers create a binary variable based on the primary codes (option 1), whereas others group areas with a secondary code of x.1, indicating high commuting areas, with metropolitan areas to create a different binary variable (option 2). Therefore, research based on binary RUCA or RUCA(z) variables may not be directly comparable, as researchers may use different methodologies to create binary rural-urban designations. This problem is exacerbated when researchers do not disclose in their manuscripts which method they employed to create a binary RUCA variable.39,40
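The two binary coding options can be made concrete with a short Python sketch. It is illustrative only: it assumes RUCA codes are stored as strings such as "4.1", that primary codes 1-3 are the metropolitan codes, and that option 2 treats any code whose secondary digit is 1 as urban. The function names and the "urban"/"rural" labels are hypothetical, and published studies may apply different cut points.

```python
# Hypothetical sketch of the two binary RUCA codings described in the text.
METRO_PRIMARY = {1, 2, 3}  # assumption: primary codes treated as metropolitan


def split_ruca(code: str) -> tuple[int, int]:
    """Split a RUCA code such as '4.1' into (primary, secondary) integers."""
    primary, _, secondary = code.partition(".")
    return int(primary), int(secondary or 0)


def ruca_binary_option1(code: str) -> str:
    """Option 1: classify on the primary code only."""
    primary, _ = split_ruca(code)
    return "urban" if primary in METRO_PRIMARY else "rural"


def ruca_binary_option2(code: str) -> str:
    """Option 2: also group x.1 (high commuting) codes with metropolitan."""
    primary, secondary = split_ruca(code)
    if primary in METRO_PRIMARY or secondary == 1:
        return "urban"
    return "rural"
```

With these assumptions, code "4.1" is rural under option 1 but urban under option 2, which is the kind of discrepancy that makes undisclosed binary RUCA codings hard to compare.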
The high population percent agreement between RUCC and RUCA at the binary (94.9%) and ternary (88.8%) levels suggests that the choice of index may introduce less variability into results than expected. However, the percent agreement between RUCC and RUCA(z) decreased to 91.0% at the binary level and 74.9% at the ternary level when compared for the Registry Patients (Table 2). This may be because this specific patient population differs from national trends, or it may be further evidence that RUCA(z) is a poor approximation of RUCA, as demonstrated by RUCA and RUCA(z) disagreeing for 28.9% of Wisconsin land area (Figure 3). Repeating this analysis on a cohort that includes patient-specific census tract, ZIP code, and county is necessary to further explore this question. The differences in percent agreement between national and local populations highlight that national trends may not be replicated at a local health-system level.
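One plausible way to compute a population percent agreement is sketched below in Python. It is not necessarily the calculation behind Table 2. It assumes each record pairs the category assigned by one index with the category assigned by another for the same geographical unit, along with that unit's population. The function name and the example values are hypothetical.

```python
from typing import Iterable, Tuple


def population_percent_agreement(
    records: Iterable[Tuple[str, str, float]]
) -> float:
    """Return the percent of total population whose two categories match.

    Each record is (category_from_index_a, category_from_index_b, population).
    """
    total = 0.0
    agreeing = 0.0
    for category_a, category_b, population in records:
        total += population
        if category_a == category_b:
            agreeing += population
    if total == 0:
        raise ValueError("total population is zero")
    return 100.0 * agreeing / total


# Made-up example units, not study data.
example = [
    ("urban", "urban", 600),
    ("rural", "rural", 300),
    ("rural", "urban", 100),
]
agreement = population_percent_agreement(example)
```

The same function applies at the binary or ternary level. Only the category labels passed in change, which is why agreement can fall as the number of groups increases.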
Comparing indexes by geographical unit, land area, and population distributions
Indexes varied in whether and to what extent they employed each of their individual codes to categorize geographical units, land area, and population. U.S. counties and land area were distributed across RUCC codes, though few counties, and therefore minimal land area and population, are categorized as RUCC 5 (Supporting Figures 1A and 1B). This creates a natural binary division within RUCC that does not follow the index’s metropolitan/non-metropolitan or metropolitan/micropolitan/rural designations. Since UIC only designates 2 of 12 codes as metropolitan, counties clustered within the metropolitan group (Supporting Figure 1A). UIC cannot be interpreted across a continuum since micropolitan (codes 3, 5, and 8) and rural categories (codes 4, 6, 7, 9, 10, 11, and 12) are not designated with sequential codes.31 NCHS counties and land area clustered in its most rural code since it allocates only 1 category to rural counties (Supporting Figures 1A and 1C). IRR showed normal distributions across geographical units, land area, and population, which reflects its design as a relative measure of rurality (Figure 1A and Supporting Figures 1A and 1C).
Census tract and population distributions were clustered in RUCA’s most urban code, which is a product of census tracts being smaller and denser in more populated urban areas (Figure 1A, Figure 2E, and Supporting Figure 1A). The opposite trend was seen in the RUCA land area distribution, with most land area clustering in its most rural code (Figure 2E and Supporting Figure 1C). RUCA(z) ZCTAs separated into its more urban and most rural codes. Differences between the RUCA and RUCA(z) geographical unit distributions suggest that RUCA(z) may not adequately approximate RUCA (Figure 3 and Supporting Figure 1A).
National trends were magnified when viewed for Wisconsin, especially for population distributions (Figure 1B). The population distribution was spread more evenly across RUCC and UIC metropolitan codes for Wisconsin than for the U.S. The population distribution was nearly uniform across all NCHS codes, showed an urban cluster of counties in and around Milwaukee County separating itself from the rest of Wisconsin in IRR, and remained similar for RUCA in Wisconsin compared to the U.S. Within the Registry, patients were naturally divided into 2 patient populations by RUCC and UIC and into 4 patient populations by RUCA(z) (Figure 1C). These differences between national, state, and local population distributions highlight that the rural-urban composition of research participants may differ drastically based on a study’s geographical reach.
Maps of Wisconsin by RUCC, UIC, NCHS, IRR, RUCA, and RUCA(z) highlight similarities and differences across county-, ZCTA-, and census-tract-level indexes (Figure 2). IRR and NCHS tended to homogenize rurality status. IRR designated most counties as micropolitan and used fewer than 50% of its values to categorize Wisconsin counties (Figure 2D). Since IRR categorizes counties with a normal distribution, it draws a large distinction between the most urban and most rural counties and homogenizes counties that fall between those extremes. NCHS classified 32 of 72 counties into its 1 rural code, preventing researchers from distinguishing between groups of patients who live in different rural communities (Figure 2C). UIC showed divergence in rurality, but because its codes do not sequentially identify metropolitan, micropolitan, and rural counties, it is incorrect to interpret the UIC map along a continuum of rurality (Figure 2B). RUCC, RUCA, and RUCA(z) showed divergences in rurality across their respective code ranges, giving weight to their utility in measuring rurality across a continuum (Figures 2A, 2E, and 2F).
Index suitability as a continuous variable
As researchers move away from binary rural-urban designations and towards studying rurality across the rural-urban continuum, indexes need to be conducive to continuous or multi-level ordinal coding for inclusion in analysis. Binary rural-urban designations may mask outcome variation within rural or urban groups. Continuous or multi-level ordinal variables may expose non-linear trends in cancer outcomes across the rural-urban continuum.19,35 As indexes become more commonly employed as continuous variables, it becomes more important for researchers to use one index consistently across studies since index agreement decreases as the number of rurality groups used in analysis increases (Table 2).
RUCC, NCHS, IRR, RUCA (option 1), and RUCA(z) (option 1) are ordinal indexes that may be coded as continuous variables in analysis. UIC does not divide its non-metropolitan codes into micropolitan versus rural codes sequentially,31 preventing it from being used as a continuous variable. NCHS only designates 1 code for micropolitan counties and 1 code for rural/non-core counties, restricting researchers’ ability to distinguish between levels of rurality within subgroups of micropolitan or rural patients.32 IRR, as a relative index, designates counties along a normal distribution, effectively homogenizing rurality status such that it is difficult to distinguish between counties of different rurality levels on a regional or local scale. For example, IRR uses fewer than half of its values to categorize Wisconsin’s 72 counties (Figure 2D).
RUCA (option 2) and RUCA(z) (option 2), which include the x.1 secondary code as metropolitan, introduce ambiguity as to the most appropriate way to order codes as continuous variables. If RUCA and RUCA(z) are used as continuous variables, they should be based on primary RUCA codes only. RUCC includes multiple codes for metropolitan, micropolitan, and rural designations that are ordered sequentially, making it an unambiguous index conducive to use as a continuous variable in analysis.
Index feasibility to be used in cancer research
The National Cancer Database (NCDB), North American Association of Central Cancer Registries (NAACCR), and Surveillance, Epidemiology, and End Results Program (SEER) registries include RUCC and RUCA indexes. RUCC is included in its original 9-code form, and RUCA is recoded into a binary rural-urban variable. Registry inclusion makes RUCC and RUCA accessible to researchers, though recoding RUCA into a binary variable prevents researchers from studying the rural-urban continuum or variation within rural or urban subgroups. RUCA is reduced to a binary variable to protect confidentiality and prevent case identification via the combination of census tract and county level data. Therefore, RUCC is the most accessible and specific index available for registry-based cancer research.
At a health-system and local level, a patient’s county or ZIP code is more readily available than their census tract.36 Counties and ZIP codes are standard fields in electronic health records (EHR) and health system registries. This difference in availability means that generally researchers use county or ZIP-code based indexes (RUCC, UIC, NCHS, IRR, RUCA(z)) in EHR or local registry cancer research. However, since ZIP codes change frequently and RUCA(z) versions are only available for non-census years (1998, 2004, 2006, 2013), researchers risk excluding cases from their analysis if a patient’s ZIP code does not have a match in the chosen RUCA(z) file. With this and additional limitations to RUCA(z) outlined below, it is preferable to avoid ZIP code and ZCTA-based indexes.36,41,42 Therefore, county-based indexes, namely RUCC for the reasons listed above, are preferred for health-system and local level research.
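The risk of excluding cases with unmatched ZIP codes can be quantified before analysis. The Python sketch below is a hypothetical illustration: it assumes patient ZIP codes are stored as strings, that the chosen RUCA(z) file has been loaded into a dictionary keyed by 5-digit ZIP code, and that ZIP+4 values should be truncated to 5 digits. The function names and example values are made up and do not come from any registry or RUCA(z) release.

```python
from typing import Dict, List, Tuple


def normalize_zip(raw_zip: str) -> str:
    """Trim whitespace and keep the first 5 characters (drops a ZIP+4 suffix)."""
    return raw_zip.strip()[:5]


def link_rucaz(
    patient_zips: Dict[str, str], rucaz_codes: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Return (patient -> RUCA(z) code, list of patients with no match)."""
    matched: Dict[str, str] = {}
    unmatched: List[str] = []
    for patient_id, raw_zip in patient_zips.items():
        zip5 = normalize_zip(raw_zip)
        if zip5 in rucaz_codes:
            matched[patient_id] = rucaz_codes[zip5]
        else:
            unmatched.append(patient_id)
    return matched, unmatched


# Made-up identifiers, ZIP codes, and RUCA(z) codes for illustration only.
patients = {"p1": "00001", "p2": "00002-1234", "p3": "00003"}
lookup = {"00001": "1.0", "00002": "4.1"}
matched, unmatched = link_rucaz(patients, lookup)
```

Reporting the size of the unmatched list alongside results makes the extent of case exclusion visible to readers.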
Indexes over time
The choice of which index year to employ should be based on the role rurality is hypothesized to play in one’s study. Rurality as an exposure may be calculated from a past version of an index, whereas rurality as an enabler or barrier to care should be calculated from a current version, relative to the year(s) of study. When rurality is investigated as an exposure, patients may be misclassified as they move, which may obscure the rurality designation of interest.
Since IRR is relative to other counties, absolute changes in rurality over time are masked, making this index inappropriate for longitudinal studies.
Considerations for RUCA(z) and ZCTAs
RUCA(z) approximates RUCA and is not calculated directly from ZCTA-level characteristics. ZCTAs approximate ZIP codes, and it is possible for a patient’s ZIP code to differ from their ZCTA.21 ZIP codes are subject to change, as evidenced by the regular ZIP code updates released by the U.S. Postal Service,43 so a patient’s ZIP code at diagnosis may not match their ZIP code for the year of study, irrespective of whether they have moved. These approximations and ongoing administrative changes may introduce inaccuracies into RUCA(z) and expose multiple opportunities for patient misclassification.17,41 The difference between RUCA and RUCA(z) geographical unit distributions across the U.S. and Wisconsin highlights that RUCA(z) may not adequately approximate the census-tract-based RUCA (Figure 3 and Supporting Figures 1A and 1B). The RUCA(z) map shows irregular ZCTA boundaries, supporting the advice that researchers should be wary of using ZCTAs as a geographical unit (Figure 2F).41,42 The extent of misclassification could be further studied if land area and population data are made available for ZCTA 2013 or for the year of the next RUCA(z) release so that they can be compared to census-tract-level data. Furthermore, as opposed to RUCC, UIC, NCHS, and RUCA, RUCA(z) is not published by a government agency, which makes its ongoing availability less assured.
We evaluated rural-urban indexes on their ability to categorize cancer patients across the rural-urban continuum; their geographical unit, land area, and population distributions; and their percent agreement. However, we did not have access to a cancer patient data set with patient-specific ZIP codes, census tracts, and counties and were therefore unable to obtain the percent agreement at a data set level across indexes that use these 3 geographic units. County, ZCTA, and census tract land areas vary by state, and we did not evaluate land area distributions on a per-state level. This consideration is especially important for states with fewer and larger counties.