Same People, Different Results: Categorizing Cancer Registry Cases across the Rural-Urban Continuum


 Background: Many rural-urban indexes are utilized in United States cancer research. This variation introduces inconsistencies between studies with a rural-urban component. Recommendations to date on which index to utilize have prioritized index geographical unit over feasibility of index inclusion in analysis. We evaluated rural-urban indexes and recommend one index for use to increase comparability across studies.
Methods: We assessed nine U.S. rural-urban indexes regarding their respective rural and urban code ranges; geographical unit, land area, and population distributions; percent agreement; suitability as continuous variables in analysis; and feasibility of integration into national, state, and local cancer research. We referenced 1,569 Wisconsin Pancreatic Cancer Registry patients to demonstrate how rural-urban index choice impacts patient categorization.
Results: Six indexes categorized rural and urban areas. Indexes agreed on binary rural-urban designation for 88.8% of the U.S. population. As ternary variables, they agreed for 83.4%. For cancer registry patients, this decreased to 73.4% and 60.4% agreement, respectively. Rural-Urban Continuum Codes (RUCC) performed the best with ability to differentiate metropolitan, micropolitan, and rural counties, are available for retrospective and prospective studies, and can be coded continuously for analysis.
Conclusions: Whether a patient was categorized as urban or rural changed depending on which index was used when applied to a cancer registry data set. We conclude that RUCC is an appropriate and feasible rural-urban index to include in cancer research, as it is standardly available in national cancer registries in its 9-code format and can be matched to patient’s county of residence for local research and it had the least amount of fluctuation of the indices analyzed. Utilizing RUCC as a continuous variable across studies with a rural-urban component will increase reproducibility and comparability of results and eliminate the choice of rural-urban index as a potential source of discrepancy between studies. Trial registration: Not applicable

Background
Research on cancer disparities increasingly incorporates community factors to understand variation across patient treatment and outcomes. Rurality, one such community factor, predicts later stages of cancer diagnosis, 1-3 lower rates of specific therapies, 4-6 less effective therapies, 7 shorter overall survival, 6,8-11 and higher mortality rates. 12-15 These trends persist across geographical regions of the United States (U.S.) and cancer types. 6,7,15 However, there is also important variation in treatment and outcomes between the groups of patients categorized as rural and urban. 16 These variations result from differences in communities, 17 patient demographics, 18 and health care organizations. 19 Methodological differences in identifying rural communities and patients also produce variations and inconsistencies in measuring disparities in rural cancer patient treatment and outcomes. 16,20 More than 9 rural-urban indexes are used across cancer research to categorize patients.
Indexes are based on differing geographical levels, including census tract, ZIP Code Tabulation Area (ZCTA), 21 and county. Indexes differ in terms of the unique combinations of criteria on which they are based, incorporating factors such as population size, percentage of commuting population, and adjacency to urban areas. Additionally, there is confusion about terms. For example, urban and metropolitan are often used interchangeably, but these are based on approximate boundaries of 2013 ZIP Code Tabulation Areas (ZCTA). 21 Since these boundaries fluctuate over time, we were unable to obtain the 2013 ZCTA population or land area data on which RUCA(z) was based. Therefore, we excluded this index from parts of our analysis.
A total of 1,569 patients from the University of Wisconsin-Health Pancreatic Cancer Registry (Registry patients) diagnosed with pancreatic ductal adenocarcinoma between 2004 and 2016 served as a reference population to demonstrate how rural-urban index choice may impact patient categorization. We evaluated differences in rurality of the Registry patient cohort via percent agreement across county- and ZCTA-based binary and ternary indexes. We compared the change in each index's median, interquartile range, and mean over time for the patient cohort.

Comparing rural-urban indexes
In comparing indexes, we evaluated breadth (the extent to which rural and urban communities are differentiated from one another) and depth (the extent to which distinctions are made within rural- or urban-designated communities). Supporting Table 1 shows the 9 indexes by geographical unit, classification of urban and rural codes, and the amount and percentage of land area, geographical units, and population each index classifies as urban and rural (2010 versions) for the U.S., Midwestern States, and Wisconsin.
We excluded indexes that designate only urban or only rural communities: UACE and CBSA, which designate only urban communities, and FAR, which designates only rural communities (Supporting Table 1). We included the remaining 6 rural-urban indexes, RUCC, UIC, NCHS, IRR, RUCA, and RUCA(z), in the full analysis. We transformed these to binary indexes based on each index's binary categorization of metropolitan and non-metropolitan areas and to ternary indexes based on each index's ternary categorization of metropolitan, micropolitan/urban, and noncore/small town/rural. Because IRR is a continuous variable that does not subcategorize counties, we established the division between metropolitan and non-metropolitan counties at IRR = 0.50 and further subdivided non-metropolitan counties into micropolitan/urban and rural counties at IRR = 0.60. 34
We calculated Cohen's kappa, with an ordinal weight, to evaluate the level of agreement across indexes in their binary and ternary forms by geographical units, land area, and population. We also compared the percent agreement of geographical units, land area, and population across county- and census-tract-based binary and ternary indexes (Table 1 and Supporting Table 2). We compared the distribution of geographical units, land area, and population across indexes via median and interquartile range and mean and standard deviation. We examined these trends visually via violin plots, with indexes standardized to illustrate transitions along a rural-urban interface, for the U.S., Wisconsin, and Registry patients.
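The IRR cut points described above can be sketched in a few lines of Python. This is an illustrative sketch, not the code used in the study: the function names are hypothetical, IRR is assumed to be scaled from 0 (most urban) to 1 (most rural), and treating each cut point as the lower bound of the more rural category is an assumption, since the text does not state on which side the boundary values fall.

```python
def irr_to_ternary(irr, metro_cut=0.50, rural_cut=0.60):
    """Collapse a continuous IRR score into a ternary rurality label.

    Assumption: values equal to a cut point fall into the more rural group.
    """
    if not 0.0 <= irr <= 1.0:
        raise ValueError(f"IRR must lie in [0, 1], got {irr}")
    if irr < metro_cut:
        return "metropolitan"
    if irr < rural_cut:
        return "micropolitan/urban"
    return "rural"


def irr_to_binary(irr, metro_cut=0.50):
    """Collapse a continuous IRR score into a binary metropolitan flag."""
    if not 0.0 <= irr <= 1.0:
        raise ValueError(f"IRR must lie in [0, 1], got {irr}")
    return "metropolitan" if irr < metro_cut else "non-metropolitan"


# Hypothetical scores, for illustration only.
for score in (0.42, 0.55, 0.71):
    print(score, irr_to_binary(score), irr_to_ternary(score))
```

Keeping the cut points as parameters makes it straightforward to check how sensitive the binary and ternary groupings are to the chosen thresholds.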

Inconsistency and agreement across binary rural-urban designations
Supporting Table 1 displays the geographical unit, binary rural-urban delineation, and rural-urban categorization of Land Area, Geographical Units, and Total Population for each of the 9 indexes. There are 2 methods to designate communities as rural or urban in RUCA and RUCA(z); both methods are shown.
In addition to a large difference in distribution across rural and urban communities, there is also wide variation across indexes in the share of communities designated rural: the percentage of rural communities (by geographical unit) in the U.S. ranged from 17.5% of ZCTAs (RUCA (option 2)) to 63.0% of counties (IRR) (Supporting Table 1).
By comparison, the difference in percentage of rural communities is even larger across the 12 Midwestern States (23.2-71.4%) and Wisconsin as a single state (13.2-63.9%). The percent of land area across rural communities ranged from 52% (FAR) to 97% (UACE). Recall that FAR and UACE indexes categorize either rural or urban areas, but not both. The variation in land area was smaller across indexes categorizing both rural and urban areas. The US population living in rural areas follows a similar pattern, with indexes categorizing rural or urban areas allocating 3.9% (FAR) to 19.3% (UA) of the US population to rural codes compared to indexes categorizing rural and urban areas allocating 11.5% (IRR) to 16.5% (RUCA (option 1)) of US population to rural codes. These trends for land area and total population were similar across the Midwest and Wisconsin.
Binary rural or urban designations agreed across RUCA, RUCC, UIC, NCHS, and IRR indexes for 88.8% of the US population (Table 1 and Supporting Table 2). RUCC and RUCA, the 2 most employed rural-urban indexes in cancer research, agreed on 94.9% of the US population. There was 73.4% agreement among Registry patients in classifying patients across binary RUCC, UIC, NCHS, IRR, and RUCA(z) indexes. This increased to 91.0% agreement among Registry patients when comparing RUCC and RUCA(z) only. We included RUCA(z) in this analysis as patient ZIP codes were known. The difference between the percent agreement at the national level compared to the local registry cohort was notable.
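Percent agreement of this kind can be computed as the weighted share of units on which every index assigns the same designation. The sketch below is a minimal illustration under stated assumptions: the function name and toy data are hypothetical, and each unit is assumed to have already been joined to one label per index. Weighting by population gives a population-level figure, while giving each patient a weight of 1 gives a cohort-level figure.

```python
def percent_agreement(units):
    """Weighted share (in percent) of units on which all indexes agree.

    Each unit is a (weight, labels) pair, where labels holds one
    rural-urban designation per index for that unit.
    """
    total = sum(weight for weight, _ in units)
    if total <= 0:
        raise ValueError("total weight must be positive")
    agreeing = sum(
        weight for weight, labels in units if len(set(labels)) == 1
    )
    return 100.0 * agreeing / total


# Hypothetical counties: (population, [label from each of 3 indexes]).
toy_counties = [
    (600_000, ["urban", "urban", "urban"]),
    (300_000, ["rural", "rural", "rural"]),
    (100_000, ["urban", "rural", "urban"]),
]

# Population-weighted agreement across the toy counties.
print(percent_agreement(toy_counties))

# Unweighted agreement: one hypothetical patient per county.
toy_patients = [(1, labels) for _, labels in toy_counties]
print(percent_agreement(toy_patients))
```

In the toy data the same designations produce different agreement figures under the two weightings, which shows why a national population figure and a cohort figure need not coincide.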
Cohen's Kappa ranged from 0.60 when comparing IRR with RUCC, UIC, and NCHS to 0.81 when comparing RUCA with RUCC, UIC, and NCHS, indicating moderate to very good agreement between indexes. We excluded RUCA(z) from this analysis as ZCTAs cannot be matched one to one with census tracts or counties.
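For readers unfamiliar with the statistic, a weighted Cohen's kappa for two ordinal codings can be sketched as follows. This is an illustration rather than the analysis code used here: the function name is hypothetical, linear disagreement weights are assumed (the text specifies only "an ordinal weight"), categories are assumed to be coded 0 to k-1 from most urban to most rural, and the optional per-unit weights (for example, population or land area) are likewise an assumption.

```python
def weighted_kappa(codes_a, codes_b, n_categories, unit_weights=None):
    """Linearly weighted Cohen's kappa for two ordinal codings.

    codes_a and codes_b hold integer categories 0..n_categories-1 for the
    same units; unit_weights optionally weights each unit.
    """
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("codings must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("need at least two categories")
    k = n_categories
    if unit_weights is None:
        unit_weights = [1.0] * len(codes_a)
    total = float(sum(unit_weights))

    # Observed joint proportions of (category in A, category in B).
    observed = [[0.0] * k for _ in range(k)]
    for a, b, w in zip(codes_a, codes_b, unit_weights):
        observed[a][b] += w / total

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Linear disagreement weight: 0 on the diagonal, 1 at the extremes.
    d_obs = d_exp = 0.0
    for i in range(k):
        for j in range(k):
            penalty = abs(i - j) / (k - 1)
            d_obs += penalty * observed[i][j]
            d_exp += penalty * row[i] * col[j]
    if d_exp == 0.0:
        raise ValueError("kappa is undefined when only one category is used")
    return 1.0 - d_obs / d_exp


# Hypothetical ternary codes (0 = metropolitan, 1 = micropolitan, 2 = rural).
index_a = [0, 1, 2, 2]
index_b = [0, 1, 2, 1]
print(round(weighted_kappa(index_a, index_b, 3), 3))
```

With linear weights, a metropolitan-versus-rural disagreement is penalized twice as heavily as a disagreement between adjacent categories; quadratic weights would be an alternative choice.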

Agreement decreased across ternary metropolitan, micropolitan, and rural designations
RUCA, RUCC, UIC, NCHS, and IRR indexes agreed on ternary metropolitan, micropolitan, and rural designations for 83.4% of the US population (Table 1 and Supporting Table 2). These indexes simultaneously designate 6.0% of land area and 1.8% of US total population as rural, micropolitan, and metropolitan depending on the index used. Adding further confusion, while some indexes designate 5.3% of land area and 1.5% of total population as rural, other indexes designate these same areas and people as metropolitan. Again, agreement was higher when comparing RUCC and RUCA only; these 2 indexes agreed for 88.8% of the US population. Within the Registry patients, there was 60.4% agreement across these indexes when designating ternary metropolitan, micropolitan, or rural communities. This increased to 74.9% agreement when limited to RUCC and RUCA(z). RUCA(z) was included in the Registry patient analysis since patient ZIP codes were known. Cohen's Kappa ranged across indexes from 0.53 for IRR compared to UIC and NCHS to 0.77 for RUCC and RUCA compared to UIC and NCHS, indicating moderate to good agreement as ternary indexes.

Differences in discrete or continuous index geographical units, land area, and population distributions
The mean IRR value remained constant between its 2 available versions because IRR is a relative measure and therefore cannot capture absolute changes in rurality, a capability that is necessary for longitudinal studies.

Discussion
Categorizing rural and urban communities
Indexes must categorize both rural and urban areas to study cancer treatment and outcomes across the rural-urban continuum. UACE and CBSA indexes only categorize urban areas, and the FAR index only categorizes rural areas, making them unsuitable for research including a spectrum of rurality. RUCC, UIC, NCHS, IRR, RUCA, and RUCA(z) categorize areas across metropolitan, micropolitan, and rural areas.

Comparability of research based on different indexes
RUCC, UIC, and NCHS are county-level indexes based on OMB metropolitan and non-metropolitan definitions, 30-32,38 making them identical as binary variables and research based on them as binary variables comparable in terms of rurality (Supporting Table 1). UIC and NCHS further follow OMB guidelines to divide non-metropolitan counties into micropolitan and rural counties, making them identical as ternary variables, too. These 3 indexes employ different methodologies to subdivide counties within metropolitan, micropolitan, and rural categories, making them unique at individual code levels. They also emphasize different subsets of counties: RUCC identifies 3 metropolitan levels of counties, 4 urban levels, and 2 rural levels; 30 UIC prioritizes rural counties by designating 7 of 12 codes as rural; 31 and NCHS prioritizes metropolitan counties by designating 4 of 6 codes as metropolitan. 32 RUCA and RUCA(z) also stem from OMB metropolitan and non-metropolitan categories. 27,29,38 They are subdivided into 2, 3, or 4 categories across 10 primary codes and further divided into 21 secondary codes (2010 index). Some researchers create a binary variable based on the primary codes (option 1), and other researchers group counties with a secondary code of x.1, indicating high commuting areas, with metropolitan counties to create a different binary variable (option 2). Therefore, research based on binary RUCA or RUCA(z) variables may not be directly comparable, as researchers may use different methodologies to create binary rural-urban designations. This problem is exacerbated when researchers do not disclose in manuscripts which method they employed to create a binary RUCA variable. 39,40 The high population percent agreement between RUCC and RUCA at binary (94.9%) and ternary (88.8%) levels suggests that the choice between these indexes may introduce less variability into results than expected.
However, the percent agreement between RUCC and RUCA(z) decreased to 91.0% at a binary and 74.9% at a ternary level when compared for the Registry patients (Table 2). This may be due to this specific patient population differing from national trends or be further evidence of RUCA(z) being a poor approximation of RUCA, as demonstrated by RUCA and RUCA(z) disagreeing for 28.9% of Wisconsin land area (Figure 3). Repeating this analysis on a cohort that includes patient-specific census tract, ZIP code, and county is necessary to further explore this question. The differences in percent agreement between national and local populations highlight that national trends may not be replicated at a local health-system level.

Comparing indexes by geographical unit, land area, and population distributions
Indexes varied in whether and to what extent they employed each of their individual codes to categorize geographical units, land area, and population. U.S. counties and land area were distributed across RUCC, though few counties, and therefore minimal land area and population, are categorized as RUCC 5 (Supporting Figures 1A and 1B). This creates a natural binary division within RUCC that does not follow the index's metropolitan/non-metropolitan or metropolitan/micropolitan/rural designations. Since UIC only designates 2 of 12 codes as urban, counties clustered within the urban group (Supporting Figure 1A). UIC cannot be interpreted across a continuum since micropolitan (codes 3, 5, and 8) and rural categories (codes 4, 6, 7, 9, 10, 11, 12) are not designated with sequential codes. 31 NCHS counties and land area clustered by its most rural code since it allocates only 1 category to rural counties (Supporting Figures 1A and 1C). IRR showed normal distributions across geographical units, land area, and population, which reflects its being a relative measure of rurality (Figure 1A and Supporting Figures 1A and 1C).
Census tract and population distributions were clustered in RUCA's most urban code, which is a product of census tracts being smaller and denser in more populated urban areas (Figure 1A, Figure 2E, and Supporting Figure 1A). The opposite trend was seen in the RUCA land area distribution, with most land area clustering in its most rural code (Figure 2E and Supporting Figure 1C). RUCA(z) ZCTAs separated into its more urban and most rural codes. Differences between the RUCA and RUCA(z) geographical unit distribution suggest that RUCA(z) may not adequately approximate RUCA (Figure 3 and Supporting Figure 1A). National trends were magnified when viewed for Wisconsin, especially for population distributions (Figure 1B). The population distribution was spread more evenly across RUCC and UIC metropolitan codes for Wisconsin than for the U.S. The population distribution was nearly uniform across all NCHS codes; in IRR, it showed an urban cluster, comprising counties in and around Milwaukee County, separating itself from the rest of Wisconsin; and for RUCA, it remained similar in Wisconsin compared to the U.S. Within the Registry, patients were naturally divided into 2 patient populations by RUCC and UIC and into 4 patient populations by RUCA(z) (Figure 1C). These differences between national, state, and local population distributions highlight that the rural-urban composition of research participants may differ drastically based on a study's geographical reach.
Maps of Wisconsin by RUCC, UIC, NCHS, IRR, RUCA, and RUCA(z) highlight similarities and differences across county-, ZCTA-, and census-tract-level indexes (Figure 2). IRR and NCHS tended to homogenize rurality status. IRR designated most counties as micropolitan and used fewer than 50% of its values to categorize Wisconsin counties (Figure 2D). Since IRR categorizes counties with a normal distribution, it draws a large distinction between the most urban and most rural counties and homogenizes counties that fall between those extremes. NCHS classified 32 of 72 counties into its 1 rural code, preventing researchers from distinguishing between groups of patients who live in different rural communities (Figure 2C). UIC showed divergence in rurality, though recall that its codes do not sequentially identify metropolitan, micropolitan, and rural counties, making it incorrect to interpret the UIC map along a continuum of rurality (Figure 2B). RUCC, RUCA, and RUCA(z) showed divergences in rurality across their respective code ranges, giving weight to their utility in measuring rurality across a continuum (Figures 2A, 2E, and 2F).

Index suitability as a continuous variable
As researchers move away from binary rural-urban designations and towards studying rurality across the rural-urban continuum, indexes need to be conducive to continuous or multi-level ordinal coding for inclusion in analysis. Binary rural-urban designations may mask outcome variation within rural or urban groups. Continuous or multi-level ordinal variables may expose non-linear trends in cancer outcomes across the rural-urban continuum. 19,35 As indexes become more commonly employed as continuous variables, it becomes more important for researchers to use one index consistently across studies, since index agreement decreases as the number of rurality groups used in analysis increases (Table 2). RUCC, NCHS, IRR, RUCA (option 1), and RUCA(z) (option 1) are ordinal indexes that may be coded as continuous variables in analysis. UIC does not divide its non-metropolitan codes into micropolitan versus rural codes sequentially, 31 preventing it from being used as a continuous variable. NCHS only designates 1 code for micropolitan counties and 1 code for rural/non-core counties, restricting researchers' ability to distinguish between levels of rurality within subgroups of micropolitan or rural patients. 32 IRR, as a relative index, designates counties along a normal distribution, effectively homogenizing rurality status such that it is difficult to distinguish between counties of different rurality levels on a regional or local scale. For example, IRR uses fewer than half of its values to categorize Wisconsin's 72 counties (Figure 2D). RUCA (option 2) and RUCA(z) (option 2), which include the x.1 secondary code as metropolitan, introduce ambiguity as to the most appropriate way to order codes as continuous variables. If RUCA and RUCA(z) are used as continuous variables, the coding should be based on primary RUCA codes only.
RUCC includes multiple codes for metropolitan, micropolitan, and rural designations that are ordered sequentially, making it an unambiguous index conducive to use as a continuous variable in analysis.
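Because RUCC codes are ordered sequentially, one code can serve as a continuous, binary, or ternary variable. The sketch below illustrates this under stated assumptions: the function name and group labels are hypothetical, and the grouping follows the structure described in this paper (3 metropolitan codes, then 4 urban codes, then 2 rural codes in the 9-code format), taking codes 1-3, 4-7, and 8-9 respectively.

```python
def recode_rucc(code):
    """Return continuous, binary, and ternary codings of one RUCC code.

    Assumed grouping: 1-3 metropolitan, 4-7 micropolitan/urban, 8-9 rural.
    """
    if code not in range(1, 10):
        raise ValueError(f"RUCC code must be an integer from 1 to 9, got {code!r}")
    if code <= 3:
        ternary = "metropolitan"
    elif code <= 7:
        ternary = "micropolitan/urban"
    else:
        ternary = "rural"
    binary = "metropolitan" if code <= 3 else "non-metropolitan"
    return {"continuous": code, "binary": binary, "ternary": ternary}


# Hypothetical patient-level RUCC codes, for illustration only.
for rucc in (2, 6, 9):
    print(recode_rucc(rucc))
```

Retaining the original 9-level code alongside the collapsed versions lets an analysis report results at several levels of granularity without re-deriving the index.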

Index feasibility to be used in cancer research
The National Cancer Database (NCDB), North American Association of Central Cancer Registries (NAACCR), and Surveillance, Epidemiology, and End Results Program (SEER) registries include RUCC and RUCA indexes.
RUCC is included in its original 9-code form, and RUCA is recoded into a binary rural-urban variable. Registry inclusion makes RUCC and RUCA accessible to researchers, though recoding RUCA into a binary variable prevents researchers from studying the rural-urban continuum or variation within rural or urban subgroups.
RUCA is reduced to a binary variable to protect confidentiality and prevent case identification via the combination of census tract and county level data. Therefore, RUCC is the most accessible and specific index available for registry-based cancer research.
At a health-system and local level, a patient's county or ZIP code is more readily available than their census tract. 36 Counties and ZIP codes are standard fields in electronic health records (EHR) and health system registries. This difference in availability means that researchers generally use county- or ZIP-code-based indexes (RUCC, UIC, NCHS, IRR, RUCA(z)) in EHR or local registry cancer research. However, since ZIP codes change frequently and RUCA(z) versions are only available for non-census years (1998, 2004, 2006, 2013), researchers risk excluding cases from their analysis if a patient's ZIP code does not have a match in the chosen RUCA(z) file. With this and additional limitations to RUCA(z) outlined below, it is preferable to avoid ZIP code and ZCTA-based indexes. 36,41,42 Therefore, county-based indexes, namely RUCC for the reasons listed above, are preferred for health-system and local level research.
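The risk of losing cases to unmatched ZIP codes can be quantified before analysis. The following sketch is illustrative only: the function name, the lookup structure, and all ZIP codes and RUCA(z) values are hypothetical placeholders rather than real data or an actual RUCA(z) file format.

```python
def split_by_zip_match(patients, rucaz_lookup):
    """Separate patients whose ZIP code has a RUCA(z) entry from those without.

    patients is a list of (patient_id, zip_code) pairs; rucaz_lookup maps
    ZIP code strings to RUCA(z) codes for one chosen RUCA(z) version.
    """
    matched, unmatched = [], []
    for patient_id, zip_code in patients:
        code = rucaz_lookup.get(zip_code)
        if code is None:
            unmatched.append(patient_id)
        else:
            matched.append((patient_id, code))
    return matched, unmatched


# Placeholder ZIP codes and codes, for illustration only.
toy_lookup = {"00001": 1.0, "00002": 7.0}
toy_patients = [("p1", "00001"), ("p2", "00002"), ("p3", "00003")]

matched, unmatched = split_by_zip_match(toy_patients, toy_lookup)
print(f"{len(unmatched)} of {len(toy_patients)} patients lack a RUCA(z) match")
```

Reporting the number of unmatched cases makes any exclusion explicit instead of letting it occur silently during a join.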

Indexes over time
The choice of which index year to employ should be based on the role rurality is hypothesized to play in one's study. Rurality as an exposure may be calculated on a past version of an index, whereas rurality as an enabler or barrier to care should be calculated from a current version, relative to the year(s) of study. When rurality is investigated as an exposure, patients may be misclassified as they move. This may obscure the rurality designation of interest.
Since IRR is relative to other counties, absolute changes in rurality over time are masked, making this index inappropriate for longitudinal studies.
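One way to make the choice of index year explicit is to select the version programmatically. The sketch below assumes a simple rule, namely the most recent version released on or before the year of interest; that rule and the function name are assumptions for illustration, not a recommendation from this analysis. The default version years are the RUCC release years named in this paper.

```python
from bisect import bisect_right

# RUCC release years named in this paper.
RUCC_VERSIONS = (1993, 2003, 2013)


def latest_version_on_or_before(year, versions=RUCC_VERSIONS):
    """Pick the most recent index version released on or before a given year.

    Assumed rule for illustration; a study may instead justify the nearest
    or an earlier version depending on the hypothesized role of rurality.
    """
    ordered = sorted(versions)
    position = bisect_right(ordered, year)
    if position == 0:
        raise ValueError(f"no index version available on or before {year}")
    return ordered[position - 1]


# Diagnosis years spanning the registry period described in this paper.
for diagnosis_year in (2004, 2012, 2016):
    print(diagnosis_year, latest_version_on_or_before(diagnosis_year))
```

A study treating rurality as a past exposure could pass an earlier reference year, such as the year of an assumed exposure window, instead of the diagnosis year.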

Considerations for RUCA(z) and ZCTAs
RUCA(z) approximates RUCA and is not calculated directly from ZCTA-level characteristics. ZCTAs approximate ZIP codes, and it is possible for a patient's ZIP code to differ from their ZCTA. 21 ZIP codes are subject to change, as evidenced by the regular ZIP code updates released by the U.S. Postal Service, 43 so a patient's ZIP code at diagnosis may not match their ZIP code for the year of study, irrespective of whether they have moved. These approximations and ongoing administrative changes may introduce inaccuracies into RUCA(z) and expose multiple opportunities for patient misclassification. 17,41 The difference between RUCA and RUCA(z) geographical unit distributions across the U.S. and Wisconsin highlights that RUCA(z) may not adequately approximate the census-tract-based RUCA (Figure 3 and Supporting Figures 1A and 1B).
The RUCA(z) map shows irregular ZCTA boundaries, supporting the advice that researchers should be wary of using ZCTAs as a geographical unit (Figure 2F). 41,42 The extent of misclassification could be further studied if land area and population data are made available for ZCTA 2013 or for the year of the next RUCA(z) release that can be compared to census-tract-level data. Furthermore, as opposed to RUCC, UIC, NCHS, and RUCA, RUCA(z) is not published by a government agency, which makes its ongoing availability less assured.

Limitations
We evaluated rural-urban indexes for their ability to categorize cancer patients across the rural-urban continuum, geographical unit, land area, and population distributions, and percent agreement. We did not have access to a cancer patient data set with patient-speci c ZIP codes, census tracts, and counties, though, and were unable to obtain the percent agreement at a data set-level across indexes that utilize these 3 geographic units. County, ZCTA, and Census Tract land area varies by state, and we did not evaluate land area distributions on a per-state level. This consideration is especially important for states with fewer and larger counties.

Conclusions
Utilizing the Rural-Urban Continuum Code (RUCC) index across cancer research that includes a rural-urban component will increase reproducibility and comparability of results and eliminate index choice as a source of discrepancy across studies. Counties are a stable geographic unit of analysis and are readily available at the patient level within local, regional, and national research settings and within electronic health record and registry data sources. RUCC includes a spectrum of codes across metropolitan, micropolitan, and rural communities. If necessary, it can be grouped into a binary or ternary variable. RUCC indexes for 1993, 2003, and 2013 are available in several national registries at a discrete level, enabling researchers to study rural-urban residence across a continuum rather than as a binary factor impacting patients' treatments and outcomes. ZCTA-based indexes should be avoided as ZCTAs approximate actual ZIP code boundaries, change frequently, and represent administrative rather than geographical areas. 42 Government agencies should work towards a census-block-level measure of rurality that is accessible to researchers without compromising patient confidentiality. The census block provides the most specific unit of geographical analysis and therefore minimizes the risk of masking disparities within larger geographical units. Finally, just as a patient is more than their age or ethnicity, a patient is more than their rural residence. Researchers should continue to include social, economic, and health-related variables alongside rurality in cancer prevention and outcomes research to understand the many factors impacting disparities in cancer treatment and outcomes and how these factors interact differently across geographical and care settings.

Declarations
Ethics Approval: The pancreas cancer registry data use was exempt as human subjects research by the University of Wisconsin Health Sciences IRB, ID # 2019-0155, expiration 4/26/2024. Availability of data and materials: The datasets generated and/or analyzed during the current study are not publicly available due to HIPAA restrictions with personal health information for the registry patients, but are available from the corresponding author on reasonable request. The other datasets analyzed are publicly available and are referenced as such in the manuscript.