Data and study area
The data for PCa was from the Saskatchewan Cancer Registry (SCR) and consisted of demographic, clinical, and geographic information for 3,526 patients diagnosed with PCa between 2010 and 2014. Based on the demographic information, all PCa patients were age 35 years or over. The study area contained 82 geographic areas (GAs) in central and southern Saskatchewan categorized (for privacy reasons) by SCR using residence codes (Figures 1 and 2) (23). From 2010 to 2014, the study area contained 3,289 PCa patients, after excluding those living out-of-province at the time of diagnosis (194 patients) and those (43 patients) living in the three northern regions (Mamawetan Churchill River, Keewatin Yatthe, and Athabasca). The northern regions were excluded because these regions could not be subdivided due to privacy reasons. Of these 3,289 PCa patients, the analysis further excluded 298 patients because their PCa risk levels were unknown. Therefore, the final sample had 2,991 patients, each categorized per the GA in which the patient lived at the time of diagnosis.
To calculate SIRs (described in the following sections), population counts for 2012 from the Saskatchewan Covered Population (SCP) were used in the denominator in the formula to calculate PCa SIR (see Definitions section) (23). The SCP is a count of residents with provincial health insurance in Saskatchewan in a given year and is maintained by Government of Saskatchewan (23). Because all PCa patients in the SCR dataset were age 35 or over, the SCP data used was for men over the age of 35. The overall SCP for men age 35 or over deviated less than 3% each year (24), therefore we chose to use statistics from the midpoint year 2012 (between 2010 and 2014) for the denominators in the calculations (25).
To calculate physician density (described in the following sections) for the period 2010 to 2014, the required data from the Canadian Medical Association was only available for 2011 (26). Hence our estimated physician densities are based on the year 2011. Canadian Medical Association data consists of family physicians and general practitioners licensed to practice medicine in Saskatchewan.
The University of Saskatchewan BioMedical Research Ethics Board provided ethics approval (Bio-REB certificate #15-34).
The risk levels (low, intermediate, high) for PCa were based on the Genitourinary Radiation Oncologists of Canada (GUROC) definitions (27) and a fourth risk level (“metastatic”) was added to include patients diagnosed with metastatic cancer. For each risk level, the expected number of PCa cases in the ith GA (Ei) was calculated as follows (28):
where ni and Oi respectively denote the population count of men age 35 or over and the observed number of PCa cases in the ith GA. For each PCa risk level, the standardized incidence ratio (SIR) (which will be referred to as the crude estimated SIR) was estimated by dividing the number of observed cases in by the number of expected cases in each GA (28).
The study variables of interest were physician density, remoteness index, and closest PCa assessment centre. For each GA, the family physician density was calculated using the 2011 Canadian Medical Association data and the same population denominator used for the expected count of PCa cases. A remoteness index for a GA was calculated using the average of the Statistics Canada remote indices for regions forming the GA . For each GA, the closest PCa assessment centre was categorized as Regina or Saskatoon, based on the shortest Euclidean distance between the centroid of the GA and the centroids of Saskatoon and Regina. Further details regarding remoteness index and closest PCa assessment centre variables used in this study can be found in the literature (3, 4).
Clustering analysis was conducted to identify spatial clusters of PCa SIR by each risk level. Second, for each PCa risk level, a null model was built where the crude estimated SIRs were smoothed using the method proposed by Besag, York and Mollie (BYM model) (29). The estimated SIRs from the BYM models will be referred to as the smoothed estimated SIRs. Third, ecological analyses were conducted to assess associations between the independent variables and the smoothed estimated SIRs for each of four PCa risk levels.
For each risk level, Global Moran’s I was calculated using the crude estimated SIR value for each GA (30). The statistical significance for Global Moran’s I statistic was calculated using 999 permutations (30), which, if significant, demonstrates that the GAs sharing common boundaries have similar SIRs instead of having random geographically-distributed SIRs (31). GAs within a 120-km radius of a GA were identified as neighbours of the GA. The corresponding weight matrix for the analysis was then computed using the inverse of the Euclidean distances between the centroids of a GA and its neighbours. This weight matrix was chosen to reflect the suspected correlation structure of the data (32).
For each risk level with statistically significant Global Moran’s I values, the crude estimated SIRs were studied further using the Local Moran’s I and Kuldorff’s Spatial Scan statistics (33-35).
The SIRs were estimated using a Bayesian model-based approach to ensure, if spatial correlation exists, the estimated SIRs (i.e the smoothed estimated SIRs) were corrected for any spatial dependence between the GAs.
First, for each PCa risk level, a null model was built where the smoothed estimated SIRs were computed using the Bayesian BYM method (29). Due to the count nature of the data, we assume our observed data Oi follows a Poisson distribution (36) with mean Eiqi where Ei and qi respectively denote the expected number of PCa cases and the “true” SIR in the ith GA (37, 38). The BYM method models the log of the SIR as follows:
Log(qi) = c + ui + vi
where intercept c is the mean, and the terms ui and vi respectively denote the spatially structured and unstructured random effects (37, 38).
The parameters used in this model are based on literature (37-41). The random effects and the intercept are assigned prior distributions. The intercept was assigned a uniform prior that extends over the whole real line (37, 38). The structured random effect ui was assumed to follow a conditional auto-regressive distribution and the unstructured random effect vi was assumed to follow a normal distribution with mean zero (37, 38). The variability for both random effects were controlled by a precision parameter. The precision parameter for the random effects were assigned a Gamma distribution with hyper-prior specification of (0.5, 0.0005) (39, 41).
The simulation for each model consisted of three chains (42, 43). Each chain consisted of 200,000,000 iterations to obtain 50,000 data points: one for each 4000 time steps taken. A burn-in period of 8,000,000 iterations was selected based on the characteristics of the Brooks-Gelman-Rubin plots (38, 42, 43). To determine whether the generated estimates for each parameter were from the correct distribution, the following diagnostic tests were performed: potential scale reduction factor, (42) stationarity and half-width tests, (44) Z-score for equality of the means, (45) and run length control (46, 47)).
Using the BYM models, unconditional analyses were conducted to identify any associations between the independent variables and the SIRs for each risk level. The statistical significance of an independent variable was determined via its 95% credible interval (CrI).
Global and Local Moran’s I statistics were computed using Geoda 1.12 (48). Kuldorff’s Spatial Scan Statistic was computed using SatScanTM v9.4 (49). SIR maps were built using quantum Geographical Analysis System (QGIS.org) Version 3.12 (50). BYM models were built in OpenBUGS version 3.2.3 (51). Convergence diagnostics for the BYM models were conducted in R using the package ‘coda’ (52).