Death notice data
Raw data consisted of the 49,628 deaths that occurred in the state of Geneva between 1908 and 2017, with information on the deceased including name, date of birth, civil status, nationality (Swiss/non-Swiss), date of death, and address. Data were collected monthly by web scraping the mortuary announcements, which were publicly available through the official gazette (FAO) until 2017 (Republic and Canton of Geneva, https://fao.ge.ch). We retained for analysis only the deaths that occurred from 2003 onward (49,077; 98.9%), the first year for which our counts are consistent with the number of deaths reported by Statistique Genève (www.ge.ch/statistique). Inaccurate records (e.g., duplicate observations, unknown death dates, unidentified addresses, or locations in other cantons or countries) and individuals with missing nationality were removed from the analysis (16,759; 33.7% and 39; 0.07%, respectively), as were 1,681 addresses (3.39%) that could not be georeferenced.
Because the database contained no gender information, we used the genderize.io API for name-to-gender inference, as it shows satisfactory accuracy compared with other web services [28]. This approach is commonly used in gender inequality research, for example to investigate women’s representation in academic literature [29,30]. The API returns the gender most commonly associated with a given first name, along with confidence parameters. This process recovered the gender of 93.6% of the individuals in our dataset.
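As an illustration of this step, the sketch below queries the public genderize.io endpoint and decides whether to keep the returned gender. It assumes the JSON reply carries `gender`, `probability`, and `count` fields; the function names, the 0.8 probability cut-off, and the sample reply are hypothetical and are not taken from the study.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.genderize.io"  # public endpoint; rate limits apply


def query_genderize(first_name):
    """Fetch the raw JSON reply for one first name (requires network access)."""
    url = API_URL + "?" + urllib.parse.urlencode({"name": first_name})
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def infer_gender(reply, min_probability=0.8):
    """Return 'male' or 'female', or None when the name is unknown or ambiguous.

    min_probability is a hypothetical cut-off chosen for illustration only.
    """
    gender = reply.get("gender")
    probability = reply.get("probability") or 0.0
    if gender is None or probability < min_probability:
        return None
    return gender


# Offline example: a reply shaped like the API output, with illustrative values.
sample_reply = {"name": "marie", "gender": "female", "probability": 0.98, "count": 1000}
print(infer_gender(sample_reply))
```

Names for which the function returns None would remain without an inferred gender, which is one way a recovery rate below 100% can arise.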
Finally, 30,592 observations (61.6% of the individuals in the original dataset) were retained for further analysis.
Mortality Indicators
Life Expectancy at Birth (LEB) was defined as the average length of life of a group of individuals born in a given year, taking into account the evolution of mortality conditions throughout their lifetime [31]. The statistic was extracted from the longitudinal life tables of the Swiss Federal Statistical Office (FSO, https://www.bfs.admin.ch/) for the 1900-2013 period [32] and was assigned to each deceased individual based on year of birth and gender.
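Life Expectancy Difference (LED) was defined as the difference in years between the age at death and the individual LEB (age at death minus LEB), so that positive values indicate individuals who lived longer than expected.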
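A short worked example of this definition follows. The life-table values are placeholders rather than FSO figures, and the lookup keyed by birth year and gender is only an assumed data layout.

```python
def life_expectancy_difference(age_at_death, leb):
    """LED in years: positive when the person outlived the cohort LEB."""
    return age_at_death - leb


# Hypothetical cohort life table keyed by (birth year, gender); illustrative values only.
COHORT_LEB = {(1930, "female"): 70.0, (1930, "male"): 64.0}

led = life_expectancy_difference(82.5, COHORT_LEB[(1930, "female")])
print(led)  # 12.5
```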
Neighborhood income level
To assess the influence of socioeconomic status on LED, we used the median annual neighborhood household income at the statistical subsector level (n=475), computed every year from 2005 to 2016 by Statistique Genève (https://www.ge.ch/statistique/). The calculation of the annual neighborhood income excluded unmarried individuals (i.e., single, divorced, widowed), as their taxable income may not reflect their quality of life. We assigned a neighborhood income value to each individual based on the registered residential address at the time of death. Since no tax data were available for 2003, 2004, and 2017, we assigned the median income of the nearest available year (2005 for deaths in 2003 and 2004; 2016 for deaths in 2017).
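The nearest-year rule can be sketched as follows. The function name is hypothetical; the 2005 to 2016 range comes from the text.

```python
AVAILABLE_YEARS = range(2005, 2017)  # 2005..2016 inclusive, as stated in the text


def nearest_income_year(death_year, available=AVAILABLE_YEARS):
    """Return the year with income data closest to the year of death."""
    return min(available, key=lambda year: abs(year - death_year))


for death_year in (2003, 2004, 2010, 2017):
    print(death_year, nearest_income_year(death_year))
```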
Statistical Analysis
We investigated the spatial structure of LED across the state of Geneva using the Local Moran statistic [33]. The statistic relies on a measure of spatial dependence (or spatial autocorrelation), i.e., how similar observations tend to be within a specific neighborhood (spatial lag), and identifies local clusters of low and high LED values.
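For reference, a common formulation of the Local Moran statistic is given below. The notation is ours, and the exact normalization used by a given software package may differ.

```latex
I_i = z_i \sum_{j \neq i} w_{ij}\, z_j ,
\qquad
z_i = \frac{x_i - \bar{x}}{s_x}
```

Here $x_i$ is the LED of individual $i$, $\bar{x}$ and $s_x$ are the sample mean and standard deviation, and $w_{ij}$ are spatial weights. With row-standardized weights ($\sum_j w_{ij} = 1$), the sum is the spatial lag, i.e., the mean standardized LED among the neighbors of $i$. A positive $I_i$ indicates that an individual and its neighbors lie on the same side of the mean; a negative $I_i$ indicates a spatial outlier.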
We analyzed the LED variable within a 1,200-meter buffer (spatial lag) around each individual’s residential address. This methodological choice was supported by similar epidemiological studies conducted in the state of Geneva [34,35].
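A minimal sketch of such a distance-band neighborhood follows. It assumes that coordinates are expressed in a projected system with meter units, that a point at exactly 1,200 m counts as a neighbor, and that all neighbors receive equal weight; these choices and the function names are illustrative.

```python
import math


def neighbors_within(points, radius=1200.0):
    """List, for each point, the indices of the other points within `radius` meters.

    `points` holds (x, y) pairs in a projected coordinate system whose unit is
    the meter (an assumption of this sketch). Brute force, O(n^2).
    """
    neighbors = []
    for i, (xi, yi) in enumerate(points):
        neighbors.append([
            j for j, (xj, yj) in enumerate(points)
            if j != i and math.hypot(xi - xj, yi - yj) <= radius
        ])
    return neighbors


def spatial_lag(values, neighbor_ids):
    """Mean value among the neighbors; None when a point has no neighbor."""
    if not neighbor_ids:
        return None
    return sum(values[j] for j in neighbor_ids) / len(neighbor_ids)


# Illustrative coordinates (meters) and LED values (years).
points = [(0.0, 0.0), (1000.0, 0.0), (3000.0, 0.0)]
led = [4.0, -2.0, 6.0]
neighbors = neighbors_within(points)
lags = [spatial_lag(led, ids) for ids in neighbors]
print(neighbors, lags)
```

The brute-force search is quadratic in the number of addresses; a spatial index would be preferable for tens of thousands of points.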
For each residential address, the correlation between the observed variable and the mean of this variable in a given neighborhood (spatial lag) was calculated. The standardized scatterplot of this relationship allows four distinct types of spatial association to be identified: (1) High-High clusters (dark green dots in the maps) represent individuals with high LED values (i.e., who lived longer than expected) surrounded by individuals with high LED values; (2) Low-Low clusters (dark purple dots in the maps) represent individuals with low LED values (i.e., who lived shorter than expected) surrounded by individuals with low LED values; (3) Low-High spatial outliers (light purple dots in the maps) represent individuals with low LED values surrounded by individuals with high LED values; and (4) High-Low spatial outliers (light green dots in the maps) represent individuals with high LED values surrounded by individuals with low LED values.
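The four quadrants of the standardized scatterplot can be expressed as a simple classification. The sketch follows the usual convention in which the first term of each label describes the individual and the second describes its neighbors; the function name is hypothetical.

```python
def classify_quadrant(z_value, z_lag):
    """Map a standardized LED and its standardized spatial lag to a quadrant label."""
    if z_value > 0 and z_lag > 0:
        return "High-High"
    if z_value < 0 and z_lag < 0:
        return "Low-Low"
    if z_value < 0 and z_lag > 0:
        return "Low-High"
    if z_value > 0 and z_lag < 0:
        return "High-Low"
    return "Undefined"  # the point lies exactly on one of the axes


print(classify_quadrant(1.2, 0.8))
print(classify_quadrant(-0.5, 0.7))
```

A quadrant label alone does not imply significance; it is only reported for locations that pass the permutation test.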
To assess whether the null hypothesis of no spatial association could be rejected, we performed a significance test using 99,999 Monte Carlo permutations in which, at each step, the value yi at a specific location i is held fixed and the values at all other locations are randomly permuted [33]. Pseudo p-values were then calculated as the probability of obtaining, under permutation, a local Moran’s I larger than the observed one [36]. To account for simultaneous multiple comparisons [37], we applied a Bonferroni correction for an overall alpha level of 0.1, resulting in an individual significance level of 1e-5 (0.1 divided by the 10,000 observations analyzed in each random subset described below). Non-significant locations (i.e., those with a pseudo p-value > 1e-5) are shown in white on the maps.
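The conditional permutation test and the Bonferroni threshold can be sketched as below. This is a simplified one-sided version with equal neighbor weights and the (R + 1) / (M + 1) convention for pseudo p-values, which may differ from the implementation used in the study. The scores are toy values treated as already standardized, and all function names are hypothetical.

```python
import random
import statistics


def local_moran(z, neighbor_ids, i):
    """Local Moran's I at i: z_i times the mean z-score of its neighbors."""
    return z[i] * statistics.fmean([z[j] for j in neighbor_ids])


def pseudo_p_value(z, neighbor_ids, i, n_perm=999, seed=0):
    """One-sided pseudo p-value from conditional permutations.

    z_i stays fixed; at each permutation the neighbor values are redrawn
    without replacement from all other observations.
    """
    rng = random.Random(seed)
    observed = local_moran(z, neighbor_ids, i)
    others = [value for j, value in enumerate(z) if j != i]
    k = len(neighbor_ids)
    as_large = 0
    for _ in range(n_perm):
        permuted_lag = statistics.fmean(rng.sample(others, k))
        if z[i] * permuted_lag >= observed:
            as_large += 1
    return (as_large + 1) / (n_perm + 1)


def bonferroni_threshold(overall_alpha, n_tests):
    """Individual significance level for n_tests simultaneous comparisons."""
    return overall_alpha / n_tests


# Toy scores: location 0 is high and so are its two neighbors (1 and 2).
z = [2.0, 2.0, 2.0] + [-1.0] * 7
p = pseudo_p_value(z, neighbor_ids=[1, 2], i=0)
print(p, bonferroni_threshold(0.1, 10_000))
```

Under this convention, the smallest attainable pseudo p-value with 99,999 permutations is 1/100,000 = 1e-5, which matches the Bonferroni-corrected level for 10,000 comparisons.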
To control for nationality and neighborhood income, we performed the same analysis on the adjusted LED values obtained with a median regression. Gender was not included as an independent variable because this factor was already accounted for in the definition of Life Expectancy at Birth.
Methodological and computational issues may arise when conducting spatial statistics on such large datasets. Therefore, we replicated the analysis (for both the raw and adjusted LED models) on ten random subsets, each containing 10,000 observations drawn from the 30,592 deceased. With this method, we could perform enough permutations to apply a Bonferroni correction while ensuring the robustness of the discovered spatial structure. Descriptions of the samples, characteristics of the spatial weights, and regression results for the ten subsets can be found in Supplementary Tables S1, S2, and S3. Since the spatial structure of LED was similar across subsets, only the maps of subset 5 are shown in the paper for descriptive purposes; the results for the other subsets can be found in Supplementary Figures S1 and S2.
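One way to draw such subsets is sketched below. It assumes that each subset is sampled independently and without replacement from the full set of observations, so that subsets may overlap; the seed and function name are arbitrary.

```python
import random


def draw_subsets(n_total=30_592, subset_size=10_000, n_subsets=10, seed=42):
    """Return n_subsets lists of row indices, each sampled without replacement."""
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    population = range(n_total)
    return [sorted(rng.sample(population, subset_size)) for _ in range(n_subsets)]


subsets = draw_subsets()
print(len(subsets), len(subsets[0]))
```

Because ten subsets of 10,000 observations exceed the 30,592 available records, some individuals necessarily appear in more than one subset.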
For both the raw and adjusted LED models, we summarized the results of the ten random subsets by calculating the range, mean, and standard deviation of population characteristics within each cluster type (i.e., Not significant, High-High, Low-Low, High-Low, Low-High). These population characteristics include the number of individuals within each cluster type, gender, nationality, median neighborhood income, LED value, and individual years of potential life lost (YPLL). The YPLL was calculated using a 75-year cut-off [38,39]. We used Tukey’s HSD test to compare all possible pairs of means between the Local Moran cluster types and to identify significant differences in population characteristics.
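The individual YPLL with a 75-year cut-off can be computed as in the sketch below, which follows the usual definition in which deaths at or after the cut-off age contribute zero years. The function name is hypothetical.

```python
def ypll(age_at_death, cutoff=75.0):
    """Years of potential life lost before the cut-off age (never negative)."""
    return max(0.0, cutoff - age_at_death)


print(ypll(60.0))  # 15.0
print(ypll(80.0))  # 0.0
```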
Spatial analyses were carried out in R using the rgeoda package [40].