Centenarian health data were retrieved from the 1895, 1905 and 1910 Danish Birth Cohort Studies. These are national population-based surveys with no exclusion criteria. All individuals born in Denmark in 1895, 1905 and 1910 were contacted to be interviewed and physically and cognitively tested during the year in which they would turn 100 years old. The 1895 cohort comprised 207 of the 276 individuals invited to participate (75%) and was examined by a geriatrician and a nurse. The assessments of the 1905 and 1910 cohorts were conducted by a specialized survey agency and comprised 256 of 439 (59%) and 273 of 428 (63%) invited participants, respectively. If someone was unable to participate because of their health status, a proxy respondent was invited to participate in the interview. The questionnaire used to assess the health characteristics of centenarians in the 1895 cohort differs from the one used for the other two cohorts. Detailed information about the surveys is available in Rasmussen et al. (34).
We use four indicators to capture different health dimensions: physical ability, activities of daily living, cognitive status, and self-rated health. The selection of the indicators was based on previous studies showing that these characteristics are related to the survival of nonagenarians (27). The Chair Stand test was used to assess physical ability: individuals who can stand up from a chair without using their arms are in better physical health than those who need to use their hands or those who cannot stand up at all (37). Functional status was assessed by five questions regarding the ability to perform activities of daily living: bathing, dressing, toileting, walking and feeding. Individuals were classified as not disabled, moderately disabled or disabled according to the Katz disability score calculated from their answers (38). The cognitive status of centenarians was evaluated using the Mini-Mental State Examination (MMSE). The MMSE score ranges from 0 to 30, and a higher score indicates better cognitive status. We divided the score into three categories: 24-30 indicates no cognitive impairment, 18-23 mild cognitive impairment and 0-17 severe cognitive impairment (39). Self-rated health was assessed with the question: “How do you consider your health in general?”. The answers were grouped into three categories: “excellent or good”, “acceptable” and “poor or very poor” (40).
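The MMSE grouping described above can be sketched as a small function. The function name and the category labels are illustrative assumptions; only the cut-points (24-30, 18-23, 0-17) come from the text.

```python
def mmse_category(score: int) -> str:
    """Map an MMSE score (0-30) to the three categories used in the text.

    The label strings are hypothetical; the thresholds follow the text.
    """
    if not 0 <= score <= 30:
        raise ValueError(f"MMSE score must be between 0 and 30, got {score}")
    if score >= 24:
        return "no cognitive impairment"
    if score >= 18:
        return "mild cognitive impairment"
    return "severe cognitive impairment"


if __name__ == "__main__":
    # Print the category at each boundary of the three ranges.
    for s in (30, 24, 23, 18, 17, 0):
        print(s, mmse_category(s))
```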
These four indicators of health had missing values. To handle them without introducing bias into our results, we performed data imputation by taking advantage of other information in the survey that was not included in the analysis. We created a “non tested” category for the Chair Stand test, the MMSE and self-rated health. For the Chair Stand test, individuals with missing values who could not perform the physical performance ADL Strength test were assigned to the “non tested” category. For the MMSE and self-rated health, individuals with missing values whose answers were provided by a proxy respondent were categorized as “non tested”, because these tests cannot be performed by proxy respondents. For the Katz disability score we did not create a “non tested” category, since this score had very few missing values (2 individuals in each cohort). The creation of the “non tested” category allowed us to considerably reduce the number of missing values for participants who were unable to respond due to ill health (41). However, some missing values remained in the dataset. We therefore removed individuals who had missing values in at least one of the variables in the analysis.
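The recoding rules and the subsequent complete-case restriction can be illustrated with a minimal sketch. The record layout and all field names (`chair_stand`, `adl_strength_possible`, `proxy`, and so on) are hypothetical and are not the variable names of the survey; missing values are represented by `None`.

```python
NON_TESTED = "non tested"

# Hypothetical names for the four health indicators used in the analysis.
ANALYSIS_VARS = ("chair_stand", "katz", "mmse", "self_rated_health")


def recode_non_tested(record: dict) -> dict:
    """Apply the "non tested" rules to one participant record."""
    out = dict(record)
    # Chair Stand: missing AND unable to perform the ADL Strength test.
    if out.get("chair_stand") is None and out.get("adl_strength_possible") is False:
        out["chair_stand"] = NON_TESTED
    # MMSE and self-rated health: missing AND interview answered by a proxy.
    for key in ("mmse", "self_rated_health"):
        if out.get(key) is None and out.get("proxy") is True:
            out[key] = NON_TESTED
    # No "non tested" category is created for the Katz score.
    return out


def complete_cases(records: list) -> list:
    """Recode, then keep only records with no remaining missing indicator."""
    recoded = [recode_non_tested(r) for r in records]
    return [r for r in recoded if all(r.get(v) is not None for v in ANALYSIS_VARS)]
```

A record that is still missing any indicator after recoding (for example a missing Katz score) is dropped, which mirrors the complete-case restriction described in the text.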
The date of death of each centenarian in Denmark (participants and non-participants) was retrieved from the Danish Civil Registration System. Some survey participants died before turning age 100 (e.g. ages 99.7, 99.5, etc.). We excluded these individuals from the main analysis to avoid immortal time bias in the calculation of survival probabilities (42).
After removing individuals with missing values in at least one of the variables in the analysis and those who did not survive to age 100, we analyse 170 individuals in the 1895 cohort, 195 individuals in the 1905 cohort and 223 in the 1910 cohort. Tables A4 and A5 of the Supplemental Material show the characteristics of the individuals included in the analysis. To test whether our data are representative of the entire population, we use the log-rank test to compare the survival trajectories of participants included in the analysis against those of individuals who did not participate in the survey. For the 1905 and 1910 cohorts, the survival trajectories of both groups (participants included in the analysis and non-participants) are similar, which indicates that the data used in our analysis are representative of the national population of Danish centenarians for those cohorts. For the 1895 cohort, the survival trajectories of individuals included in the analysis are statistically different from those of non-participants. This indicates possible health selection in the 1895 cohort. In spite of this, we still analyse the data of the 1895 cohort to determine whether their health characteristics differ from those of the 1905 and 1910 cohorts.
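For illustration, a two-sample log-rank test can be sketched in plain Python as below. This is a textbook version under stated assumptions, not the code used for the analysis: the function name and argument layout are hypothetical, `times_*` are ages at death (or censoring) and `events_*` flag observed deaths. The p-value uses the chi-square distribution with one degree of freedom.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test; returns (chi2, p_value) with 1 df."""
    data = [(t, bool(e), 0) for t, e in zip(times_a, events_a)]
    data += [(t, bool(e), 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    obs_minus_exp = 0.0  # observed minus expected deaths in group A
    variance = 0.0       # hypergeometric variance summed over event times
    for t in event_times:
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk in A
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk in B
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0.0:
        raise ValueError("log-rank statistic undefined: zero variance")
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

A large p-value is compatible with similar survival in the two groups, while a small one points to a difference such as the one reported for the 1895 cohort.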
Finally, we also conducted a sensitivity analysis to test the effect of removing participants who did not reach age 100 in each cohort (1895, 1905 and 1910). All sensitivity analyses and robustness checks are included in the Supplemental Material.
We perform a Latent Class Analysis (LCA) to shed light on the unobserved heterogeneity in health among Danish centenarians. LCA is a statistical method used to identify unobserved classes of individuals via observed categorical variables (43, 49-53). By considering several individual characteristics, the LCA estimates each individual's probability of belonging to each latent class and the probability of observing a given characteristic within each class. More details about the LCA can be found in the Supplemental Material. Individuals in the same class share similar characteristics and differ from individuals in other classes. Our aim is to identify health classes and then contrast the survivorship of the individuals belonging to each of them. We consider different dimensions of health in the LCA: physical health (Chair Stand test), functional status (Katz disability score), cognitive impairment (MMSE) and emotional wellbeing (self-rated health). It is known that there are sex differences in health and survival among centenarians (44). For this reason, we included sex as a covariate that helps to place individuals into classes (35). We could not stratify the analysis by sex because of the small number of males in the study population.
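For reference, a standard latent class model with a covariate can be written as follows. This is the textbook formulation, given here as an assumption about the general form of the model; the notation is introduced for illustration only, and the exact specification is the one described in the Supplemental Material. Here $y_{ij}$ is the category of indicator $j$ (of $J = 4$) for individual $i$, $x_i$ is sex, $\pi_c(x_i)$ is the class membership probability, and $\rho_{jk \mid c}$ is the probability of category $k$ of indicator $j$ within class $c$.

```latex
% Mixture likelihood under conditional independence of the indicators within a class
P(Y_i = y_i \mid x_i)
  = \sum_{c=1}^{C} \pi_c(x_i)
    \prod_{j=1}^{J} \prod_{k=1}^{K_j} \rho_{jk \mid c}^{\,\mathbb{1}(y_{ij} = k)}

% Class membership depends on the covariate through a multinomial logit
% (with \beta_{01} = \beta_{11} = 0 for identification)
\pi_c(x_i)
  = \frac{\exp(\beta_{0c} + \beta_{1c} x_i)}
         {\sum_{c'=1}^{C} \exp(\beta_{0c'} + \beta_{1c'} x_i)}

% Posterior probability that individual i belongs to class c
P(c \mid y_i, x_i)
  = \frac{\pi_c(x_i) \prod_{j} \prod_{k} \rho_{jk \mid c}^{\,\mathbb{1}(y_{ij} = k)}}
         {\sum_{c'=1}^{C} \pi_{c'}(x_i) \prod_{j} \prod_{k} \rho_{jk \mid c'}^{\,\mathbb{1}(y_{ij} = k)}}
```

Under this formulation, a common way to assign each individual to a single class is to take the class with the highest posterior probability.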
To test the robustness of our results, we performed three different sensitivity analyses. First, we included smoking in the LCA in addition to the four health indicators mentioned above. While it has been shown that smoking is not related to survival at the highest ages (26), we performed this additional analysis to determine how the inclusion of an unrelated health indicator affects our results. Second, given that most centenarians are women, we performed an extra analysis considering only females in the LCA. Finally, we performed an LCA that also included the individuals who died before age 100.
We performed an LCA for each cohort. Individuals in the 1895 cohort are not directly comparable to those in the 1905 and 1910 cohorts, because a different questionnaire was used and because their survival trajectories differ from those of non-participants (see details in the Data section). We therefore present the analysis of the 1895 cohort in the Supplemental Material and focus here on the 1905 and 1910 cohorts. For each cohort, several LCAs were performed, varying the number of classes from two to six. We considered six health classes to be the maximum possible in each cohort: more than six classes would imply high heterogeneity in health patterns but would also produce small and meaningless classes. The optimal number of classes was selected using the Akaike and Bayesian Information Criteria (AIC and BIC, respectively), while also considering the health patterns and the size of each class. Once the optimal number of classes in each cohort was obtained, each centenarian was assigned to a single health class. Then, based on their ages at death, we computed survival curves and the associated 95% confidence intervals by health class and by cohort using the Kaplan-Meier estimator. We assessed whether survival differs among the classes using the log-rank test.
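The two computational steps in this paragraph, comparing models by information criteria and estimating survival by class, can be sketched as follows. Both functions are illustrative assumptions rather than the code used in the analysis. The log-likelihood and parameter count are taken as inputs from whatever LCA software is used, and the confidence interval is the plain Greenwood interval, which may differ from the default of a given statistical package.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """Return (AIC, BIC) for a fitted model; lower values are preferred."""
    aic = -2.0 * log_likelihood + 2.0 * n_params
    bic = -2.0 * log_likelihood + n_params * math.log(n_obs)
    return aic, bic


def kaplan_meier(times, events, z=1.96):
    """Kaplan-Meier curve as a list of (time, S(t), lower, upper).

    `times` are ages at death or censoring, `events` flag observed deaths.
    The interval is S(t) +/- z * SE with Greenwood's variance, clipped to [0, 1].
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    survival = 1.0
    greenwood = 0.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        removed = 0
        # Group all records that share the same time.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += 1 if pairs[i][1] else 0
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            if at_risk > deaths:
                greenwood += deaths / (at_risk * (at_risk - deaths))
            se = survival * math.sqrt(greenwood)
            curve.append((t, survival,
                          max(0.0, survival - z * se),
                          min(1.0, survival + z * se)))
        at_risk -= removed
    return curve
```

In practice one would call `information_criteria` once per candidate number of classes (two to six) and `kaplan_meier` once per health class and cohort.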
Finally, we estimated the area under the curve (AUC) to test the ability of health classes to predict the chance of surviving to the frontier of survival. The AUC ranges from 0 to 1; a higher AUC implies a better prediction (36). We define the frontier of survival (2,45) as the 95th percentile of the centenarian age-at-death distribution. Note that such ages change across cohorts according to mortality improvements. In Table 1 we show such ages and values for the AUC calculated for different percentiles.
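As a minimal sketch of this step, the frontier age and the AUC can be computed as below. The nearest-rank percentile definition and the rank-based (Mann-Whitney) form of the AUC are assumptions chosen for simplicity. The function names are hypothetical, and the score given to each individual (for example a class-specific predicted probability of reaching the frontier) is left to the caller.

```python
import math


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile, e.g. pct=95 for the frontier of survival."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    Equals the probability that a randomly chosen positive case has a higher
    score than a randomly chosen negative case, counting ties as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def reached_frontier(ages_at_death, pct=95):
    """Binary outcome: did the individual survive to the frontier age?"""
    frontier = percentile_nearest_rank(ages_at_death, pct)
    return frontier, [age >= frontier for age in ages_at_death]
```

An AUC of 0.5 corresponds to a prediction no better than chance, which gives a natural benchmark when reading the values reported for the different percentiles.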
Statistical imputation techniques such as mean substitution or multiple imputation were avoided because these procedures might bias the results of the Latent Class Analysis and make comparisons among cohorts more uncertain. Therefore, we performed the analysis considering only individuals with complete data.