Data collection and study sample
Data originated from two population-based cross-sectional KORA-Age study arms: KORA-Age 1 conducted from 01.12.2008 to 06.11.2009 and KORA-Age 3 conducted from 01.02.2016 to 07.10.2016, which are follow-ups of four independent cross-sectional studies (S1 1984/5, S2 1989/90, S3 1994/5, and S4 1999- 2001). Both studies focused on the health of participants aged 65 and older in Augsburg and the two adjacent regions in Southern Germany based on questionnaires and telephone interviews. The KORA-Age study is described in detail elsewhere . In short, 4,565 out of 5,991 eligible individuals (response rate: 76.2%) participated in the survey, and 1,920 were between 65 and 71 year-old on 31.12.2008 (born between 1937-1943). The KORA-Age 3 survey consisted of 4,083 out of 6,051 eligible individuals (response rate: 67.5%), of whom 1,444 participants were in the same age range on 31.12.2015 (born between 1944-1950). The present analysis combines these two KORA-Age study arms, which were completely independent because of the different birth years (Figure 1).
Exposure: Birth phases
The exposure variable was defined based on the participants’ crucial developmental age during prenatal gestation or the first two years of life. Based on the World War II situation in the Augsburg area, individuals were divided into five independent birth periods [Figure 1]: phase 1 (pre-war, January 1937-August 1939), phase 2 (early war, September 1939-December 1943), phase 3 (late war, January 1944-April 1945), phase 4 (famine period after the war, May 1945- Jun 1948) and phase 5 (after famine period and reconstruction, July 1948-Dec 1950) .
The primary outcome was MM, defined as the presence of two or more concomitant chronic diseases in individuals . We considered fourteen major chronic diseases, including hypertension, eye disease, heart diseases, diabetes, joint disease, lung disease, gastrointestinal disease, stroke, cancer, kidney diseases, liver diseases, neurological diseases, depression, and anxiety. All disease variables were defined as life-time diagnoses except for cancer diagnosed within the last three years. Hypertension, diabetes, cancer, stroke, heart diseases (myocardial infarction and coronary artery disease) were assessed based on the questionnaire. All other diseases were identified in a telephone interview based on the Charlson Comorbidity Index . Participants were asked whether they suffer from kidney, liver, lung diseases (e.g., asthma, chronic bronchitis, and emphysema), inflammatory joint problems (e.g., arthritis or rheumatism), gastrointestinal diseases (e.g., colitis, cholecystic, gastric, or ulcer), heart diseases (e.g., congestive heart failure, coronary heart failure, or angina), eye problem (e.g., cataract, retinitis pigmentosa, glaucoma, macular degeneration, diabetic retinopathy). Neurological diseases were evaluated based on diseases like epilepsy, Parkinson's, or sclerosis using telephone interviews. The Geriatric Depression Scale  and Generalized Anxiety Disorder Scale-7  were used to diagnose depression and anxiety. Individuals with scores ≥ 10 were defined as suffering from depression or anxiety.
We considered age, sex, education level, alcohol consumption, physical activity, body mass index (BMI), smoking behavior, and cognitive status as covariates. Education levels were based on the duration of education and vocational training and categorized into three groups: low (9 years or less), middle (10 or 11 years), and high (12 years or more). Body mass index (BMI), defined by the World Health Organization (WHO), was used. Participants were categorized as underweight or normal weight (BMI < 24.99 kg/m2), overweight (25 ≤ BMI ≤ 29.99), obese class I (30 ≤ BMI < 34.99), and obese class II or III (35 ≤ BMI) .
Leisure time physical activity was measured from two separate questions about leisure time sports activity in winter and summer, including cycling. Possible answers were (1) > 2 hours, (2) 1–2 hours, (3) <1 hour and (4) none. Participants, who had a total score less than 5, obtained by summing the numbers (1)–(4) relating to activities in winter and summer, were classified to be “physically active” .
Alcohol consumption was based on self-reported alcohol intake with the following five groups: ‘‘almost every day’’, ‘‘several times a week’’, ‘‘about once a week’’, ‘‘less than once a week’’, and “never or seldom” . For our analysis, we categorized the first two groups as “daily use” and the last two as “never or rare use.” Based on self-reported information, there are three categories for smoking status: never smokers, former smokers, and active smokers. The cognitive status is identified as dementia, mildly impaired cognitive status, and normal status based on TICS-M (Telephone Interview for Cognitive Status) score, a standard instrument for assessing cognitive impairment .
The frequency and prevalence of baseline characteristics were stratified by sex and birth phases, and the Chi-squared test was computed to check the differences. Overall, the stratified prevalence of MM and single diseases was calculated and tested by the Chi-squared test. Then Post-hoc tests with Bonferroni adjustment were performed for multiple comparisons. Covariates multi-collinearity was assessed using the variance inflation factors (VIF). Associations of the birth phases and MM were estimated by odds ratios (OR) in logistic regression models with different adjustment steps for risk factors. The Modeling process started with the standardized age variable as covariate only (model 1), then birth period only (model 2), standardized age (rescaled with mean and standard deviation) and birth period together (model 3), then sex, education, alcohol use, physical activity, BMI, smoking behavior and cognitive status were added to the final model (models 4). We used a standardized age variable to prevent potential bias because of different age distributions in the different birth phases. The interaction effect of sex and birth phase variable also was checked in the final model. Agglomerative hierarchical clustering approach as an unsupervised machine learning technique was carried out to identify disease clusters so that diseases in one cluster are more similar than diseases in other clusters. This bottom-up algorithm begins with each disease as an individual cluster and merges the similar clusters until remaining only one cluster based on the proximity distance matrix. The average linkage method as proximity distance and Yule Q coefficient as similarity measurement for the binary disease variables were considered. The final cluster selection was created based on the threshold (cutoff height), corresponding to subject information, prior research, and clinical significance.
Since there is a big difference between the prevalence of hypertension and other chronic diseases, regression models and cluster analysis were performed without hypertension as a sensitivity analysis. Furthermore, the Ward and Single linkage methods, as other possible determinants for the pairwise distance between the set of observations were used to determine the robustness of agglomerative hierarchical clustering .