Study design and setting
We conducted both cross-sectional (for clustering at baseline) and cohort analyses (for the association between clinical subtypes and all-cause mortality and deterioration of care-need levels) using linked medical insurance claims data and survey data for care-need certification in a city in Japan (“main analysis”). In addition, we examined whether our clustering approach was replicable in another city in Japan (“validation analysis”).
Ethical approval
This study was conducted in accordance with the principles of the Declaration of Helsinki. It was approved by the Ethics Committee, Institute of Medicine, University of Tsukuba (approval number: 1445-14), which waived the requirement for informed consent because the data were anonymized.
Data source
For the main analysis, medical claims data (April 2014 to March 2019), survey data for care-need certification (February 2012 to June 2019), and insurance registration data (October 2014 to March 2021) of enrollees of the Late-stage Elderly Health System or National Health Insurance were obtained from the municipal government of Tsukuba City, Ibaraki Prefecture, Japan. Tsukuba City is a large region with a population of 240,383 people, including 46,613 (19.4%) people aged ≥65 years [18].
For the validation analysis, medical claims data (May 2012 to October 2016) and survey data for care-need certification (May 2012 to October 2016) of enrollees of the Late-stage Elderly Health System or National Health Insurance were obtained from the municipal government in Sammu City, Chiba Prefecture, Japan. The city has a population of 48,444, including 17,329 (35.8%) people aged ≥65 years [19]. We conducted only the cross-sectional analysis in Sammu City because the follow-up data were not sufficiently available.
The details of the Japanese medical and long-term care insurance claims data are described elsewhere [20-22]. Briefly, the medical insurance claims include medical service fees, dates of receiving medical services, diagnoses coded according to ICD-10, tests conducted, and drugs prescribed. Under the long-term care insurance system, people are certified through a standardized process involving an assessment of physical and cognitive functions [23] and are categorized into seven grades: support levels 1 and 2 and care levels 1 to 5. These grades correlate well with the Barthel Index (r = -0.70) [24]. Certified individuals are periodically reassessed, at intervals that ranged from 3 to 24 months during our study period [25].
We linked the medical insurance claims data and the survey data for care-need certification (and, for Tsukuba City, the insurance registration data) using the pseudo-ID provided by the municipality.
Study population
In Tsukuba City, we identified people receiving the survey for long-term care needs certification for the first time between October 2014 (ensuring a 6-month look-back period after April 2014 for defining comorbidities from medical claims) and March 2019 (see Supplementary Fig. S3 online). In this study, we included those with care levels ranging from 1 to 5, as individuals with support levels often do not require actual long-term care and instead receive preventive care to forestall the need for long-term care. We excluded individuals aged <65 years and those who died before starting the actual long-term care services.
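The eligibility rules above can be sketched as a filter over certification records. This is a minimal illustration under stated assumptions, not the study's code: the record fields (`person_id`, `cert_date`, `level`, `age`, `died_before_service`) and the level labels are hypothetical, and the sketch assumes that a person whose first certification was at a support level is not included.

```python
from datetime import date

WINDOW_START = date(2014, 10, 1)  # leaves a 6-month look-back after April 2014
WINDOW_END = date(2019, 3, 31)
CARE_LEVELS = {"care1", "care2", "care3", "care4", "care5"}  # hypothetical labels

def select_cohort(records):
    """Return the first certification record of each eligible person."""
    first = {}
    for rec in records:
        pid = rec["person_id"]
        if pid not in first or rec["cert_date"] < first[pid]["cert_date"]:
            first[pid] = rec
    cohort = []
    for rec in first.values():
        if not (WINDOW_START <= rec["cert_date"] <= WINDOW_END):
            continue  # first certification falls outside the enrollment window
        if rec["level"] not in CARE_LEVELS:
            continue  # support levels 1 and 2 are not included
        if rec["age"] < 65 or rec["died_before_service"]:
            continue  # exclusion criteria
        cohort.append(rec)
    return sorted(cohort, key=lambda r: r["person_id"])

# Toy records (invented for illustration only).
records = [
    {"person_id": "A", "cert_date": date(2015, 5, 1), "level": "care2", "age": 80, "died_before_service": False},
    {"person_id": "A", "cert_date": date(2016, 5, 1), "level": "care3", "age": 81, "died_before_service": False},
    {"person_id": "B", "cert_date": date(2013, 1, 10), "level": "care1", "age": 77, "died_before_service": False},
    {"person_id": "C", "cert_date": date(2016, 2, 1), "level": "support1", "age": 70, "died_before_service": False},
    {"person_id": "D", "cert_date": date(2017, 7, 1), "level": "care1", "age": 63, "died_before_service": False},
]
eligible_ids = [r["person_id"] for r in select_cohort(records)]
```

In this toy example only person A is retained: B was first certified before the window, C was certified at a support level, and D was younger than 65 years.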
In Sammu City, for the validation analysis, we identified the participants between May 2013 and October 2016 and applied the same exclusion criteria.
Exposures (factors used for clustering)
According to the Comprehensive Survey of Living Conditions conducted by the Ministry of Health, Labour and Welfare of Japan, 22 diseases are considered to potentially affect the initiation of long-term care [4]. The list includes hemorrhagic stroke, ischemic stroke, other cerebrovascular diseases, ischemic heart disease, arrhythmia, heart failure, other cardiac diseases, cancer, chronic obstructive pulmonary disease, pneumonia, other lower respiratory tract diseases, rheumatoid arthritis, other arthropathies, dorsopathies, dementia, Parkinson's disease, insulin-dependent diabetes, non-insulin-dependent diabetes, visual impairment, hearing impairment, femur fractures, and other fractures.
For each of the 22 diseases, we defined presence or absence according to whether the disease was recorded in medical claims during the 6 months before the certification of long-term care need (including the month of certification). We used the ICD-10 codes from previous studies (see Supplementary Table S2 online) [4].
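The look-back definition can be illustrated with a short sketch that turns monthly claims into binary disease flags. Several elements are assumptions made for illustration: the three ICD-10 prefixes are examples only (the definitions actually used are those in Supplementary Table S2), the claim layout is hypothetical, and the window is taken to be the certification month plus the six preceding calendar months.

```python
# Illustrative ICD-10 prefixes only; not the study's disease definitions.
DISEASE_PREFIXES = {
    "heart_failure": ("I50",),
    "parkinsons_disease": ("G20",),
    "copd": ("J44",),
}
LOOKBACK_MONTHS = 6  # assumption: certification month plus the 6 preceding months

def month_index(year, month):
    """Map a calendar month to a running integer so windows are easy to compare."""
    return year * 12 + (month - 1)

def disease_flags(claims, cert_year, cert_month):
    """claims: iterable of (year, month, icd10_code) tuples for one person."""
    end = month_index(cert_year, cert_month)
    start = end - LOOKBACK_MONTHS
    flags = {name: 0 for name in DISEASE_PREFIXES}
    for year, month, code in claims:
        if not (start <= month_index(year, month) <= end):
            continue  # claim lies outside the look-back window
        for name, prefixes in DISEASE_PREFIXES.items():
            if code.startswith(prefixes):
                flags[name] = 1
    return flags

# Toy claims for a person certified in June 2015 (invented data).
claims = [(2015, 1, "I500"), (2015, 6, "G20"), (2014, 9, "J449")]
flags = disease_flags(claims, 2015, 6)
```

Here the September 2014 claim falls outside the window, so the COPD flag stays at 0 while the other two flags are set to 1.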
Outcomes
In the cohort analysis in Tsukuba City, the outcomes of interest were all-cause mortality and deterioration of care-need levels within 2 years. Data on deaths were obtained from the insurance registration data, which in principle capture all deaths up to March 2021. Information on care-need levels was derived from the survey data for care-need certification. Specifically, we retrieved the certified care-need level of each participant at, or closest to, 2 years (i.e., 24 months) after the date of initial certification. In the primary analysis, deterioration of care-need levels was defined as an increase in the level of the latest certification compared with the initial level, or death within 24 months.
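The primary outcome definition can be expressed as a small sketch. The function names and the input layout are hypothetical. The sketch assumes that care levels are coded as integers 1 to 5, that the certification closest to 24 months is chosen by absolute distance in months (with ties resolved in favor of the earlier assessment), and that a participant without any reassessment is treated as unchanged.

```python
def level_at_two_years(initial_level, reassessments, target=24):
    """reassessments: list of (months_since_initial, care_level) tuples."""
    if not reassessments:
        return initial_level  # no reassessment: level assumed unchanged
    # Closest assessment to the 24-month mark; ties go to the earlier one.
    _, level = min(reassessments, key=lambda r: (abs(r[0] - target), r[0]))
    return level

def deteriorated(initial_level, reassessments, death_month=None):
    """True if the person died within 24 months or the care level increased."""
    if death_month is not None and death_month <= 24:
        return True
    return level_at_two_years(initial_level, reassessments) > initial_level

# Toy examples (invented data).
example_increase = deteriorated(2, [(12, 2), (23, 3)])   # level rose from 2 to 3
example_improved = deteriorated(3, [(12, 2)])            # level fell from 3 to 2
example_death = deteriorated(1, [], death_month=10)      # died at 10 months
```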
Statistical analyses
We summarized the characteristics of the study participants using frequencies (percentages) for categorical variables and medians (IQRs) for continuous variables. The prevalence of each disease and the number of comorbidities per person (out of the 22 diseases) were calculated. We constructed a network plot to examine the relationships among the comorbidities.
For clustering, we first performed dimensionality reduction using multiple correspondence analysis and then applied a fuzzy c-means clustering algorithm, which allows individuals to belong to more than one cluster. To determine the optimal number of clusters, we tested a range of cluster numbers and used the elbow method and the Xie-Beni index (for which lower values indicate a better solution) [26]. We repeated the fuzzy c-means clustering algorithm 100 times to account for the random nature of the cluster solutions and averaged the results. Participants were assigned to the cluster for which they had the highest membership probability. The observed/expected ratios of diseases and disease exclusivity were used to describe the disease patterns of the clusters [7,27]. Observed/expected ratios were calculated by dividing the disease prevalence in the cluster by the disease prevalence in the overall population. Disease exclusivity was calculated as the number of participants with a specific disease in a cluster divided by the total number of individuals with that disease. Clusters were characterized based on diseases with an exclusivity of ≥25% or an observed/expected ratio of ≥2 [7,27]. Each cluster was named after discussions among the authors with clinical experience in treating older patients. Other basic characteristics of the clusters were compared using the Kruskal–Wallis test for continuous variables, such as age and the number of morbidities, and the chi-square test for categorical variables, such as sex and care-need levels.
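The clustering and cluster-description steps can be sketched in plain Python. This is a simplified illustration rather than the analysis code, which was run in R and Stata: it takes low-dimensional coordinates as given (standing in for the multiple correspondence analysis output), uses a fuzzifier of m = 2 as an assumption, and runs the algorithm once instead of averaging 100 runs. All function names and the toy data are hypothetical.

```python
import random

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Basic fuzzy c-means; returns (memberships, centers)."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    u = []
    for _ in range(n):  # random initial memberships, each row summing to 1
        row = [rng.random() + 1e-9 for _ in range(c)]
        u.append([v / sum(row) for v in row])
    centers = []
    for _ in range(iters):
        centers = []
        for j in range(c):  # centers are membership-weighted means
            w = [u[i][j] ** m for i in range(n)]
            centers.append([sum(w[i] * points[i][k] for i in range(n)) / sum(w)
                            for k in range(d)])
        for i in range(n):  # memberships from inverse squared distances
            inv = [max(sqdist(points[i], ctr), 1e-12) ** (-1.0 / (m - 1)) for ctr in centers]
            u[i] = [v / sum(inv) for v in inv]
    return u, centers

def xie_beni(points, u, centers, m=2.0):
    """Compactness divided by separation; lower values indicate a better solution."""
    n, c = len(points), len(centers)
    compact = sum(u[i][j] ** m * sqdist(points[i], centers[j])
                  for i in range(n) for j in range(c))
    separation = min(sqdist(centers[j], centers[k])
                     for j in range(c) for k in range(j + 1, c))
    return compact / (n * separation)

def describe_clusters(disease, labels):
    """Observed/expected ratio and exclusivity of one disease in each cluster."""
    overall_prev = sum(disease) / len(disease)
    out = {}
    for k in sorted(set(labels)):
        members = [flag for flag, lab in zip(disease, labels) if lab == k]
        oe = (sum(members) / len(members)) / overall_prev
        excl = sum(members) / sum(disease)
        out[k] = {"oe": oe, "exclusivity": excl,
                  "characteristic": oe >= 2 or excl >= 0.25}
    return out

# Toy coordinates forming two well-separated groups (invented data).
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 5.1), (5.1, 4.8)]
u, centers = fuzzy_c_means(points, c=2)
xb = xie_beni(points, u, centers)
labels = [max(range(2), key=lambda j: row[j]) for row in u]  # highest membership
summary = describe_clusters([1, 1, 1, 0, 0, 1], labels)
```

In practice the Xie-Beni index would be computed for each candidate number of clusters and compared across them. Averaging memberships over repeated runs additionally requires aligning cluster labels between runs, which this sketch does not attempt.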
To validate the replicability of the clustering results, in Sammu City, we repeated the same clustering procedure and then compared the results with those obtained from Tsukuba City.
In the cohort analysis in Tsukuba City, Kaplan–Meier survival curves were plotted for death. Additionally, we conducted a Cox proportional hazards analysis for death, with adjustment for age, sex, and initial care-need level. Follow-up continued until death, loss to follow-up (defined as removal from the insurance registration for reasons such as moving), or the end of observation in March 2021, whichever came first.
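The construction of follow-up time and the Kaplan–Meier estimate can be sketched as follows. The function names are hypothetical, the end of observation is assumed to be 31 March 2021, and the Cox model is not reproduced here because it would normally be fitted with a dedicated survival analysis package.

```python
from datetime import date

END_OF_OBSERVATION = date(2021, 3, 31)  # assumption: last day of March 2021

def follow_up(start, death=None, lost=None):
    """Return (days, event); follow-up ends at death, loss, or end of observation."""
    end = min(d for d in (death, lost, END_OF_OBSERVATION) if d is not None)
    event = 1 if death is not None and death == end else 0
    return (end - start).days, event

def kaplan_meier(times, events):
    """times: follow-up durations; events: 1 = death, 0 = censored.
    Returns [(time, survival_probability)] at each distinct death time."""
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ev in zip(times, events) if ti == t and ev == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            surv *= 1 - deaths / at_risk  # product-limit step
            curve.append((t, surv))
        at_risk -= leaving  # deaths and censored leave the risk set
    return curve

# Toy follow-up data in months (invented data).
curve = kaplan_meier([5, 8, 8, 12, 20], [1, 0, 1, 1, 0])
```

For these toy data the estimated survival steps down at months 5, 8, and 12 to approximately 0.8, 0.6, and 0.3.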
For the analysis of deterioration of care-need levels within 2 years, we restricted the sample to participants who started long-term care before June 2017, so that all participants (except those lost to follow-up) had 2 years of follow-up information. We tabulated the numbers of participants whose care-need levels had improved, remained stable, or deteriorated at 2 years, as well as those with no reassessment, death, or loss to follow-up. After excluding those lost to follow-up at 2 years, we conducted univariable and multivariable logistic regression analyses for deterioration of care-need level (i.e., an increase in the level of the latest certification compared with the initial level, or death), with adjustment for age, sex, and the initial care-need level. The care-need level of those with no reassessment was assumed to be unchanged. In the first sensitivity analysis, we excluded those with no reassessment by regarding them as lost to follow-up. In the second sensitivity analysis, those who died were additionally excluded, so that the outcome was solely an increase in the latest certification level compared with the initial level.
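The handling of each 2-year status in the primary and sensitivity analyses can be summarized in a short sketch. The status labels, function names, and the analysis identifiers (`primary`, `sensitivity1`, `sensitivity2`) are hypothetical; the logistic regression itself is not shown.

```python
from collections import Counter

def two_year_status(initial, level_2y, died, lost, reassessed):
    """Classify a participant's status 2 years after initial certification."""
    if lost:
        return "lost"
    if died:
        return "death"
    if not reassessed:
        return "no_reassessment"
    if level_2y > initial:
        return "deteriorated"
    if level_2y < initial:
        return "improved"
    return "stable"

def outcome(status, analysis="primary"):
    """Return 1/0 for deterioration, or None if excluded from the given analysis."""
    if status == "lost":
        return None  # excluded from every analysis
    if status == "no_reassessment":
        # Primary: level assumed unchanged; sensitivity analyses: excluded.
        return 0 if analysis == "primary" else None
    if status == "death":
        # Counted as deterioration except in the second sensitivity analysis.
        return None if analysis == "sensitivity2" else 1
    return 1 if status == "deteriorated" else 0

# Toy participants: (initial, level_2y, died, lost, reassessed), invented data.
people = [(2, 3, False, False, True), (2, 2, False, False, False),
          (1, 1, True, False, False), (3, 2, False, False, True)]
statuses = [two_year_status(*p) for p in people]
table = Counter(statuses)
primary = [outcome(s) for s in statuses]
sensitivity2 = [outcome(s, "sensitivity2") for s in statuses]
```

Under these rules the toy sample yields outcomes 1, 0, 1, 0 in the primary analysis, whereas the second sensitivity analysis retains only the first and last participants.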
The significance level was set at p<0.05. Statistical analyses were performed using R Statistical Software (v4.3.2; R Core Team 2023) and Stata (version 17; StataCorp LLC 2021).