Extraction of nationwide population data
In the Republic of Korea, registration with the Korean National Health Insurance Service (NHIS) is mandatory for the entire population. These compulsory subscribers pay health insurance premiums according to their income level [16]. All medical institutions submit claims for diagnosed diseases using the codes of the World Health Organization (WHO) International Statistical Classification of Diseases and Related Health Problems, 10th Revision (ICD-10) [16, 17]. Rare incurable diseases (RIDs), including human immunodeficiency virus-1 (HIV-1) infection, CMV end-organ diseases, cancer, and moderate-to-severe dementia, are managed by the Korean NHIS [14-16, 18]. Patients with RIDs receive substantial health coverage and pay only 5% of their healthcare costs. Because of this benefit, the diagnosis of RIDs and the submission of the related medical claims are expected to reflect accurate data [14, 16, 18]. The Korean Health Insurance Review and Assessment Service (HIRA) plays an important role in inspecting this process. We used a medical dataset extracted from the National Health Insurance Database (NHID), which was submitted to the HIRA. The NHID comprises 1.3 trillion records with information on medical diagnoses, treatment results, long-term care insurance for elderly patients, RID registration, and the status of medical institutions [14, 18]. This study was approved by the Institutional Review Board of the Gangnam Severance Hospital, Yonsei University College of Medicine. A waiver of informed consent and the relevant permission forms were obtained from the National Health Insurance Sharing Service.
Study population and design
From January 2010 to December 2014, data from 1,557 enrolled patients who were diagnosed with CMV tissue-invasive end-organ diseases and registered with an RID were extracted from the NHID. A 4-year washout period (January 2006 to December 2009) was applied to evaluate the effect of CMV disease on new-onset moderate-to-severe dementia. Based on unique RID codes linked to ICD-10 codes at retrospective enrolment, we did not find any subjects with moderate-to-severe dementia in either group. Twenty-one patients diagnosed with moderate-to-severe dementia during the washout period and 24 HIV-1-infected individuals were excluded. Among the remaining 1,512 patients, 687 patients aged ≥40 years with CMV tissue-invasive diseases were selected as the exposed group (with CMV diseases). For comparison, 3,435 age- and sex-matched individuals without CMV diseases were selected as the unexposed group at a 1:5 ratio. Our study design was not a retrospective matched-pairs analysis using a propensity score model. Participants in both groups were followed up until December 2016. The incidence rate (IR) per 1,000 person-years was calculated by dividing the number of events (new-onset moderate-to-severe dementia) by the total follow-up duration of the participants in person-years and multiplying by 1,000. For participants without new-onset dementia, person-years were calculated as the follow-up time from enrolment to the end of follow-up or death.
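The incidence rate calculation can be illustrated with a minimal Python sketch. The participant records below are hypothetical and are not study data; the 365.25-day year and the choice to stop counting person-time at the event date for participants who developed dementia are assumptions made for illustration only.

```python
from datetime import date


def person_years(start: date, end: date) -> float:
    """Follow-up time in years between two dates (365.25-day year, an assumption)."""
    return (end - start).days / 365.25


def incidence_rate_per_1000(events: int, total_person_years: float) -> float:
    """Number of events divided by total person-years, scaled to 1,000 person-years."""
    if total_person_years <= 0:
        raise ValueError("total person-years must be positive")
    return 1000.0 * events / total_person_years


# Hypothetical participants: (enrolment date, end of observation, event occurred).
# End of observation is the event date, death, or the end of follow-up.
participants = [
    (date(2010, 1, 1), date(2016, 12, 31), False),
    (date(2012, 6, 1), date(2015, 6, 1), True),
    (date(2014, 3, 15), date(2016, 12, 31), False),
]

total_py = sum(person_years(start, end) for start, end, _ in participants)
events = sum(1 for _, _, had_event in participants if had_event)
rate = incidence_rate_per_1000(events, total_py)
print(f"{events} event(s) over {total_py:.2f} person-years: IR = {rate:.1f} per 1,000")
```

Person-time is summed over all participants before dividing, so participants with short follow-up contribute proportionally less to the denominator.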
Definition
A diagnosis of CMV tissue-invasive end-organ disease can be established if either of the following is observed: (1) histopathologic features, including the presence of inclusion bodies and positive immunohistochemical staining for CMV; or (2) detection of CMV or its DNA in body fluids or tissue by pp65 antigen measurement, culture, or a nucleic acid amplification test [19, 20]. Moderate-to-severe dementia was diagnosed only by neurologists or psychologists according to the essential diagnostic methods and criteria of the Korean NHIS: (1) abnormal brain imaging findings (computed tomography, magnetic resonance imaging, or fluorodeoxyglucose-positron emission tomography); and (2) a score of ≥2 points on the Clinical Dementia Rating or ≥5 points on the Global Deterioration Scale, and ≤18 points on the Mini-Mental State Examination; and/or (3) abnormal neuropsychological test results (Seoul Neuropsychological Screening Battery, Consortium to Establish a Registry for Alzheimer’s Disease assessment packet, or Literacy-Independent Cognitive Assessment) [21, 22].
CMV diseases are assigned the unique V104 code for RID registration. The V104 code corresponds to the B25 codes in the ICD-10, which cover all types of CMV tissue-invasive end-organ diseases: cytomegaloviral pneumonitis (B25.0), cytomegaloviral hepatitis (B25.1), cytomegaloviral pancreatitis (B25.2), other cytomegaloviral diseases (B25.8), and cytomegaloviral disease, unspecified (B25.9) [17]. It does not include the codes for congenital CMV infection (P35.1) or cytomegaloviral mononucleosis (B27.1) [14, 17]. Furthermore, a diagnosis of moderate-to-severe dementia for RID registration, which uses the unique V800 and V810 codes, corresponded to the following ICD-10 codes: AD (early or presenile onset or type 2 [F00.0/G30.0], late or senile onset or type 1 [F00.1/G30.1], atypical or mixed type [F00.2/G30.8], and unspecified [F00.9/G30.9]), VaD (acute onset [F01.0], multi-infarct or predominantly cortical dementia [F01.1], subcortical [F01.2], mixed cortical and subcortical [F01.3], and other or unspecified [F01.8 or F01.9]), and other dementia (dementia in Pick disease [F02.0/G31.0], in Creutzfeldt-Jakob disease [F02.1/A81.0], in Huntington disease [F02.2/G10], in Parkinson disease [F02.3/G20], dementia with Lewy bodies [F02.8/G31.8], and frontotemporal dementia [G31.0]) (Supplementary Table 1) [17, 23, 24].
Solid organ transplantation (SOT) recipients had the V005 (kidney), V013 (liver), V015 (heart), and/or V277 (lung) codes for RID registration according to the transplanted organ; these correspond to the Z94 codes in the ICD-10 (Z94.0, Z94.1, Z94.2, Z94.3, and Z94.4 for the kidney, heart, lung, heart and lung, and liver, respectively). End-stage renal disease (ESRD) on dialysis was identified using the V001 and V003 RID codes, which correspond to the ICD-10 code N18.5 (chronic kidney disease, stage 5). HIV-1-infected individuals were identified using the V103 RID code, which corresponds to the ICD-10 codes B20-B24 [17]. In addition, hypertension (I10-I13, I15), type 2 non-insulin-dependent diabetes mellitus (NIDDM) (E11), dyslipidaemia (E78), malignant neoplasms including haematologic malignancies and excluding in situ neoplasms (C00-C86.6, C88, C90-C97, and D37-D48), and haematopoietic stem cell transplantation (HSCT) recipients (Z94.8) were identified using ICD-10 codes [17].
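As an illustration of how such code lists can be applied to claims data, the following Python sketch maps a single ICD-10 code to one of the conditions named above. It is a simplified assumption and not the extraction procedure used in this study: the function and label names are hypothetical, the malignant neoplasm ranges are omitted for brevity, and RID codes (for example V104) are not modelled.

```python
from typing import Optional

# Three-character ICD-10 categories taken from the code lists in the text.
CATEGORY_LABELS = {
    **{f"I{n}": "hypertension" for n in (10, 11, 12, 13, 15)},
    **{f"B{n}": "HIV-1 infection" for n in range(20, 25)},
    "B25": "CMV disease",
    "E11": "NIDDM",
    "E78": "dyslipidaemia",
}

# Codes that must match at the four-character level.
SUBCODE_LABELS = {
    "Z94.0": "SOT (kidney)",
    "Z94.1": "SOT (heart)",
    "Z94.2": "SOT (lung)",
    "Z94.3": "SOT (heart and lung)",
    "Z94.4": "SOT (liver)",
    "Z94.8": "HSCT",
    "N18.5": "ESRD",
}


def classify(code: str) -> Optional[str]:
    """Return the condition label for an ICD-10 code, or None if it is not listed."""
    code = code.strip().upper()
    if code in SUBCODE_LABELS:
        return SUBCODE_LABELS[code]
    # Fall back to the three-character category (e.g. "B25.1" -> "B25").
    return CATEGORY_LABELS.get(code[:3])


for example in ("B25.1", "B27.1", "P35.1", "Z94.4", "e11.9"):
    print(example, "->", classify(example))
```

Because only B25 is listed for CMV disease, cytomegaloviral mononucleosis (B27.1) and congenital CMV infection (P35.1) fall through to None, which mirrors their exclusion from the V104 definition.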
Body mass index (BMI) was calculated as weight in kilograms divided by the square of height in metres (kg/m²) and categorised as <25 or ≥25 kg/m². Low income status was defined as an annual household income below the 25th percentile, based on data from the 2010 South Korean Population and Housing Census [14].
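The BMI calculation and its two categories can be written as a short Python sketch. The function names and the example values are hypothetical and serve only to make the formula concrete.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / (height_m ** 2)


def bmi_category(value: float) -> str:
    """Two-level categorisation with a cut-off of 25 kg/m^2."""
    return "<25" if value < 25 else ">=25"


# Hypothetical example: 70 kg and 1.75 m.
value = bmi(70.0, 1.75)
print(f"BMI = {value:.1f} kg/m^2, category {bmi_category(value)}")
```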
Statistical analysis
Categorical and continuous data were presented as numbers (percent) and mean ± standard deviation, respectively. The exposed and unexposed groups were compared using the χ² test and the independent t-test. Kaplan-Meier curves adjusted for age and sex were used to analyse the incidence and probability of dementia according to the presence of CMV diseases. In survival analyses, the event was defined as the diagnosis of new-onset moderate-to-severe dementia, and the time to event was measured up to the date of that diagnosis. Observations were censored at death or at the end of follow-up when either preceded the event; censoring mainly resulted from late enrolment in the cohort. Our study had no type 2 censoring due to loss to follow-up or dropout. The proportion of censored data was similar in the exposed and unexposed groups (1.18-fold, 14.6% vs. 12.4%, respectively). Multivariate logistic regression analyses using model 1 (M1, non-adjusted), model 2 (M2, adjusted for age, sex, low income status, and BMI), and model 3 (M3, adjusted for age, sex, low income status, BMI, SOT and/or HSCT recipients, malignant neoplasms, ESRD on dialysis, NIDDM, hypertension, and dyslipidaemia) were performed to evaluate the impact of CMV diseases on the development of moderate-to-severe dementia. Statistical analyses were performed using the Statistical Analysis System (SAS) program (version 9.2; SAS Institute, Cary, NC). Two-tailed P values < 0.05 were considered significant.
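To make the handling of events and censored observations concrete, the following Python sketch implements the basic Kaplan-Meier product-limit estimator. It is an unadjusted illustration under simplifying assumptions, not the age- and sex-adjusted SAS analysis described above, and the follow-up times and event flags are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    times  -- follow-up time per participant (event or censoring time)
    events -- True if the event occurred, False if the observation was censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0   # events observed exactly at time t
        n_leaving = 0  # participants leaving the risk set at t (events + censored)
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


# Hypothetical follow-up (years); False marks death or end of follow-up before the event.
times = [2, 3, 3, 5, 6]
events = [True, False, True, True, False]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t}: survival = {s:.2f}, cumulative incidence = {1 - s:.2f}")
```

Censored participants remain in the risk set until their censoring time and then leave it without changing the survival estimate, which is how death or the end of follow-up before the event is treated in this sketch.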