Data Source and Setting
In this cohort study, we used data from the National Health Insurance Service (NHIS), a universal social insurance program that covers 97% of the Korean population (approximately 50 million people). The NHIS dataset includes information on demographic variables (e.g., age and sex), socioeconomic status (e.g., income level and residential area), healthcare utilization (e.g., outpatient visits, emergency department visits, and hospitalizations), health screening examination findings, disease diagnoses based on International Classification of Diseases, 10th Revision (ICD-10) codes, medical treatments, procedures, and surgeries [20]. Because it contains a wide range of medical and health information, the NHIS database has been widely used in epidemiological studies to identify risk factors for specific diseases [21–23].
In South Korea, the Ministry of Health and Welfare offers free annual or biennial health screening examinations to all Korean citizens. In 2009, the health screening examination included anthropometric measurements (e.g., BMI); questionnaires on smoking, alcohol consumption, and physical activity; blood tests, including lipid levels; and chest radiography. The current participation rate in health screening examinations ranges from 70% to 80%. The Korean government provides anonymized, representative health screening data for research purposes [24].
The study protocol was approved by the Institutional Review Board of the Samsung Medical Center (IRB No. SMC 2022-06-141). The requirement for informed consent was waived because the NHIS database uses a deidentified patient identification system.
Study Participants
We identified 119,788 eligible patients diagnosed with RA between 2010 and 2017 (83,064 with seropositive RA [SPRA] and 36,724 with seronegative RA [SNRA]) using the following criteria: (1) a registered diagnostic code for RA (ICD-10 M05 for SPRA, and M06 excluding M06.1 and M06.4 for SNRA) and (2) a prescription for any DMARD, including conventional synthetic DMARDs, biologic DMARDs (bDMARDs), and targeted synthetic DMARDs (tsDMARDs).
We initially included 64,457 participants (45,045 with SPRA and 19,412 with SNRA) who were diagnosed with RA between 2010 and 2017 and had health screening examination data available within the 2 years preceding RA diagnosis. We excluded individuals with other connective tissue diseases (CTDs; n = 213), those with missing health screening examination data (n = 2,321), and those younger than 20 years (n = 6). To minimize the risk of reverse causality, we also excluded those previously diagnosed with NTM (n = 136) and those diagnosed with NTM within 1 year after RA diagnosis (n = 569). A total of 61,212 potential participants were thus identified for the RA cohort, of whom 60,315 (42,062 with SPRA and 18,253 with SNRA) were eligible for 1:5 age and sex matching.
To establish age- and sex-matched controls, we started from 1,207,831 individuals who were approximately 1:10 age- and sex-matched to the 119,788 patients with RA and included the 677,322 who underwent health screening examinations in the same year as their matched patients with RA. After excluding participants with other rheumatic diseases (n = 20), those with missing health screening examination data (n = 30,705), those younger than 20 years (n = 706), those diagnosed with NTM-PD before matching (n = 379), and those diagnosed with NTM-PD within 1 year after matching, 643,122 participants remained as potential matched controls. Of these, 301,575 were eligible for 1:5 age and sex matching with the RA cohort (Fig. 1).
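The RA-cohort exclusion counts reported above reconcile exactly when applied in sequence. The following sketch replays that arithmetic as a check; the helper `apply_exclusions` is hypothetical and illustrative only, and the counts are taken directly from the text.

```python
from typing import List, Tuple


def apply_exclusions(start: int, exclusions: List[Tuple[str, int]]) -> int:
    """Subtract each exclusion count in order and report the remaining n (illustrative helper)."""
    remaining = start
    for reason, n in exclusions:
        remaining -= n
        print(f"after excluding {reason} (n = {n:,}): {remaining:,} remain")
    return remaining


# Exclusion steps and counts for the RA cohort, as reported in the text.
ra_exclusions = [
    ("other CTDs", 213),
    ("missing health screening data", 2_321),
    ("age < 20 years", 6),
    ("prior NTM", 136),
    ("NTM within 1 year after RA diagnosis", 569),
]

ra_cohort = apply_exclusions(64_457, ra_exclusions)
```

The final value equals the 61,212 potential participants reported for the RA cohort.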
Exposure
The exposure in this study was RA, comprising SPRA and SNRA. Separate operational definitions were applied to identify patients with SPRA and those with SNRA [25]. The NHIS operates the Rare and Intractable Disease (RID) program, which reduces patients’ out-of-pocket costs for medical care related to designated diseases. Among patients with RA, registration in the RID program as SPRA requires both a positive result for rheumatoid factor or anti-cyclic citrullinated peptide antibody and an official physician’s certificate confirming that the patient meets the RA classification criteria. Participants were defined as having SPRA if their claim records included the ICD-10 diagnostic code M05, the RID registration code V223, and prescriptions for any DMARD for at least 180 days, including conventional synthetic DMARDs (methotrexate, hydroxychloroquine, leflunomide, sulfasalazine, tacrolimus, cyclosporine, D-penicillamine, bucillamine, azathioprine, minocycline, or mizoribine), bDMARDs (adalimumab, etanercept, infliximab, golimumab, rituximab, abatacept, or tocilizumab), or tsDMARDs (tofacitinib). Participants were defined as having SNRA if they visited hospitals with the ICD-10 diagnostic code M06 (excluding M06.1 and M06.4) and had prescription records for DMARDs for ≥ 180 days [25]. The index date was defined as the date on which the RA-related diagnostic code was first registered.
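The SPRA and SNRA operational definitions can be expressed as a simple rule over a per-patient claims summary. The sketch below is a minimal illustration under stated assumptions: the `ClaimSummary` record and its field names are hypothetical, ICD-10 codes are assumed to be stored in dotted form (e.g., "M06.9"), DMARD exposure is assumed to be pre-aggregated into a day count, and a patient meeting the SPRA definition is assumed to take precedence over SNRA.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class ClaimSummary:
    """Hypothetical per-patient summary of claims (field names are illustrative)."""
    icd10_codes: Set[str] = field(default_factory=set)  # e.g. {"M05.8"}
    rid_codes: Set[str] = field(default_factory=set)    # e.g. {"V223"}
    dmard_days: int = 0                                  # days covered by any DMARD prescription


def is_snra_code(code: str) -> bool:
    """M06.x qualifies for SNRA except M06.1 and M06.4."""
    return code.startswith("M06") and not code.startswith(("M06.1", "M06.4"))


def classify_ra(p: ClaimSummary) -> Optional[str]:
    """Return "SPRA", "SNRA", or None under the operational definitions in the text."""
    if p.dmard_days < 180:  # both definitions require >= 180 days of DMARD prescriptions
        return None
    # SPRA: M05 code plus RID registration code V223 (checked first, an assumption).
    if any(c.startswith("M05") for c in p.icd10_codes) and "V223" in p.rid_codes:
        return "SPRA"
    # SNRA: M06 code, excluding M06.1 and M06.4.
    if any(is_snra_code(c) for c in p.icd10_codes):
        return "SNRA"
    return None
```

For example, `classify_ra(ClaimSummary({"M06.9"}, set(), 200))` returns "SNRA", whereas an M05 code without the V223 registration code does not qualify as SPRA.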
Outcomes
The outcome of this study was incident NTM-PD, defined by the following criteria: (1) a new claim with the ICD-10 diagnosis code A31.0 and (2) at least 2 ambulatory visits or hospitalizations with the A31.0 diagnosis code within 1 year after the initial claim [26]. Participants were followed up from 1 year after RA diagnosis (or the corresponding index date for matched controls) until the date of NTM-PD diagnosis, the date of censoring, or December 31, 2019, whichever occurred first.
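The outcome definition and the 1-year lag at the start of follow-up can be sketched as follows. Several details are assumptions not stated in the text: "1 year" is treated as 365 days, the initial A31.0 claim counts as one of the 2 required visits, the NTM-PD diagnosis date is taken as the first A31.0 claim date, only the first claim is checked for confirmation, and person-time uses 365.25-day years. The function names are hypothetical.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

# Assumptions: "1 year" = 365 days; person-years use 365.25-day years.
ONE_YEAR = timedelta(days=365)
STUDY_END = date(2019, 12, 31)


def ntm_pd_diagnosis_date(a310_claim_dates: List[date]) -> Optional[date]:
    """Return the first A31.0 claim date if >= 2 A31.0 visits fall within 1 year of it.

    The initial claim is counted as one of the two visits (an assumption).
    """
    if not a310_claim_dates:
        return None
    dates = sorted(a310_claim_dates)
    first = dates[0]
    confirming = [d for d in dates if d <= first + ONE_YEAR]
    return first if len(confirming) >= 2 else None


def follow_up(index_date: date, ntm_pd_date: Optional[date],
              censor_date: Optional[date] = None) -> Tuple[float, bool]:
    """Return (person-years, event flag); follow-up starts 1 year after the index date."""
    start = index_date + ONE_YEAR
    # Follow-up ends at NTM-PD diagnosis, censoring, or study end, whichever is first.
    end = min(d for d in (ntm_pd_date, censor_date, STUDY_END) if d is not None)
    if end < start:
        # NTM diagnosed within the lag window was excluded in the text; handling of
        # other early censoring is not described, so this sketch simply rejects it.
        raise ValueError("event or censoring falls inside the 1-year lag period")
    event = ntm_pd_date is not None and end == ntm_pd_date
    return (end - start).days / 365.25, event
```

For a participant indexed on January 1, 2015, with confirmed NTM-PD first claimed on March 1, 2018, follow-up runs from January 1, 2016, giving 790 days of observation ending in an event.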
Covariates
Household income was categorized into quartiles based on insurance premium levels, which are determined by income; participants covered by Medical Aid (the poorest 3%) were merged into the lowest income quartile and designated as “low income” [27–29]. Personal behaviors, including smoking status, alcohol consumption, and physical activity, were assessed using a self-reported questionnaire. Smoking status was categorized as never, ex-, or current smoker, and ex-smokers and current smokers were further subdivided using a cutoff of 20 pack-years (PY). Alcohol consumption was classified as none, 1–2 times a week, 3–4 times a week, or almost every day. “Regular exercise” was defined as moderate-intensity exercise for > 5 days per week or vigorous exercise for > 3 days per week [30]. BMI was calculated as body weight divided by the square of height (kg/m²) and classified into four groups according to the classification for Asians: underweight (< 18.5 kg/m²), normal (18.5–22.9 kg/m²), overweight (23.0–24.9 kg/m²), and obese (≥ 25 kg/m²) [31]. Comorbidities (diabetes mellitus, hypertension, dyslipidemia, chronic kidney disease, ischemic heart disease, and airway diseases [asthma, chronic obstructive pulmonary disease, or bronchiectasis]) were defined based on ICD-10 codes, as previously described [28, 29, 32, 33]. Tuberculosis was defined by the corresponding ICD-10 codes together with registration in the national RID support program [28, 29].
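The BMI and smoking categories can be derived as sketched below. The BMI cut-offs follow the Asian classification given in the text, applied as continuous thresholds. Pack-years are computed with the standard formula (cigarettes per day divided by 20, multiplied by years smoked); the questionnaire fields and the placement of exactly 20 PY in the upper group are assumptions, and the function names are hypothetical.

```python
def bmi_category(weight_kg: float, height_m: float) -> str:
    """Classify BMI (kg/m^2) using the Asian cut-offs described in the text."""
    bmi = weight_kg / height_m ** 2
    if bmi < 18.5:
        return "underweight"
    if bmi < 23.0:
        return "normal"
    if bmi < 25.0:
        return "overweight"
    return "obese"


def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Pack-years = (cigarettes per day / 20 cigarettes per pack) x years smoked."""
    return cigarettes_per_day / 20 * years_smoked


def smoking_category(status: str, cigarettes_per_day: float = 0,
                     years_smoked: float = 0) -> str:
    """Return never, or ex-/current smoker split at 20 PY (>= 20 is an assumption)."""
    if status == "never":
        return "never"
    if status not in ("ex", "current"):
        raise ValueError(f"unknown smoking status: {status!r}")
    py = pack_years(cigarettes_per_day, years_smoked)
    band = ">=20 PY" if py >= 20 else "<20 PY"
    return f"{status}, {band}"
```

For example, a current smoker of 20 cigarettes per day for 25 years has 25 PY and falls in the "current, >=20 PY" group.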
Statistical Analysis
Descriptive statistics are presented as frequencies (proportions) for categorical variables and means ± standard deviations (SDs) for continuous variables. We compared the two groups using the chi-square test for categorical variables and the t-test for continuous variables. Incidence rates of NTM-PD were calculated by dividing the number of incident events by the total follow-up time and expressed per 1,000 person-years. Cumulative incidence curves were plotted to compare NTM-PD incidence between the RA and matched cohorts, and the log-rank test was used to assess differences between groups.
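The incidence rate and a cumulative incidence curve can be computed as sketched below. The text does not specify the estimator behind the cumulative incidence plot; this sketch assumes 1 minus the Kaplan-Meier survival estimate, which ignores competing risks such as death. Function names are hypothetical.

```python
from typing import List, Tuple


def incidence_rate_per_1000py(events: int, person_years: float) -> float:
    """Events divided by total follow-up time, expressed per 1,000 person-years."""
    if person_years <= 0:
        raise ValueError("person-years must be positive")
    return events * 1000 / person_years


def cumulative_incidence(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, 1 - Kaplan-Meier survival) at each distinct event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = c = 0
        # Count events and censored observations tied at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                d += 1
            else:
                c += 1
            i += 1
        if d:
            surv *= 1 - d / n_at_risk  # product-limit step
            curve.append((t, 1 - surv))
        n_at_risk -= d + c  # remove events and censored subjects from the risk set
    return curve
```

For example, 12 events over 8,000 person-years correspond to an incidence rate of 1.5 per 1,000 person-years.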
The risk of incident NTM-PD in the RA cohort compared with that in the matched cohort was estimated using univariable and multivariable Cox proportional hazards regression analyses. Model 1 was adjusted for age, sex, income, smoking, alcohol consumption, physical activity, and BMI. Model 2 was further adjusted for diabetes mellitus, hypertension, dyslipidemia, chronic kidney disease, airway diseases, and tuberculosis. Stratified analyses were performed according to sex, age, income, smoking, alcohol consumption, regular exercise, BMI, and comorbidities, including airway diseases and tuberculosis. Additionally, all analyses were repeated separately according to RA serologic status (SPRA and SNRA). Statistical significance was defined as a two-sided P-value of < 0.05. All statistical analyses were performed using SAS version 9.4 (SAS Institute Inc., Cary, NC, USA).
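In SAS 9.4, Cox models of this kind are commonly fitted with PROC PHREG. The Python sketch below is not the authors’ code. It illustrates only the univariable case: it estimates the hazard ratio for a single binary exposure (RA = 1, matched control = 0) by maximizing the Cox partial likelihood with Newton-Raphson and Breslow handling of tied event times. The adjusted Models 1 and 2 would require multiple covariates, which this sketch does not handle, and the function name is hypothetical.

```python
import math
from typing import List, Tuple


def cox_univariable(times: List[float], events: List[int], x: List[float],
                    max_iter: int = 50, tol: float = 1e-10) -> Tuple[float, float]:
    """Fit a one-covariate Cox model (Breslow ties) by Newton-Raphson.

    Returns (hazard ratio, standard error of the log hazard ratio); the standard
    error uses the information evaluated at the last iterate.
    """
    n = len(times)
    beta = 0.0
    info = 0.0
    for _ in range(max_iter):
        score = 0.0
        info = 0.0
        for i in range(n):
            if not events[i]:
                continue
            # Risk set: everyone still under observation at this event time.
            s0 = s1 = s2 = 0.0
            for j in range(n):
                if times[j] >= times[i]:
                    w = math.exp(beta * x[j])
                    s0 += w
                    s1 += w * x[j]
                    s2 += w * x[j] ** 2
            mean = s1 / s0
            score += x[i] - mean            # gradient of the log partial likelihood
            info += s2 / s0 - mean ** 2     # negative second derivative
        if info <= 0:
            raise ValueError("non-positive information; covariate may be constant")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return math.exp(beta), 1 / math.sqrt(info)
```

When exposed and unexposed participants have identical event times, the score is zero at the start and the estimated hazard ratio is 1; a 95% confidence interval would follow as exp(log HR ± 1.96 × SE).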