2.1. Data source and study design
This study used data from the NHIS-HEALS database (NHIS-2020-2-089) covering 2002–2015. This database is a cohort of participants who underwent health screening provided by the NHIS. In South Korea, insured individuals aged ≥40 years are eligible for a general national health screening every two years. In addition to information from the health screening examination on demographic variables, health behaviors, and laboratory results, the database contains records of inpatient and outpatient clinic use and prescription records. National healthcare claims databases managed by the government, such as NHIS-HEALS, cover approximately 98% of the population of South Korea. The NHIS provides its database for research purposes, such as epidemiological studies, and the validity of the database has been described in detail elsewhere [20, 21]. This population-based cohort study was approved by the Institutional Review Board of Seoul National University Hospital (IRB number: E-2104-195-1214). The same board waived the requirement for informed consent because the NHIS-HEALS database is anonymized according to strict confidentiality guidelines. This study adhered to the principles of the Declaration of Helsinki, and all methods were carried out in accordance with relevant guidelines and regulations.
2.2. Study population
The NHIS-HEALS database provides health screening data for individuals aged 40–79 years in 2002 [19]. We identified 334,626 participants who underwent health screening in 2002–2005. We then excluded 858 participants who died before the index date of January 1, 2006; 50,032 participants diagnosed with DM before the index date; 1,169 participants who had already been prescribed antidiabetic medication; 9,174 participants with fasting blood sugar (FBS) levels ≥126 mg/dL; 12,204 participants diagnosed with cancer; and 30,127 participants diagnosed with cardiovascular disease (CVD) before the index date. These exclusions were intended to account for other possible reasons for antibiotic use, such as underlying infection in immunocompromised patients. We also excluded participants with missing data for any variable. The final analysis included 201,459 participants (Figure 1).
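The exclusion flow can be checked with a short tally. The following Python sketch is illustrative only: it uses the counts reported in the text and assumes that the exclusions were applied sequentially and without overlap, which the text does not state explicitly. Under that assumption, the number of participants excluded for missing variables follows by subtraction. The function and variable names are hypothetical.

```python
# Counts are taken from the text. Applying them sequentially and without
# overlap is an assumption made for this illustration.
INITIAL_COHORT = 334_626
FINAL_COHORT = 201_459

EXCLUSIONS = [
    ("died before index date", 858),
    ("DM diagnosis before index date", 50_032),
    ("prior antidiabetic prescription", 1_169),
    ("FBS >= 126 mg/dL", 9_174),
    ("cancer diagnosis before index date", 12_204),
    ("CVD diagnosis before index date", 30_127),
]


def exclusion_flow(initial, exclusions):
    """Return (label, removed, remaining) for each sequential exclusion step."""
    remaining = initial
    steps = []
    for label, removed in exclusions:
        remaining -= removed
        steps.append((label, removed, remaining))
    return steps


steps = exclusion_flow(INITIAL_COHORT, EXCLUSIONS)
remaining_before_missing = steps[-1][2]

# Participants removed for missing variables, implied by the reported totals.
implied_missing = remaining_before_missing - FINAL_COHORT

for label, removed, remaining in steps:
    print(f"{label:40s} -{removed:>7,d} -> {remaining:>8,d}")
print(f"implied exclusions for missing variables: {implied_missing:,d}")
```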
2.3. Key variables
2.3.1 Main outcome variable
The main outcome was a new diagnosis of DM between 2006 and 2015. Incident DM was defined as a newly appearing International Classification of Diseases, 10th revision (ICD-10) code for DM (E10–E14) together with a prescription of antidiabetic medication. Person-years and adjusted hazard ratios (aHRs) were calculated using the date of the first DM diagnosis.
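As an illustration of this outcome definition, the following Python sketch flags incident DM and computes person-years of follow-up. It is a minimal sketch under stated assumptions, not the authors' code (the analyses were run in SAS). The record layout and function names are hypothetical, and censoring at death or at December 31, 2015 is an assumption not specified in the text.

```python
from datetime import date

INDEX_DATE = date(2006, 1, 1)
STUDY_END = date(2015, 12, 31)  # assumed administrative censoring date
DM_PREFIXES = ("E10", "E11", "E12", "E13", "E14")


def is_dm_code(icd10_code):
    """True if the ICD-10 code falls in the E10-E14 block."""
    return icd10_code.upper().startswith(DM_PREFIXES)


def first_dm_date(diagnoses, antidiabetic_rx_dates):
    """Return the first DM diagnosis date during follow-up, or None.

    diagnoses: list of (date, icd10_code) tuples (hypothetical layout).
    antidiabetic_rx_dates: list of prescription dates.
    An event requires both a DM code and an antidiabetic prescription.
    """
    dm_dates = [d for d, code in diagnoses
                if is_dm_code(code) and INDEX_DATE <= d <= STUDY_END]
    rx_dates = [d for d in antidiabetic_rx_dates
                if INDEX_DATE <= d <= STUDY_END]
    if dm_dates and rx_dates:
        return min(dm_dates)
    return None


def person_years(event_date=None, death_date=None):
    """Follow-up from the index date to the earliest of event, death, or study end."""
    end = min(d for d in (event_date, death_date, STUDY_END) if d is not None)
    return (end - INDEX_DATE).days / 365.25
```

For example, a participant with an E11.9 code on March 1, 2009 and an antidiabetic prescription later that month would be counted as an incident case on March 1, 2009, whereas the same code without any prescription would not count as an event.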
2.3.2 Ascertainment of antibiotic use
The exposure variables were the cumulative number of days of antibiotic prescription and the number of antibiotic classes prescribed from 2002 to 2005. Antibiotic classes were determined from the claims database and defined according to the World Health Organization (WHO) Anatomical Therapeutic Chemical (ATC) classification of drugs as macrolides, penicillins, cephalosporins, fluoroquinolones, sulfonamides, lincosamides, tetracyclines, and others [22-24] (Supplementary Table 1). The cumulative number of days of antibiotic prescription was categorized as 0, 1–29, 30–89, and 90 or more days. The number of antibiotic classes was categorized as 0, 1, 2, 3, 4, and 5 or more. Either the antibiotic non-user group or the lowest antibiotic user group was used as the reference group for analyses [12]. When adjusting for the indications for antibiotic use, namely, the infection sources, the lowest antibiotic user group was used as the reference to avoid multicollinearity.
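The exposure categories can be expressed as simple binning rules. The Python sketch below is illustrative only. The input layout, a list of (antibiotic class, days supplied) pairs per participant, and all function names are assumptions rather than the structure of the NHIS claims data.

```python
def categorize_cumulative_days(days):
    """Map cumulative prescription days to the four exposure groups."""
    if days < 0:
        raise ValueError("days must be non-negative")
    if days == 0:
        return "0"
    if days < 30:
        return "1-29"
    if days < 90:
        return "30-89"
    return ">=90"


def categorize_class_count(n_classes):
    """Map the number of distinct antibiotic classes to 0, 1, 2, 3, 4, or >=5."""
    if n_classes < 0:
        raise ValueError("n_classes must be non-negative")
    return str(n_classes) if n_classes < 5 else ">=5"


def summarize_exposure(prescriptions):
    """Summarize one participant's prescriptions from the exposure period.

    prescriptions: iterable of (antibiotic_class, days_supplied) pairs
    (hypothetical layout). Returns (days_category, class_category).
    """
    records = list(prescriptions)
    total_days = sum(days for _, days in records)
    distinct_classes = {cls for cls, _ in records}
    return (categorize_cumulative_days(total_days),
            categorize_class_count(len(distinct_classes)))
```

A participant with two macrolide prescriptions (7 and 14 days) and one penicillin prescription (10 days) would fall into the 30–89 day group and the two-class group.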
2.3.3 Ascertainment of covariates
The covariates considered were age (continuous, years), sex (categorical, men and women), BMI (categorical, <18.5, 18.5–22.9, 23–24.9, and ≥25 kg/m²), smoking (categorical, never smoker, past smoker, and current smoker), alcohol consumption (categorical, none, ≤2, and ≥3 times weekly), physical activity (categorical, none, 1–4, and 5–7 times per week), household income (categorical, first, second, third, and fourth quartiles), residence (categorical, capital, metropolitan, and rural), family history of DM (categorical, yes and no), Charlson comorbidity index (CCI, categorical, 0, 1–2, and ≥3), FBS (continuous, mg/dL), total cholesterol (continuous, mg/dL), acid suppressant use (categorical, yes and no), and infectious diseases (categorical, yes and no). Infectious diseases included respiratory diseases; urinary tract infections; skin, soft tissue, bone, and joint infections (SSTBJ); and intra-abdominal infections (Supplementary Table 2).
BMI was calculated as weight in kilograms divided by the square of height in meters and was categorized as underweight, normal, overweight, or obese (<18.5, 18.5–22.9, 23–24.9, and ≥25 kg/m², respectively) based on the Asia-Pacific criteria of the WHO [25]. Household income was derived from insurance premiums. The first and fourth quartiles represented the lowest and highest household income, respectively. The CCI, which includes comorbidities such as chronic obstructive pulmonary disease and asthma, was used to account for comorbidities identified from claims data [26]. The use of acid suppressants, defined as histamine-2-receptor antagonists and proton pump inhibitors, was also identified from the claims database.
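The BMI calculation and categorization can be sketched as follows. This Python example is illustrative. The function names are hypothetical, and the handling of values that fall between the published cut-offs (for example, 22.95 kg/m²) is an assumption: each category is treated as running up to, but not including, the next threshold.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Categorize BMI using the thresholds given in the text.

    Boundaries are treated as half-open intervals (assumption):
    <18.5, [18.5, 23), [23, 25), >=25 kg/m^2.
    """
    if value < 18.5:
        return "underweight"
    if value < 23:
        return "normal"
    if value < 25:
        return "overweight"
    return "obese"
```

For example, a participant weighing 70 kg with a height of 1.75 m has a BMI of about 22.9 kg/m² and is classified as normal.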
The indications for antibiotic use were also considered as covariates. Five categories were considered: respiratory diseases, urinary tract infections, SSTBJ, intra-abdominal infections, and others (Supplementary Table 2) [27-29]. The major and most prevalent infection diagnoses were considered for each category. These sources of infection were included to account for major confounding, because DM can trigger infection and thus lead to the use of antibiotics.
2.4. Statistical analysis
Multivariable Cox proportional hazards models were used to estimate the adjusted hazard ratios (aHRs) and 95% confidence intervals (CIs) for DM according to antibiotic use after adjusting for all the covariates described above. In the analysis of the cumulative number of days of antibiotic prescription, antibiotic non-users served as the reference group. In the analysis of the number of antibiotic classes, the group that used only one class of antibiotics was set as the reference, because the fully adjusted models included the source of infection as a covariate. In addition, washout periods of 1, 2, and 3 years were applied to minimize protopathic bias [30]. We also conducted subgroup analyses stratified by each covariate. All data mining and analyses were performed using SAS version 9.4 (SAS Institute, Cary, NC, USA). Statistical significance was defined as P < 0.05.
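For reference, the standard form of the Cox proportional hazards model and the resulting hazard ratio estimates can be written as below. The notation is introduced here for illustration and does not appear in the original text: h_0(t) denotes the baseline hazard, x_1, ..., x_p the exposure category indicators and covariates, and \hat{\beta}_j the estimated coefficient for term j. The Wald-type interval shown is the conventional choice and is assumed rather than stated in the text.

```latex
\begin{align}
  h(t \mid \mathbf{x}) &= h_0(t)\,
    \exp\!\left(\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p\right), \\
  \widehat{\mathrm{aHR}}_j &= \exp\!\left(\hat{\beta}_j\right), \\
  95\%\ \mathrm{CI}_j &= \exp\!\left(\hat{\beta}_j \pm 1.96\,
    \mathrm{SE}\!\left(\hat{\beta}_j\right)\right).
\end{align}
```

Under this form, an aHR above 1 for an antibiotic exposure category indicates a higher hazard of DM than in the reference category, holding the other covariates fixed.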