Development and validation of a risk-prediction nomogram for chronic low back pain using a national health examination survey

Background Several prognostic factors for chronic low back pain (CLBP) have been reported. However, no study has used a risk prediction model to predict CLBP development in the general population. The aims of this study were therefore (1) to develop and validate a risk prediction model for CLBP development in the general population, and (2) to create a nomogram that can help a person at risk of developing CLBP to receive appropriate counseling on risk modification. Methods Data on CLBP development, demographics, socioeconomic history, and comorbid health conditions of participants were obtained from a nationally representative health examination and survey conducted from 2007 to 2009. Prediction models for CLBP development were derived from a random sample of 80% of the data and validated in the remaining 20%. The resulting risk prediction model was then incorporated into a nomogram. Results Data for 17,038 participants were finally analyzed, including 2,693 with CLBP and 14,345 without. The finally selected risk factors included age, gender, occupation, education level, mid-intensity physical activity, depressive symptom, and comorbidities. The model had good predictive performance in the validation dataset (concordance statistic = 0.7569, Hosmer-Lemeshow chi-square statistic = 12.10, p = .278), indicating no significant difference between the observed and predicted probabilities. Conclusions The risk prediction model, presented as a nomogram, which is a score-based prediction system, could be incorporated into the clinical setting. Our prediction model with a nomogram can thus help a person at risk of developing CLBP to receive appropriate counseling on risk modification from primary physicians.

community-based setting, it is important to identify the risk factors of CLBP and also to predict the probability that a patient will develop CLBP. However, no study has estimated the probability of CLBP development in the general population using a risk prediction model.
Based on this background, the aims of this study were: (1) to develop and validate a risk prediction model for CLBP development, and (2) to create a shared decision-making nomogram that can help a patient at risk of developing CLBP to receive appropriate counseling on risk modification, using a nationally representative sample of Korean adults.

Study participants and design
Data from versions IV-1, IV-2, and IV-3 of the Korea National Health and Nutrition Examination Survey (KNHANES), performed in 2007, 2008, and 2009, were analyzed. This survey has been conducted annually since 1998 by the Korea Centers for Disease Control (KCDC). To evaluate the health and nutritional status of the general Korean population, a nationwide sampling method (clustered, multistage, stratified, and randomized) is used for proportional distribution according to geographic area, sex, and age. The survey participants are different every year and are not monitored serially, so each year constitutes a new random sample. The KNHANES comprises three components: health questionnaires, health and physical examinations, and nutrition questionnaires, which are administered by experienced interviewers, registered nurses, and laboratory technicians [24]. The KNHANES IV-1 (2007), IV-2 (2008), and IV-3 (2009) examinations and health surveys were completed by 4,594, 9,744, and 10,533 participants, respectively (total: 24,871 participants). The present analysis was confined to 17,038 respondents aged 10-100 years who answered the chronic LBP examination survey and had no missing data regarding the demographics and health questionnaires (Fig. 1).

Definitions of chronic low back pain
Participants in this survey who answered "yes" to all three of the following questions were defined as having CLBP: (1) "Have you ever had LBP for the rest of your life?" (2) "Do you currently have LBP?" and (3) "Have you complained of LBP for more than 90 days during the past year?"

Description of demographics and health surveys
We analyzed the participants' demographics and socioeconomic status, comorbidities, and lifestyle habits through health interviews and examinations. All participants were asked whether they had received physician diagnoses of major comorbidities, such as hypertension, diabetes mellitus, dyslipidemia, ischemic heart disease (myocardial infarction, angina), stroke, liver cirrhosis, asthma, chronic obstructive pulmonary disease, arthritis, and chronic kidney disease. Histories of major cancers (lung, stomach, liver, colon, breast, prostate, or uterine cervical) were also surveyed in all participants.
Age was categorized into age groups. Body mass index (BMI) was calculated as the body weight (kilograms) divided by the height (meters) squared and categorized into underweight (<18.5 kg/m²), normal weight (18.5-24.9 kg/m²), and obese (≥25.0 kg/m²). Participants were categorized as non/ex-smokers or current smokers based on their present smoking status. Alcohol consumption was categorized into none and ≥1 drink/month. Occupations were divided into five groups: unemployed (e.g., student, housewife), office workers (e.g., managers, professionals, and office workers), sales and services, machine fitting and simple labor (e.g., technicians, device and machine operators, and low-level laborers), and agriculture, forestry, and fishery [25]. The household income level was categorized into four groups according to quartiles. The educational level was also categorized into four groups according to years of schooling: ≤6 years (elementary school), 7-9 years (middle school), 10-12 years (high school), and ≥13 years (university or college). Physical activity was defined in three categories. Walking was defined as walking for at least 30 minutes on 5 or more days per week. Moderate physical activity was defined as mid-intensity physical activity for at least 30 minutes on 5 or more days per week. High physical activity was defined as high-intensity physical activity for at least 20 minutes on 3 or more days per week. Depressive symptom was defined as having felt sad or depressed for 2 consecutive weeks during the past year.
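The BMI definition and cut-offs above can be written as a short sketch. The function names are illustrative and not part of the study's Stata code, and treating values between 24.9 and 25.0 kg/m² as normal weight is an assumption, since the text lists the categories to one decimal place.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Map a BMI value (kg/m^2) to the three groups described in the text."""
    if value < 18.5:
        return "underweight"
    if value < 25.0:  # assumption: the gap between 24.9 and 25.0 counts as normal
        return "normal weight"
    return "obese"


# Example: a hypothetical participant weighing 70 kg at 1.75 m.
print(bmi_category(bmi(70, 1.75)))
```

The same pattern (a threshold function per variable) would apply to the other categorized covariates, such as education level in years.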

Statistical analysis
Statistical analyses were performed using Stata/MP 15.0 (StataCorp., 2017, Stata Statistical Software: Release 15; College Station, TX, USA; StataCorp LP). Continuous variables are presented as mean ± standard deviation. Statistical significance was defined as a two-tailed p-value <0.05. Sampling weights were applied to the study population in order to represent the Korean population without bias.
General demographics and co-variables were compared between the participants with and without CLBP. Student's t-test was used to compare continuous variables, and the chi-square test was used for categorical variables. Baseline demographics and co-variables listed in Table 1 were assessed as independent variables for the models.
To create development and validation datasets from the entire dataset, we used the split-sample method [26]. A split sample with a 50% hold-out yields models with suboptimal performance, that is, models whose performance is unstable and, on average, the same as that obtained with half the sample size [27]. Therefore, the development dataset for the prediction equation was obtained by randomly selecting 80% of the entire dataset [28]. A logistic regression model was used to develop the prediction equation. Only covariables with a p-value of <0.05 in univariate analysis were subsequently evaluated in the multiple logistic regression using backward stepwise selection with a 0.05 significance level. Table 2 shows the list of variables, odds ratios, and regression coefficients that remained in the final prediction model after multiple logistic regression.
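As an illustration of the split-sample step, the following sketch randomly partitions record indices into an 80% development set and a 20% validation set. It is not the authors' Stata procedure; the function name and the seed are assumptions, and the study does not report a random seed.

```python
import random


def split_sample(records, development_fraction=0.8, seed=0):
    """Randomly partition records into development and validation sets."""
    rng = random.Random(seed)  # assumed seed, for reproducibility only
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * development_fraction)
    return shuffled[:cut], shuffled[cut:]


# 17,038 analyzed participants, represented here only by their indices.
development, validation = split_sample(range(17038))
print(len(development), len(validation))  # 13630 3408
```

With 17,038 records, the 80/20 cut reproduces the dataset sizes reported in the Results (13,630 and 3,408).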
The developed model was validated by evaluating its performance with respect to discrimination and calibration using the C-statistic and the Hosmer-Lemeshow chi-square statistic. The validation dataset for the prediction equation consisted of the remaining 20% of the full data after the development dataset was randomly selected. The area under the receiver operating characteristic curve (AUC), also called the C-statistic, was measured to assess discrimination. Although there is no consensus on what constitutes a good AUC, a value below 0.70 is often considered suboptimal, a value from 0.70 to 0.80 is considered good, and a value of 0.80 or above is considered excellent. To assess calibration, the Hosmer-Lemeshow chi-square statistic (HLS) was used to quantify how close the predicted risks were to the observed risks. To calculate the HLS, the dataset was divided into 10 subgroups based on the predicted probabilities from the developed prediction model. Values exceeding 20 indicated a significant lack of calibration [29].
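Both validation measures can be sketched in plain Python. This is a minimal illustration, not the Stata routines used in the study: the function names are assumptions, the C-statistic is computed by direct pairwise comparison, and the Hosmer-Lemeshow groups are formed as equal-sized groups after sorting by predicted probability.

```python
def c_statistic(y_true, y_prob):
    """AUC: probability that a random case has a higher predicted
    risk than a random non-case (ties count as one half)."""
    cases = [p for y, p in zip(y_true, y_prob) if y == 1]
    controls = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for pc in cases:
        for pn in controls:
            if pc > pn:
                wins += 1.0
            elif pc == pn:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow chi-square over groups ordered by predicted risk."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    chi_square = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        # Combined event and non-event terms: (O - E)^2 / (E * (1 - E/n_g)).
        denominator = expected * (1 - expected / size)
        if denominator > 0:
            chi_square += (observed - expected) ** 2 / denominator
    return chi_square


# Toy data, not study data.
print(c_statistic([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A higher C-statistic indicates better discrimination, whereas a smaller HLS indicates closer agreement between predicted and observed risks.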
The risk of CLBP was predicted using a nomogram, which is a two-dimensional diagram designed to allow the approximate graphical computation of a mathematical function. The nomogram was generated from the independent risk factors identified in the multiple logistic regression analysis, using the "nomolog" module of Stata [30].

Demographics of participants according to chronic low back pain
The baseline general characteristics of all participants are shown in Table 1. A total of 17,038 participants were finally analyzed, including 2,693 with CLBP and 14,345 without. The prevalence of CLBP was 15.8% in Korean subjects, with a prevalence of 11.8% in men and 24.5% in women. Of the participants, 80% (n=13,630) were randomly assigned to the development dataset, and the remaining 20% (n=3,408) were assigned to the validation dataset. In the development dataset, 2,120 participants had CLBP and 11,510 did not. The mean age was 49.1 ± 16.6 years, and 5,776 participants (42.4%) were men. In the validation dataset, 573 participants had CLBP and 2,835 did not. The mean age was 49.3 ± 16.6 years, and 1,426 participants (41.8%) were men.

Risk factors for prediction model
Table 2 shows the estimated odds ratios (ORs) from multiple logistic regression in the development dataset. The finally selected risk factors included age, gender, occupation, education level, mid-intensity physical activity, depressive symptom, and comorbidities (stroke, ischemic heart disease, knee osteoarthritis, asthma, chronic obstructive pulmonary disease, cancer history). Based on this risk analysis, the CLBP prediction equation was calculated, and the coefficients of the risk factors were derived. From this equation, the linear function for the predicted probability of developing CLBP was estimated.
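For reference, a prediction equation from a multiple logistic regression generally takes the standard form below. The notation is an assumption introduced here for illustration: p is the predicted probability of CLBP, x_i are the indicator variables for the selected risk factors, β_0 is the intercept, and β_i are the regression coefficients of the kind reported in Table 2.

```latex
\[
\operatorname{logit}(p) = \ln\frac{p}{1-p} = \beta_0 + \sum_{i=1}^{k} \beta_i x_i,
\qquad
p = \frac{1}{1 + \exp\!\left(-\left(\beta_0 + \sum_{i=1}^{k} \beta_i x_i\right)\right)},
\qquad
\mathrm{OR}_i = e^{\beta_i}.
\]
```

The left-hand expression is the linear function of the risk factors, and the middle expression converts it into a predicted probability.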

Discrimination and calibration of prediction model
Our prediction model had good discrimination (AUC = 0.7518) and was well calibrated (HLS = 4.72, p = .787) in the development dataset. The model also performed well in the validation dataset (AUC = 0.7569, HLS = 12.10, p = .278). This indicated no significant difference between the observed and predicted probabilities according to our model. Figures 2 and 3 show the discrimination and calibration plots for the CLBP prediction model.

Nomogram for prediction model
The risk factors that were found to predict CLBP in the development dataset were incorporated into the nomogram, as shown in Figure 4. The value of each risk factor is respectively loaded on each variable axis (the 1st-12th lines), and a line is drawn downwards to determine the number of points received for each variable (the 13th line). Then, the sum of these numbers is located on the total points axis (the last line), and a line is drawn upwards to the risk probability axis to determine the likelihood of CLBP.
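The point-scoring logic of a nomogram can be sketched as follows. All numbers are hypothetical placeholders and are not the coefficients of Table 2 or the scale of Figure 4. The sketch also assumes that every reference level has a coefficient of zero, that all coefficients are non-negative, and that the variable with the largest effect spans a 10-point axis.

```python
import math

# Hypothetical log-odds coefficients, NOT the values reported in Table 2.
INTERCEPT = -3.0
COEFFICIENTS = {
    "sex": {"male": 0.0, "female": 0.8},
    "depressive_symptom": {"no": 0.0, "yes": 0.5},
    "age_group": {"<40": 0.0, "40-59": 1.0, ">=60": 2.0},
}
MAX_POINTS = 10.0  # assumed length of the points axis

# The largest single effect fixes how many log-odds units one point is worth.
LARGEST_EFFECT = max(max(levels.values()) for levels in COEFFICIENTS.values())
LOG_ODDS_PER_POINT = LARGEST_EFFECT / MAX_POINTS


def points(variable, level):
    """Points read off the points axis for one risk factor value."""
    return COEFFICIENTS[variable][level] / LOG_ODDS_PER_POINT


def predicted_probability(profile):
    """Sum the points, then map total points to a probability."""
    total_points = sum(points(var, level) for var, level in profile.items())
    linear_predictor = INTERCEPT + total_points * LOG_ODDS_PER_POINT
    return total_points, 1.0 / (1.0 + math.exp(-linear_predictor))


total, risk = predicted_probability(
    {"sex": "female", "depressive_symptom": "yes", "age_group": ">=60"}
)
print(round(total, 1), round(risk, 3))
```

Reading the nomogram by hand performs the same two steps: summing the per-variable points, then projecting the total onto the probability axis.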

Discussion
Our study developed and validated a clinical risk prediction model for CLBP development in the general population using a cross-sectional Korean population-based health survey. A risk prediction model is needed to identify common risk factors of CLBP and to modify them, especially because the need is growing in the clinical setting rather than in the research setting. To our knowledge, this is the first clinical risk prediction model for CLBP in the general population. We found that the prevalence of CLBP in the general population is 15.8% (11.8% in men and 24.5% in women). Among the demographic, medical history, and socioeconomic variables, we found significant risk factors that influence the development of CLBP. Using these risk factors, we developed a clinical risk prediction model and nomogram to allow personalized CLBP prediction based on personal characteristics. Our final model had good discrimination and calibration performance in the validation dataset, which demonstrated accurate prediction of CLBP in a new population with similar characteristics.
In our model, age was a key predictor of CLBP development. At 80 years of age, the odds ratio increased to 7.268 compared with 10 years of age (the reference age). In a previous study, LBP prevalence was highest in individuals aged 45 to 64 years [31]. However, in our study, the risk of CLBP increased with aging up to the 9th decade of life. The increased risk in women is similar to that in a previous study [19]. However, our study revealed no significant difference between smokers and non-smokers or between alcohol drinkers and non-drinkers [32]. Other prognostic factors, such as occupation, education level, physical activity, depressive symptoms, and comorbidities, were associated with CLBP, which is similar to previous studies [3, 6-11, 19, 22].
Previous studies investigated the prognostic factors for developing CLBP from LBP in the workplace and general population. Heymans et al. reported that a higher pain intensity of initial LBP, no clinically relevant change in pain intensity and disability status in the first 3 months, and a higher score for kinesiophobia were most strongly related to CLBP in the workplace [23]. Similar studies in the general population also revealed that age, gender, height, health status, heavy work, chronic stress, low physical activity, smoking, and history of LBP were important predictors of CLBP [19,31]. There are some differences between these studies and our study. First, these studies have the advantage of prospectively detecting developing CLBP, but they only analyzed individuals in the workplace or the general population with a small number of participants. Second, these studies only analyzed the prognostic factors that caused CLBP in patients with LBP. Although our study was not prospective like the previous studies, we analyzed the risk factors of patients with CLBP in a large, nationally representative population and estimated the probability of developing CLBP. To calculate this, multiple logistic regression was used, and a nomogram was constructed from the coefficients. Although a web-based calculator is more convenient than a nomogram for many people in actual use, our study could not provide a web-based calculation service owing to the absence of a funding source, web developer, and servers.
Our risk prediction model is comprehensive and sophisticated, and it was developed with the aim of modifying the risks of CLBP. Previous studies have simply informed patients with several risk factors that they belong to a high-risk group, which does not give practical help in the clinical setting. Our model shows the scores according to the risk factors and the predicted probability of developing CLBP using a nomogram, which clearly presents and explains the risk factors and probabilities to patients and physicians. This information is relevant for clinicians recommending preventive measures or treatment strategies for their patients. Thus, the model may be used not only in primary care but also in healthcare centers serving the general population.
To the best of our knowledge, this is the first study investigating the predicted probability of CLBP using a nationwide Korean representative sample in the general population. The greatest strength of our study was the increased external validity of our findings by using the KNHANES data. However, there are some limitations in our study. First, this study was conducted through a national health and nutrition examination survey, which was designed as a cross-sectional survey. Therefore, the actual development of CLBP could not be analyzed in this study. To assess the development of CLBP, a large prospective cohort study in the general population is needed. Unlike other prediction models (e.g., prediction of surgical site infection), constructing a prediction model for CLBP in a cohort is difficult in practice. Furthermore, CLBP is closely related to age, socioeconomic status, and comorbidities. These factors are well collected in our dataset, KNHANES, which is not a cohort dataset but a cross-sectional dataset. Such datasets cannot assess actual CLBP development, but they may be useful for constructing a prediction model using risk factors. Second, KNHANES was designed to minimize sampling errors by utilizing a clustered, multistage, and random sampling method. However, selection bias may exist because of missing data.
Participants from our raw data were selected to minimize selection bias, but missing data inevitably led to bias. Unlike in other studies, such as cohort studies and clinical trials, imputation of missing values is impossible in our dataset. Therefore, we excluded participants with missing values in the data necessary for analysis. Third, the simple survey of CLBP used in this study did not evaluate the severity, source, or duration of CLBP, which would be possible with instruments that measure pain on a scale (e.g., visual analog scale pain score). Fourth, this study also could not analyze some prognostic factors of CLBP, such as previous episodes of LBP, severity of pain, and disability. However, this study analyzed many other risk factors that were not analyzed in other studies. Fifth, the prediction model of CLBP may be dependent on ethnicity. KNHANES is a health survey of the general Korean population. Therefore, this model is representative of the general population, but more specifically of the general Korean population, and caution is warranted in extrapolating these findings to other ethnic populations.

Conclusion
We show that the risk of CLBP development can be reliably estimated in the general population. Our study developed a clinical risk prediction model to determine the probability of developing CLBP using a cross-sectional Korean population-based survey. Our prediction model showed good accuracy in the development and validation datasets. Moreover, our risk prediction model, presented as a nomogram, which is a score-based prediction system, could be incorporated into a clinical setting. Thus, our prediction model with a nomogram can help a person at risk of developing CLBP to receive appropriate counseling on risk modification from primary physicians.

Consent for publication
Not applicable.

Availability of data and materials
The data used in this study were sourced from the KCDC (Korea Centers for Disease Control & Prevention) in Korea. Owing to the legal restrictions imposed by the Government of Korea related to the Personal Information Protection Act, the dataset cannot be made publicly available. Interested researchers can obtain the dataset through a formal application to the Division of Health and Nutrition Survey, KCDC, Korea (https://knhanes.cdc.go.kr/knhanes/eng/index.do). Researchers are not allowed to carry raw data outside the KCDC. Interested researchers can access these data in the same manner as the authors. The authors did not have any special access privileges that other researchers would not have. The authors used 2007-2009 KNHANES data. All files are available from the KNHANES webpage (https://knhanes.cdc.go.kr/knhanes/sub03/sub03_02_02.do) under the dataset names HN07_ALL.SAV, HN08_ALL.SAV, and HN09_ALL.SAV.

Competing interests
The authors declare that they have no competing interests.

Funding
There was no external funding source used for this study.

Authors' contributions
JP and J-TK contributed equally to the conception and design, acquisition of data, analysis and interpretation of data, and drafting/revisions of the article. S-MP contributed to the conception and design, analysis and interpretation of data, drafting/revisions of the article, as well as final approval of the article. HK and H-J Kim contributed to the acquisition of data and interpretation of data. OK conducted the statistical analyses. JSY and B-SC contributed to the conception and design, drafting/revisions of the article, as well as final approval of the article. All authors contributed to and approved the final manuscript.

Numeric parameters are expressed as mean and standard deviation in parentheses.
Categorical parameters are expressed as counts and percentages in parentheses.
CLBP, chronic low back pain; BMI, body mass index; PA, physical activity; COPD, chronic obstructive pulmonary disease.
* Body mass index was categorized into underweight (<18.5 kg/m²), normal (18.5-24.9 kg/m²), and obese (≥25.0 kg/m²).
† Household income level was calculated by dividing the total household monthly income with the obtained levels then grouped into quartiles.
‡ Educational level was divided into the following four groups: ≤6 years (elementary school), 7-9 years (middle school), 10-12 years (high school), and ≥13 years (college or university).
§ Physical activity was defined in three categories. Walking was defined as walking for at least 30 minutes on 5 or more days per week. Middle physical activity was defined as mid-intensity physical activity for at least 30 minutes on 5 or more days per week. High physical activity was defined as high-intensity physical activity for at least 20 minutes on 3 or more days per week.
|| Depressive symptom was defined as having felt sad or depressed for two consecutive weeks during the past year.
¶ History of major cancer: stomach, liver, colon, breast, uterine cervical, prostate, or lung cancer.

OR, odds ratio; 95% CI, 95% confidence interval; PA, physical activity; COPD, chronic obstructive pulmonary disease.
* Educational level was divided into the following four groups: ≤6 years (elementary school), 7-9 years (middle school), 10-12 years (high school), and ≥13 years (college or university).
† Levels of sitting time were divided into 4 categories using quartiles: <5, 5-7, 8-10, >10 hours/day.
† Middle physical activity was defined as mid-intensity physical activity for at least 30 minutes on 5 or more days per week.
‡ Depressive symptom was defined as having felt sad or depressed for two consecutive weeks during the past year.
§ History of major cancer: stomach, liver, colon, breast, uterine cervical, prostate, or lung cancer.
Risk prediction model was fully adjusted by all co-variables listed in the table.