Development and validation of a new nomogram for OA based on machine learning

doi:10.21203/rs.3.rs-4268728/v1


Research Article


https://doi.org/10.21203/rs.3.rs-4268728/v1

This work is licensed under a CC BY 4.0 License

Version 1


Introduction: Osteoarthritis (OA) is a chronic joint disease that currently affects more than 300 million people worldwide, posing a significant economic burden on patients and society. There is currently no cure for OA, making early identification and appropriate management of individuals at risk crucial. Developing a novel OA prediction model to screen for high-risk individuals, enabling early diagnosis and intervention, is therefore important for improving patient prognosis.

Methods: This retrospective cross-sectional study used data from the National Health and Nutrition Examination Survey (NHANES) cycles 2011-2012, 2013-2014, and 2015-2016 and involved 11,366 participants. Least absolute shrinkage and selection operator (LASSO) regression, the XGBoost algorithm, and the random forest (RF) algorithm were used to identify significant indicators associated with OA, and an OA prediction nomogram was developed. The nomogram was evaluated using the area under the receiver operating characteristic curve (AUC), calibration curves, and decision curve analysis (DCA) in the training and validation sets.

Results: We identified 5 predictors from 19 variables (age, gender, hypertension, BMI, and coffee intake) and developed an OA nomogram. In both the training and validation cohorts, the OA nomogram exhibited good predictive performance (AUCs of 0.804 and 0.814, respectively), good consistency and stability in the calibration curves, and high net benefit in DCA.

Conclusion: This nomogram based on 5 variables predicted the risk of OA with a high degree of accuracy, suggesting that it is a convenient tool for clinicians to identify high-risk populations of OA.

Osteoarthritis

Nomogram

NHANES

Machine learning

LASSO

XGBoost

Random forest

Osteoarthritis (OA) is a chronic joint disease characterized by cartilage damage, subchondral bone destruction, and synovial inflammation[1]. Currently, there are over 300 million OA patients worldwide, and prevalence is expected to rise further with increasing obesity rates and population aging, posing a significant economic burden on patients and society[2, 3]. The pathogenesis of OA is still not fully understood and is considered to involve factors such as immunity, metabolism, hormones, and genetics[4–6]. At present, the clinical management of OA primarily focuses on symptom relief to improve quality of life, including pain reduction, alleviation of stiffness, and maintenance of joint function[7]. While joint replacement surgery is a viable treatment option for advanced-stage OA patients, the high cost and associated surgical risks make it unattainable for the majority of OA patients[8]. In summary, there is currently no cure for OA. However, Chu et al. demonstrated that the early stages of OA may be reversible, and further recommended early diagnosis as a potential strategy to delay disease progression[8]. Therefore, developing a predictive model to identify high-risk populations for OA is essential for early diagnosis and intervention to improve patient outcomes.

In recent years, machine learning techniques have opened up new avenues for the development of clinical prediction models. Machine learning algorithms can analyze multiple variables and identify relevant risk factors, providing early disease diagnosis predictions, and have been widely applied to assist clinical diagnosis of various diseases[9, 10]. With the increasing incidence of OA, there is an urgent need for a disease risk prediction model to support early diagnosis of OA. However, previous prediction models have primarily focused on genetic aspects or relied solely on imaging and serological factors[11–14]. Other disease diagnostic models have utilized logistic regression for variable selection and model construction. Unfortunately, these models have encountered issues such as small sample sizes, lack of comprehensive model evaluation methods, and the absence of internal or external validation. As of now, there is no disease prediction model for OA based on clinical data. The nomogram is a visual disease-specific prediction model that incorporates various clinical variables. It aids in early disease detection and is easy for doctors to use[15, 16].

In this study, we aim to create a novel clinical prediction model through machine learning methods, incorporating patient demographics, laboratory examinations, anthropometric measurements, and health survey information. This model is designed to effectively predict the risk of OA occurrence and identify high-risk populations of OA.

Data source

The data in this study was obtained from the National Health and Nutrition Examination Survey (NHANES) in the United States. This survey employs a sophisticated, multi-stage, and stratified probability sampling method to provide a comprehensive understanding of the health and nutritional status of the non-institutionalized population in the United States. The data from NHANES is nationally representative, making it an invaluable resource for conducting large-scale epidemiological studies and developing clinical prediction models. All NHANES survey protocols have received approval from the Research Ethics Review Committee of the National Center for Health Statistics, and participants have signed informed consent forms before participating in the survey. All the data from NHANES used in this study are publicly available at https://www.cdc.gov/nchs/nhanes.

Participant selection

In our study, we utilized data from three cycles of the NHANES survey, conducted in 2011–2012, 2013–2014, and 2015–2016. During these cycles, a total of 29,902 participants completed extensive demographic surveys, laboratory examinations, and health status questionnaires. To ensure the accuracy and reliability of our research, we conducted rigorous data screening and exclusions. First, participants under 20 years of age were excluded (n = 12,854), as our study focused on osteoarthritis in adults, leaving 17,048 adults. Subsequently, individuals with missing osteoarthritis-related data were excluded (n = 1,256). Furthermore, to guarantee data integrity, we excluded individuals lacking essential demographic information (n = 1,456), those with missing questionnaire and dietary data (n = 2,462), and those with missing laboratory data (n = 508). Ultimately, a total of 11,366 participants were included in the analysis, as shown in Fig. 1.
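The exclusion steps above can be checked with a short tally. This is a minimal sketch that only replays the counts reported in the text; the function name `apply_exclusions` is illustrative and not part of the study's code.

```python
# Exclusion counts as reported in the participant-selection paragraph.
initial = 29_902
exclusions = [
    ("age under 20 years", 12_854),
    ("missing osteoarthritis data", 1_256),
    ("missing demographic information", 1_456),
    ("missing questionnaire or dietary data", 2_462),
    ("missing laboratory data", 508),
]


def apply_exclusions(start, steps):
    """Return (reason, n_excluded, n_remaining) after each sequential step."""
    remaining = start
    flow = []
    for reason, n_excluded in steps:
        remaining -= n_excluded
        flow.append((reason, n_excluded, remaining))
    return flow


flow = apply_exclusions(initial, exclusions)
for reason, n_excluded, remaining in flow:
    print(f"{reason}: -{n_excluded} -> {remaining}")
final_n = flow[-1][2]
```

The running totals reproduce the 17,048 adults after the age filter and the 11,366 participants in the final analysis set.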

Definition of osteoarthritis

The case definition in epidemiological studies often relies on self-reported osteoarthritis (OA)[17]. March et al. demonstrated an 81% consistency rate between self-reported OA and clinically well-defined OA[18], suggesting that OA can be reliably self-reported. All participants were asked if they had ever been diagnosed with arthritis: "Has a doctor or other health professional ever told you that you have arthritis?" If participants answered "yes," they were then asked, "What type of arthritis do you have?" Based on the answers to these questions, participants were categorized as having OA, other types of arthritis or no arthritis[19].

Demographics, laboratory factors, anthropometrics, and lifestyles

In accordance with previous research, we identified factors including age, gender, race, educational level, poverty income ratio (PIR), marital status, body mass index (BMI), alcohol consumption, smoking status, recreational physical activity, self-reported health status, dietary intake factors, renal function, and the systemic immune-inflammatory index (SII) as influencing factors[20, 21].

Age (years) and PIR were used as continuous variables. Gender was classified as male or female. Race was classified as Mexican American people, Other Hispanic people, Non-Hispanic White people, Non-Hispanic Black people, and Other or multiracial people. Education was divided into five categories: Less Than 9th Grade, 9-11th Grade, High School Grad/GED, Some College or AA degree, and College Graduate or above. Alcohol consumption was defined by the response to the question “Have you had at least 12 drinks of any type of alcoholic beverage in any one year?” and was divided into two groups (yes or no). Smoking status was classified as current smoking, former smoking, or never smoking according to the responses to the questions “Have you smoked at least 100 cigarettes in your entire life?” and “Do you currently smoke cigarettes?” Marital status was classified into six categories: married, widowed, divorced, separated, never married, and living with a partner. Based on BMI, individuals were divided into underweight (< 18 kg/m²), normal weight (18–25 kg/m²), overweight (25–30 kg/m²), and obesity (≥ 30 kg/m²). Leisure physical activity was categorized into two groups: individuals reporting moderate or vigorous leisure physical activity in a typical week were classified as active, and those reporting none were classified as inactive. Hypertension was defined as a self-reported diagnosis by a doctor, the use of anti-hypertensive medications, or blood pressure ≥ 140/90 mmHg. Diabetes mellitus (DM) was defined as meeting any of the following: self-reported diagnosis by a doctor, HbA1c level ≥ 6.5%, fasting plasma glucose (FPG) level ≥ 7.0 mmol/L, random blood glucose level ≥ 11.1 mmol/L, two-hour glucose tolerance test blood glucose level ≥ 11.1 mmol/L, or use of diabetes medications or insulin.

Dietary supplement information was obtained from questionnaires designed to collect detailed data on dietary supplement usage. During each NHANES cycle, participants provided detailed dietary intake information for two 24-hour periods, which was used to estimate intake of total energy, caffeine, and fiber. The first dietary recall was collected in person during the NHANES visit, while the second was collected via telephone 3 to 10 days later. Intake was estimated as the average of the two recall periods (or as the first-day value if only one day of data was available)[22]. Data on urinary creatinine and albumin were obtained from laboratory examinations within the NHANES project. Blood biomarkers included vitamin D level, neutrophil count (NC), lymphocyte count (LC), and platelet count (PC). As previously described, the systemic immune-inflammation index (SII) was calculated as PC × (NC / LC). Considering the right-skewed distribution of SII, we performed a log2 transformation on SII[23, 24].
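The variable definitions above translate into a few small helper functions. The sketch below is illustrative only: the function names are hypothetical, the handling of values exactly on a BMI boundary (25 counted as overweight) is an assumption because the text gives overlapping ranges, and "≥ 140/90 mmHg" is read here as systolic ≥ 140 or diastolic ≥ 90, which the text does not spell out.

```python
import math


def bmi_category(bmi):
    """Categorize BMI (kg/m^2) using the cut-offs stated in the text."""
    if bmi < 18:
        return "underweight"
    if bmi < 25:
        return "normal weight"
    if bmi < 30:  # assumption: the boundary value 25 counts as overweight
        return "overweight"
    return "obesity"


def has_hypertension(self_reported, on_medication, systolic, diastolic):
    """Any one criterion suffices (assumed reading of '>= 140/90 mmHg')."""
    return bool(self_reported or on_medication
                or systolic >= 140 or diastolic >= 90)


def mean_intake(day1, day2=None):
    """Average two 24-hour recalls; fall back to day 1 if day 2 is missing."""
    return day1 if day2 is None else (day1 + day2) / 2


def log2_sii(platelets, neutrophils, lymphocytes):
    """SII = PC x (NC / LC), then log2-transformed to reduce right skew."""
    return math.log2(platelets * neutrophils / lymphocytes)


# Made-up counts in 10^3 cells/uL: SII = 250 * 4.0 / 2.0 = 500.
example_sii = log2_sii(250, 4.0, 2.0)
print(bmi_category(27.5), has_hypertension(False, False, 150, 80),
      mean_intake(120.0, 180.0), round(example_sii, 2))
```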

Statistical analysis

NHANES uses a complex, multistage survey design, and sample weights derived from that design are needed to produce nationally representative estimates[25]. However, in this study, we used raw unweighted data from the NHANES database to construct the machine learning models. Weighted data are usually used to estimate nationwide incidence or prevalence; we do not estimate nationwide prevalence and only need the relationship between OA and individual characteristics to train the model[11].

Data were statistically analyzed using R software (version 4.3.0). Continuous variables are presented as mean ± standard deviation (SD) and were compared between groups using t-tests. Categorical variables are expressed as frequency and percentage and were compared using chi-square tests. All statistical tests were two-sided, and a P-value < 0.05 was considered statistically significant.
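As an illustration of the chi-square comparison for a categorical variable, the sketch below computes Pearson's statistic for gender by OA status using the counts in Table 1. It is written in plain Python rather than R, omits any continuity correction, and is not the authors' code; their exact test settings are not stated.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom the upper-tail probability has a closed form.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Rows: male, female; columns: no OA, OA (counts from Table 1).
gender_by_oa = [[5150, 508], [4782, 926]]
stat, p_value = chi_square_2x2(gender_by_oa)
print(f"chi-square = {stat:.1f}, p < 0.001: {p_value < 0.001}")
```

The resulting p-value falls well below 0.001, in line with the value reported for gender in Table 1.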

To facilitate model development, we randomly divided all 11,366 participants into two groups in a 7:3 ratio (7,958 individuals for training and 3,408 for validation). The training cohort was used for model development, while the validation cohort was used for internal validation. LASSO regression, the XGBoost algorithm, and the random forest (RF) algorithm were applied with 10-fold cross-validation to assess feature importance. Subsequently, we developed a clinical risk prediction nomogram by integrating the results from the three algorithms, considering the importance of the feature variables.
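A random 7:3 split followed by 10-fold partitioning of the training set can be sketched as follows. The function names and the seed are hypothetical. A plain 70% cut of 11,366 yields 7,956 and 3,410 records, close to but not identical to the reported 7,958 and 3,408; the paper does not state which partitioning routine produced its counts.

```python
import random


def split_indices(n, train_fraction=0.7, seed=0):
    """Shuffle 0..n-1 and cut once into training and validation index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = round(n * train_fraction)
    return indices[:cut], indices[cut:]


def k_folds(indices, k=10):
    """Partition a list of indices into k folds of near-equal size."""
    return [indices[i::k] for i in range(k)]


train_idx, valid_idx = split_indices(11_366)
folds = k_folds(train_idx, k=10)
print(len(train_idx), len(valid_idx), [len(f) for f in folds])
```

In each cross-validation round, one fold is held out for evaluation and the other nine are used for fitting.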

For model evaluation, we plotted receiver operating characteristic (ROC) curves and calculated the AUC value. To evaluate the clinical utility of the model, we further conducted decision curve analysis (DCA).
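For readers less familiar with the AUC, it equals the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case. The following minimal sketch computes it by direct pair counting on toy data; the labels and scores are invented for illustration and have no connection to the study data.

```python
def auc(labels, scores):
    """AUC as the share of case/non-case pairs ranked correctly.

    Ties between a case and a non-case count as half a concordant pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    concordant = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Toy data: 3 cases and 3 non-cases with made-up predicted risks.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
print(round(auc(labels, scores), 3))  # 7 of 9 pairs are concordant
```

Pair counting is quadratic in sample size; for cohorts of several thousand participants a rank-based implementation from a statistics library would normally be used instead.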

Baseline characteristics of participants

This study included 11,366 participants for analysis, with an average age of 47.9 years. Among these participants, according to the diagnostic criteria mentioned above, 1,434 individuals (12.6%) were diagnosed with OA. Among these OA patients, 508 individuals (35.4%) were male, and 926 individuals (64.6%) were female. Baseline characteristics of the two groups of participants are shown in Table 1. These participants were randomly divided into training and validation groups in a 7:3 ratio, with 7,958 individuals in the training group and 3,408 in the validation group. In the training cohort, participants had an average age of 48.0 years, with 1,007 individuals (12.7%) diagnosed with OA. In the validation cohort, the average age of participants was 47.7 years, with 427 individuals (12.5%) diagnosed with OA. No significant differences were observed in baseline characteristics between the two cohorts (as shown in Table S1).

Table 1

Baseline Characteristics of Study Participants from NHANES 2011–2016.
Variable	Total (N = 11366)	Normal (N = 9932)	OA (N = 1434)	p-value
Age (years)	47.9 ± 17.4	45.8 ± 16.9	62.6 ± 13.2	< 0.001
Ratio of family income to poverty	2.5 ± 1.6	2.5 ± 1.6	2.7 ± 1.6	< 0.001
Diabetes, n(%)	1944 (17.1%)	1547 (15.6%)	397 (27.7%)	< 0.001
Hypertension, n(%)	4587 (40.4%)	3631 (36.6%)	956 (66.7%)	< 0.001
Calories intake (kcal/d)	2071.4 ± 848.2	2093.6 ± 863.0	1918.0 ± 719.1	< 0.001
Coffee intake (mg/d)	137.9 ± 158.0	133.5 ± 153.5	168.0 ± 183.8	< 0.001
Fiber intake (g/d)	17.3 ± 9.6	17.4 ± 9.8	16.6 ± 8.5	0.002
Urinary protein (µg/ml)	44.8 ± 293.0	43.5 ± 278.4	53.8 ± 379.2	0.32
Urine creatinine (mg/dl)	123.4 ± 81.4	125.4 ± 82.3	109.6 ± 73.8	< 0.001
Log2 (SII)	8.8 ± 0.8	8.8 ± 0.8	8.9 ± 0.8	< 0.001
Gender, n(%)				< 0.001
Male	5658 (49.8%)	5150 (51.9%)	508 (35.4%)
Female	5708 (50.2%)	4782 (48.1%)	926 (64.6%)
Race, n(%)				< 0.001
Mexican American people	1525 (13.4%)	1409 (14.2%)	116 (8.1%)
Other Hispanic people	1187 (10.4%)	1076 (10.8%)	111 (7.7%)
Non-Hispanic White people	4652 (40.9%)	3781 (38.1%)	871 (60.7%)
Non-Hispanic Black people	2385 (21%)	2153 (21.7%)	232 (16.2%)
Other/multiracial people	1617 (14.2%)	1513 (15.2%)	104 (7.3%)
Education level, n(%)				0.11
Less Than 9th Grade	873 (7.7%)	773 (7.8%)	100 (7%)
9-11th Grade	1368 (12%)	1203 (12.1%)	165 (11.5%)
High School Grad/GED	2460 (21.6%)	2165 (21.8%)	295 (20.6%)
Some College or AA degree	3582 (31.5%)	3086 (31.1%)	496 (34.6%)
College Graduate or above	3083 (27.1%)	2705 (27.2%)	378 (26.4%)
Marital status, n(%)				< 0.001
Married	5807 (51.1%)	5031 (50.7%)	776 (54.1%)
Widowed	721 (6.3%)	499 (5%)	222 (15.5%)
Divorced	1218 (10.7%)	995 (10%)	223 (15.6%)
Separated	345 (3%)	310 (3.1%)	35 (2.4%)
Never married	2307 (20.3%)	2185 (22%)	122 (8.5%)
Living with partner	968 (8.5%)	912 (9.2%)	56 (3.9%)
Smoking, n(%)				< 0.001
Never smoker	6524 (57.4%)	5840 (58.8%)	684 (47.7%)
Former smoker	2611 (23%)	2111 (21.3%)	500 (34.9%)
Current smoker	2231 (19.6%)	1981 (19.9%)	250 (17.4%)
Drink alcohol, n(%)	8278 (72.8%)	7235 (72.8%)	1043 (72.7%)	> 0.9
Physically active, n(%)	5890 (51.8%)	5302 (53.4%)	588 (41%)	< 0.001
BMI, n(%)				< 0.001
Underweight	193 (1.7%)	175 (1.8%)	18 (1.3%)
Normal weight	3219 (28.3%)	2945 (29.7%)	274 (19.1%)
Overweight	3673 (32.3%)	3264 (32.9%)	409 (28.5%)
Obesity	4281 (37.7%)	3548 (35.7%)	733 (51.1%)
Continuous data are expressed as weighted mean ± standard deviation (SD); categorical data are expressed as number (percentage).
Abbreviations: SII: systemic immune inflammation index, BMI: body mass index.

Selection of main predictors of OA

To identify the key predictor variables of OA, we applied LASSO regression, the XGBoost algorithm, and the RF algorithm with 10-fold cross-validation and assessed feature importance. Using LASSO regression, we selected 7 significant predictors from the 19 feature variables in the training cohort: gender, hypertension, BMI, age, drink alcohol, education level and coffee intake (Fig. 2A-C). Figure 2D presents the importance ranking of 15 feature variables by the XGBoost algorithm: age, gender, race, hypertension, coffee intake, BMI, calories intake, Log2 (SII), fiber intake, PIR, marital status, urine creatinine, diabetes, urinary protein and smoking. Figure 2E-F displays the importance ranking of 15 feature variables by the RF algorithm: age, gender, race, urine creatinine, diabetes, calories intake, marital status, urinary protein, hypertension, education level, coffee intake, smoking, BMI, fiber intake and drink alcohol. We took the intersection of the feature variables obtained from the three algorithms, which yielded 5 key feature variables: age, gender, hypertension, BMI and coffee intake (Fig. 2G).
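The intersection step can be reproduced directly from the three feature lists given above. This is a minimal sketch using the variable names exactly as they appear in the text; it does not rerun any of the three algorithms.

```python
# Feature lists as reported for each algorithm.
lasso = {
    "gender", "hypertension", "BMI", "age", "drink alcohol",
    "education level", "coffee intake",
}
xgboost = {
    "age", "gender", "race", "hypertension", "coffee intake", "BMI",
    "calories intake", "Log2 (SII)", "fiber intake", "PIR",
    "marital status", "urine creatinine", "diabetes", "urinary protein",
    "smoking",
}
random_forest = {
    "age", "gender", "race", "urine creatinine", "diabetes",
    "calories intake", "marital status", "urinary protein", "hypertension",
    "education level", "coffee intake", "smoking", "BMI", "fiber intake",
    "drink alcohol",
}

# Keep only the variables selected by all three methods.
key_features = lasso & xgboost & random_forest
print(sorted(key_features))
```

Alcohol consumption and education level drop out because XGBoost did not rank them, leaving the five predictors used in the nomogram.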

Construction of a new predictive model of OA

Through the machine learning methods described above, 5 of the original 19 variables were selected as the optimal predictors. To build a new predictive model, we then fitted a multivariate logistic regression on these 5 variables (Table S2). These predictors, which are mutually independent, were combined to create a nomogram illustrating the quantified OA risk (Fig. 3).
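To show how a logistic regression is turned into nomogram points, the sketch below uses a common scaling convention: the predictor with the widest effect range spans 0 to 100 points, and the others are scaled in proportion. All coefficients, the intercept, and the value ranges are hypothetical placeholders (the fitted values are in Table S2, which is not reproduced here), only three predictors are shown for brevity, and positive coefficients are assumed.

```python
import math

# Hypothetical values for illustration only; NOT the fitted model.
INTERCEPT = -6.0
COEFS = {"age": 0.06, "female": 0.6, "hypertension": 0.5}
RANGES = {"age": (20, 80), "female": (0, 1), "hypertension": (0, 1)}


def points_per_unit(coefs, ranges):
    """Scale so the predictor with the widest effect spans 0 to 100 points."""
    max_effect = max(
        abs(coefs[k]) * (ranges[k][1] - ranges[k][0]) for k in coefs
    )
    return 100.0 / max_effect


def nomogram_points(patient, coefs, ranges):
    """Points per predictor, measured from each predictor's minimum value."""
    scale = points_per_unit(coefs, ranges)
    return {k: coefs[k] * (patient[k] - ranges[k][0]) * scale for k in coefs}


def predicted_risk(patient, intercept, coefs):
    """Logistic model probability for one patient."""
    linear_predictor = intercept + sum(coefs[k] * patient[k] for k in coefs)
    return 1.0 / (1.0 + math.exp(-linear_predictor))


patient = {"age": 65, "female": 1, "hypertension": 1}
points = nomogram_points(patient, COEFS, RANGES)
risk = predicted_risk(patient, INTERCEPT, COEFS)
print(points, round(sum(points.values()), 1), round(risk, 3))
```

On a printed nomogram the total points are read against a risk axis; here the same mapping is done by evaluating the logistic function directly.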

Performance of the new nomogram of OA in AUC, and calibration curve

ROC curves and calibration curves were used to evaluate the performance of the new OA nomogram. Figure 4A-B displays the ROC curve and calibration curve of the new OA nomogram for the training cohort; the ROC curve shows an AUC of 0.804, and the calibration curve exhibits good consistency. In the validation cohort, the ROC curve of the new OA nomogram has an AUC of 0.814 (Fig. 4C), and the calibration curve in Fig. 4D also demonstrates good consistency and stability. These results indicate that the nomogram exhibits excellent predictive value and discriminative capability.
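A calibration curve compares predicted risk with observed outcome frequency. One simple way to approximate it, sketched below, is to sort participants by predicted risk, group them into equal-sized bins, and compare the mean prediction with the observed event rate in each bin. This grouped check is an assumption made for illustration and is not necessarily the procedure used to draw Fig. 4; the data are made up.

```python
def calibration_bins(labels, probs, n_bins=5):
    """Return (mean predicted risk, observed event rate) for each risk bin."""
    pairs = sorted(zip(probs, labels))
    size = len(pairs) / n_bins
    rows = []
    for b in range(n_bins):
        chunk = pairs[round(b * size):round((b + 1) * size)]
        if not chunk:
            continue  # skip empty bins when there are fewer records than bins
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, observed))
    return rows


# Made-up predictions and outcomes for eight participants.
probs = [0.1, 0.1, 0.3, 0.3, 0.8, 0.8, 0.9, 0.9]
labels = [0, 0, 0, 1, 1, 1, 1, 1]
rows = calibration_bins(labels, probs, n_bins=4)
for mean_pred, observed in rows:
    print(f"predicted {mean_pred:.2f} vs observed {observed:.2f}")
```

For a well-calibrated model the pairs lie close to the diagonal where predicted and observed values are equal.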

Evaluation of clinical utility of the new nomogram of OA

We further conducted decision curve analysis (DCA) to evaluate the clinical utility of the new nomogram of OA. As shown in Figs. 5A-B, the nomogram exhibits a high net benefit in both the training and validation cohorts. The results above indicate that this new OA nomogram possesses a certain degree of clinical utility.
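The quantity plotted in a decision curve is the net benefit at each threshold probability: the true-positive rate per participant minus the false-positive rate weighted by the odds of the threshold. The sketch below implements this standard definition on invented data and compares the model with a "treat all" strategy; it is not the authors' code and the numbers have no relation to Fig. 5.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of acting on everyone with predicted risk >= threshold."""
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Net benefit when every participant is treated as high risk."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Made-up outcomes and predicted risks for ten participants.
labels = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
probs = [0.8, 0.2, 0.6, 0.4, 0.1, 0.05, 0.3, 0.15, 0.1, 0.05]
nb_model = net_benefit(labels, probs, threshold=0.25)
nb_all = net_benefit_treat_all(labels, threshold=0.25)
print(round(nb_model, 3), round(nb_all, 3))
```

A model is considered clinically useful over the range of thresholds where its net benefit exceeds both the "treat all" and "treat none" (net benefit of zero) strategies.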

Osteoarthritis is a severe public health issue, and delayed diagnosis hinders early protective treatment of joint health[8]. Therefore, there is an urgent need for a simple and user-friendly prediction model of OA to identify high-risk populations. The nomogram is a visual prediction model that is helpful for disease screening and early diagnosis[15]. In this study, we applied machine learning methods to select feature variables using data from the NHANES database. By intersecting these feature variables, we identified 5 optimal predictive factors and used them to construct an effective risk prediction nomogram for OA. This model exhibits good predictive performance for OA risk (AUC of training set: 0.804, AUC of validation set: 0.814).

The use of machine learning for developing clinical prediction models is increasing year by year[11, 26, 27]. To the best of our knowledge, we are the first to utilize machine learning methods based on the NHANES database to construct a risk prediction nomogram for OA using clinical data. The model was built on a large and diverse population, which supports more stable results. We selected simple clinical and laboratory data that are easily obtainable, including demographic data, anthropometric measurements, laboratory examinations, and questionnaires, making the model more conducive to widespread application. In summary, our prediction model has two distinct advantages: it possesses excellent predictive capacity (high AUC), and it is user-friendly with wide adaptability (only 5 easily obtainable feature variables and cross-ethnic applicability).

In our model, we ultimately included five factors: age, gender, hypertension, BMI, and coffee intake. Advanced age and female sex are acknowledged risk factors for OA[28–30]. High BMI is also a risk factor, as being overweight or obese has a direct effect on OA[31–35], particularly in the knee, with a dose-response relationship between BMI and OA[34, 36]. Activation of metabolic factors leading to joint injury and mechanical overload of the weight-bearing joints are considered possible mechanisms by which BMI increases the risk of OA[37]. A meta-analysis including 2 cohort studies and 6 cross-sectional studies revealed an association between hypertension and OA[38]. Hart et al. found a relationship between metabolic factors (such as hypertension and hypercholesterolemia) and OA[39]. Metabolic syndrome (MetS) and OA share similar inflammatory mechanisms, and obesity alters adipokine secretion, leading to chronic low-grade inflammation in joint tissues[40]. As for coffee intake, studies have demonstrated that it is associated with an increased risk of OA[41–43]. Zhang et al. applied two-sample, two-step Mendelian randomization (MR) analyses to genome-wide association study (GWAS) summary statistics to estimate the relationship between coffee intake and OA, and found that coffee consumption increased the risk of OA, especially knee osteoarthritis (KOA), with BMI mediating this relationship[43]. Tan et al. observed that maternal mice exposed to low doses of caffeine exhibited impaired fetal joint integrity[44]. Caffeine antagonizes adenosine receptors and increases osteoclastogenesis, which may explain this phenomenon[45, 46].

Using this nomogram, clinical staff can rapidly identify individuals who may be at risk of OA. For those identified as having a higher risk during screening, further examinations are recommended for early diagnosis and intervention to improve prognosis. Furthermore, closely monitoring BMI and blood pressure and adjusting dietary factors such as coffee intake may help reduce the risk of OA.

It should be noted that although internal validation was performed in the validation cohort, this study still requires external validation to further prove its generalizability.

Compared to other existing models, this study has developed an effective clinical nomogram for identifying OA in a large population. Based on the risk assessment, clinicians can create individualized diagnostic and treatment plans for the subjects. For individuals at high risk of OA, further examinations are recommended for early diagnosis and lifestyle or medical intervention to prevent the disease from progressing further. Thus, the risk prediction nomogram of OA developed in this study has significant clinical value.

OA	Osteoarthritis
NHANES	National Health and Nutrition Examination Survey
PIR	Poverty income ratio
BMI	Body mass index
SII	Systemic immune-inflammatory index
DM	Diabetes mellitus
FPG	Fasting plasma glucose
NC	Neutrophil count
LC	Lymphocyte count
PC	Platelet count
RF	Random forest
ROC	Receiver operating characteristic
AUC	Area under receiver operating characteristic curve
DCA	Decision curve analysis
SD	Standard deviation
LASSO	Least absolute shrinkage and selection operator
MR	Mendelian Randomization
GWAS	Genome-wide association studies
KOA	Knee osteoarthritis
MetS	Metabolic syndrome

Acknowledgments

The authors thank NCHS for its research design and data sharing, as well as all the investigators and participants.

Authors’ contributions

Conceptualization: QZ, JC and YL; methodology: QZ; formal analysis: QZ and JC; writing—original draft: QZ, JC and YL; writing—review and editing: JC, QZ and ML; supervision: LL. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by grants from the Basic and Applied Basic Research Foundation of Guangdong Province (No. 2021A1515010137), and the Special Fund Project for Science and Technology Innovation Strategy of Guangdong Province (No. 210715106900976).

Availability of data and materials

Dataset is publicly available at https://www.cdc.gov/nchs/nhanes (accessed on 12 December 2023).

Declarations

Ethics approval and consent to participate

NHANES is conducted by the Centers for Disease Control and Prevention (CDC) and the National Center for Health Statistics (NCHS). The NCHS Research Ethics Review Committee reviewed and approved the NHANES study protocol. All participants signed written informed consent.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Yue L, Berman J. What Is Osteoarthritis? JAMA. 2022;327(13):1300.
Disease GBD, Injury I, Prevalence C. Global, regional, and national incidence, prevalence, and years lived with disability for 354 diseases and injuries for 195 countries and territories, 1990–2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet. 2018;392(10159):1789–858.
Sowers MR, Karvonen-Gutierrez CA. The evolving role of obesity in knee osteoarthritis. Curr Opin Rheumatol. 2010;22(5):533–7.
Liao Z, Han X, Wang Y, Shi J, Zhang Y, Zhao H, Zhang L, Jiang M, Liu M. Differential Metabolites in Osteoarthritis: A Systematic Review and Meta-Analysis. Nutrients 2023, 15(19).
Mobasheri A, Rayman MP, Gualillo O, Sellam J, van der Kraan P, Fearon U. The role of metabolism in the pathogenesis of osteoarthritis. Nat Rev Rheumatol. 2017;13(5):302–11.
Young DA, Barter MJ, Soul J. Osteoarthritis year in review: genetics, genomics, epigenetics. Osteoarthritis Cartilage. 2022;30(2):216–25.
Felson DT. Clinical practice. Osteoarthritis of the knee. N Engl J Med. 2006;354(8):841–8.
Chu CR, Williams AA, Coyle CH, Bowers ME. Early diagnosis to enable early treatment of pre-osteoarthritis. Arthritis Res Ther. 2012;14(3):212.
Abdel Hady DA, Abd El-Hafeez T. Predicting female pelvic tilt and lumbar angle using machine learning in case of urinary incontinence and sexual dysfunction. Sci Rep. 2023;13(1):17940.
Ferreira-Santos D, Amorim P, Silva Martins T, Monteiro-Soares M, Pereira Rodrigues P. Enabling Early Obstructive Sleep Apnea Diagnosis With Machine Learning: Systematic Review. J Med Internet Res. 2022;24(9):e39452.
Tsai SF, Yang CT, Liu WJ, Lee CL. Development and validation of an insulin resistance model for a population without diabetes mellitus and its clinical implication: a prospective cohort study. EClinicalMedicine. 2023;58:101934.
Li W, Feng J, Zhu D, Xiao Z, Liu J, Fang Y, Yao L, Qian B, Li S. Nomogram model based on radiomics signatures and age to assist in the diagnosis of knee osteoarthritis. Exp Gerontol. 2023;171:112031.
Li S, Ma L, Cui R. Identification of Novel Diagnostic Biomarkers and Classification Patterns for Osteoarthritis by Analyzing a Specific Set of Genes Related to Inflammation. Inflammation; 2023.
Chen X, Xu J, Zhang H, Yu L. A nomogram for predicting osteoarthritis based on serum biomarkers of bone turnover in middle age: A cross-sectional study of PTH and beta-CTx. Med (Baltim). 2023;102(20):e33833.
Bonnett LJ, Snell KIE, Collins GS, Riley RD. Guide to presenting clinical prediction models for use in clinical settings. BMJ. 2019;365:l737.
Wang Y, Li J, Xia Y, Gong R, Wang K, Yan Z, Wan X, Liu G, Wu D, Shi L, et al. Prognostic nomogram for intrahepatic cholangiocarcinoma after partial hepatectomy. J Clin Oncol. 2013;31(9):1188–95.
Xu Y, Wu Q. Trends and disparities in osteoarthritis prevalence among US adults, 2005–2018. Sci Rep. 2021;11(1):21845.
March LM, Schwarz JM, Carfrae BH, Bagge E. Clinical validation of self-reported osteoarthritis. Osteoarthritis Cartilage. 1998;6(2):87–93.
Mendy A, Park J, Vieira ER. Osteoarthritis and risk of mortality in the USA: a population-based cohort study. Int J Epidemiol. 2018;47(6):1821–9.
Wang X, Xie L, Yang S. Association between weight-adjusted-waist index and the prevalence of rheumatoid arthritis and osteoarthritis: a population-based study. BMC Musculoskelet Disord. 2023;24(1):595.
Alhassan E, Nguyen K, Hochberg MC, Mitchell BD. Causal Factors for Osteoarthritis: A Scoping Review of Mendelian Randomization Studies. Arthritis Care Res (Hoboken) 2023.
Christensen K, Gleason CE, Mares JA. Dietary carotenoids and cognitive function among US adults, NHANES 2011–2014. Nutr Neurosci. 2020;23(7):554–62.
Liu B, Wang J, Li YY, Li KP, Zhang Q. The association between systemic immune-inflammation index and rheumatoid arthritis: evidence from NHANES 1999–2018. Arthritis Res Ther. 2023;25(1):34.
Qin Z, Li H, Wang L, Geng J, Yang Q, Su B, Liao R. Systemic Immune-Inflammation Index Is Associated With Increased Urinary Albumin Excretion: A Population-Based Study. Front Immunol. 2022;13:863640.
Johnson CL, Dohrmann SM, Burt VL, Mohadjer LK. National health and nutrition examination survey: sample design, 2011–2014. Vital Health Stat 2 2014(162):1–33.
Merianos AL, Mahabee-Gittens EM, Stone TM, Jandarov RA, Wang L, Bhandari D, Blount BC, Matt GE. Distinguishing Exposure to Secondhand and Thirdhand Tobacco Smoke among U.S. Children Using Machine Learning: NHANES 2013–2016. Environ Sci Technol. 2023;57(5):2042–53.
Li W, Huang G, Tang N, Lu P, Jiang L, Lv J, Qin Y, Lin Y, Xu F, Lei D. Effects of heavy metal exposure on hypertension: A machine learning modeling approach. Chemosphere. 2023;337:139435.
Johnson VL, Hunter DJ. The epidemiology of osteoarthritis. Best Pract Res Clin Rheumatol. 2014;28(1):5–15.
Prieto-Alhambra D, Judge A, Javaid MK, Cooper C, Diez-Perez A, Arden NK. Incidence and risk factors for clinically diagnosed knee, hip and hand osteoarthritis: influences of age, gender and osteoarthritis affecting other joints. Ann Rheum Dis. 2014;73(9):1659–64.
Wang L, Lu H, Chen H, Jin S, Wang M, Shang S. Development of a model for predicting the 4-year risk of symptomatic knee osteoarthritis in China: a longitudinal cohort study. Arthritis Res Ther. 2021;23(1):65.
Reijman M, Pols HA, Bergink AP, Hazes JM, Belo JN, Lievense AM, Bierma-Zeinstra SM. Body mass index associated with onset and progression of osteoarthritis of the knee but not of the hip: the Rotterdam Study. Ann Rheum Dis. 2007;66(2):158–62.
Jiang L, Tian W, Wang Y, Rong J, Bao C, Liu Y, Zhao Y, Wang C. Body mass index and susceptibility to knee osteoarthritis: a systematic review and meta-analysis. Joint Bone Spine. 2012;79(3):291–7.
Grotle M, Hagen KB, Natvig B, Dahl FA, Kvien TK. Obesity and osteoarthritis in knee, hip and/or hand: an epidemiological study in the general population with 10 years follow-up. BMC Musculoskelet Disord. 2008;9:132.
Reyes C, Leyland KM, Peat G, Cooper C, Arden NK, Prieto-Alhambra D. Association Between Overweight and Obesity and Risk of Clinically Diagnosed Knee, Hip, and Hand Osteoarthritis: A Population-Based Cohort Study. Arthritis Rheumatol. 2016;68(8):1869–75.
Ho J, Mak CCH, Sharma V, To K, Khan W. Mendelian Randomization Studies of Lifestyle-Related Risk Factors for Osteoarthritis: A PRISMA Review and Meta-Analysis. Int J Mol Sci 2022, 23(19).
Raud B, Gay C, Guiguet-Auclair C, Bonnin A, Gerbaud L, Pereira B, Duclos M, Boirie Y, Coudeyre E. Level of obesity is directly associated with the clinical and functional consequences of knee osteoarthritis. Sci Rep. 2020;10(1):3601.
King LK, March L, Anandacoomarasamy A. Obesity & osteoarthritis. Indian J Med Res. 2013;138(2):185–93.
Zhang YM, Wang J, Liu XG. Association between hypertension and risk of knee osteoarthritis: A meta-analysis of observational studies. Med (Baltim). 2017;96(32):e7584.
Hart DJ, Doyle DV, Spector TD. Association between metabolic factors and knee osteoarthritis in women: the Chingford Study. J Rheumatol. 1995;22(6):1118–23.
Batushansky A, Zhu S, Komaravolu RK, South S, Mehta-D'souza P, Griffin TM. Fundamentals of OA. An initiative of Osteoarthritis and Cartilage. Obesity and metabolic factors in OA. Osteoarthritis Cartilage. 2022;30(4):501–15.
Lee YH. Investigating the possible causal association of coffee consumption with osteoarthritis risk using a Mendelian randomization analysis. Clin Rheumatol. 2018;37(11):3133–9.
Zhang Y, Fan J, Chen L, Xiong Y, Wu T, Shen S, Wang X, Meng X, Lu Y, Lei X. Causal Association of Coffee Consumption and Total, Knee, Hip and Self-Reported Osteoarthritis: A Mendelian Randomization Study. Front Endocrinol (Lausanne). 2021;12:768529.
Zhang W, Lei X, Tu Y, Ma T, Wen T, Yang T, Xue L, Ji J, Xue H. Coffee and the risk of osteoarthritis: a two-sample, two-step multivariable Mendelian randomization study. Front Genet. 2024;15:1340044.
Tan Y, Lu K, Li J, Ni Q, Zhao Z, Magdalou J, Chen L, Wang H. Prenatal caffeine exprosure increases adult female offspring rat's susceptibility to osteoarthritis via low-functional programming of cartilage IGF-1 with histone acetylation. Toxicol Lett. 2018;295:229–36.
Yi J, Yan B, Li M, Wang Y, Zheng W, Li Y, Zhao Z. Caffeine may enhance orthodontic tooth movement through increasing osteoclastogenesis induced by periodontal ligament cells under compression. Arch Oral Biol. 2016;64:51–60.
Nieber K. The Impact of Coffee on Health. Planta Med. 2017;83(16):1256–63.
