Diagnostic and Predictive Accuracy of Different Machine Learning Methods for Knee Osteoarthritis: A Protocol for Systematic Review and Meta-Analysis

Background: Knee osteoarthritis (OA) is a chronic and progressive joint disease and a major contributor to global disability, mainly in the elderly and particularly in women. The available diagnostic approaches, such as X-ray, computed tomography and magnetic resonance imaging, have large precision errors and low sensitivity. Machine learning (ML) is the application of probabilistic algorithms to train a computational model to make predictions, and it has great potential to become a valuable clinical diagnostic tool. This review aims to determine the diagnostic and predictive accuracy of different machine learning methods for knee osteoarthritis.
Methods: Two reviewers systematically searched Cochrane, PubMed, EMBASE, and Web of Science (last updated in June 2020) for eligible articles. To identify potentially missed publications, the reference lists of the final included studies were manually screened. Outcomes assessed were test characteristics such as accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC). We will use the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool to assess the risk of bias and applicability. Two independent reviewers will conduct all procedures of study selection, data extraction, and methodological assessment. Any disagreements will be resolved in consultation with a third reviewer. RevMan 5.3 software and Stata V15.0 will be used to pool data and to carry out the meta-analysis where possible.
Results: This systematic review will provide a high-quality synthesis of machine learning for the diagnosis of knee osteoarthritis across several evaluation measures, including accuracy, sensitivity, specificity and AUC.
Conclusion: The findings of this systematic review will provide the latest evidence on the diagnostic and predictive accuracy of different machine learning methods for patients with knee osteoarthritis.
Ethics and dissemination: No individual patient data will be used in this study; thus, no ethics approval is needed.


Introduction
Knee osteoarthritis (OA) is the most common musculoskeletal disorder, occurring mainly in the elderly population and particularly in women 1. The global prevalence of radiographically confirmed symptomatic knee OA is estimated to be 3.8% 2, 3, and the lifetime risk of symptomatic knee OA in Western populations is estimated to be over 40% 3. The annual incidence of knee OA in the United States is estimated at 240 persons per 100,000 4. One report shows that patients with knee OA have an almost two-fold increased risk of sick leave and an approximately 40-50% increased risk of disability pension compared with the general population 5. Over time, the disease has imposed a large economic burden on individuals and society. At present, the aetiology and pathology of this disease are unclear, there is no effective cure for knee OA, and only symptom control and treatment to prevent further progression are available 6-8. Thus, reliable and early detection of knee OA is particularly important for overcoming this disease.
In the past few decades, the diagnosis of knee OA in the clinic has most often been made using the 1986 criteria of the American College of Rheumatology. These criteria combine the patient's age, signs and symptoms on physical examination, and radiographic and/or laboratory evidence 9. In clinical practice, X-ray imaging of the affected knee is the most commonly used method for diagnosing OA, but it is not effective in the early period of etiologic changes, since X-rays provide only an approximation of the articular cartilage, and visible joint space narrowing and osteophytes typically manifest in the later stages of OA. Over the past decades, magnetic resonance imaging (MRI) has been adopted as a non-invasive imaging modality that can be used to establish a knee OA diagnosis, assess disease severity, and monitor disease progression.
However, studies have shown that MRI has a moderate sensitivity of 61% and a high specificity of 82% 10, and that it fails to deliver a cost-effective, simple-to-use solution suitable for mass screening and repeated use 11,12. The diagnosis of knee OA remains a challenge in biomedical engineering, and different methods need to be explored.
Machine learning (ML) is a form of artificial intelligence and is the application of probabilistic algorithms to train a computational model to make predictions 13. ML has great potential to become a valuable clinical tool, as it may overcome the time-consuming and subjective shortcomings of current biomechanical methods, and it can be used in diagnosis, treatment optimization, and prediction of outcomes 14,15. It has already been used to elucidate the underlying biological processes related to knee OA 16,17. At present, the machine learning methods used for the diagnosis of knee OA mainly include support vector machines, convolutional neural networks, and artificial neural networks. However, the diagnostic test accuracy of different machine learning methods is still controversial.
The aim of this systematic review is to gather the existing diagnostic test accuracy research on the use of different machine learning methods and to assess their application for detection in screening patients with knee osteoarthritis. Strengths and limitations of the current studies, as well as gaps in the research, will be highlighted to uncover elements that could impede their implementation.

Study registration
This systematic review has been registered on PROSPERO (registration number: CRD42019133305). This research protocol has been developed according to the Preferred Reporting Items for Systematic reviews and Meta-Analyses Protocol (PRISMA-P) 18, and we will conduct the systematic review and meta-analysis according to the Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies (PRISMA-DTA) guidelines 19,20.

Search strategy
The electronic databases PubMed, Cochrane Library, EMBASE, and Web of Science will be searched. The keywords used in the search strategy will include ("Osteoarthritis, Knee" or "goneitis" or "knee arthritis" or "articulatio genus" or "gonarthritis") and ("Artificial intelligence" or "deep learning" or "Machine Intelligence" or "machine learning" or "supervised machine learning" or "neural networks" or "data mining" or "Computational Intelligence" or "AI") and ("sensitivity" or "specificity"). Furthermore, the reference lists of the included studies will be screened manually for relevance to identify potential studies missed in the systematic search 21. The search will be limited to English-language articles, and we will not restrict studies by source, country or publication date. A sample search strategy for PubMed will be provided in Supplement 1.

Selection criteria

Types of studies
We will include studies that investigate the diagnostic and predictive accuracy of different machine learning methods for knee osteoarthritis. The study design will be limited to prospective or retrospective studies.

Types of participants
Participants with knee OA (clinically diagnosed according to the criteria of the American Rheumatism Association, or by the Kellgren-Lawrence (KL) scale and a clinician's diagnosis) will be included regardless of race, gender, age, country, disease duration or disease severity. Eligible patients must not have received an intra-articular knee injection in the month prior to the study and must have no hip or ankle arthritis.

Types of outcome measures
We will assess the following outcomes at the end of the treatment period. Studies reporting at least one of the following outcomes will be included. The primary outcomes are accuracy, sensitivity, and specificity. The secondary outcomes are positive predictive value (PPV), negative predictive value (NPV), false positive rate (FPR), false discovery rate (FDR), and area under the receiver operating characteristic curve (AUC).
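As an illustration of how these outcome measures relate to one another, the following minimal Python sketch derives each of them from the four cells of a 2×2 table. The names `ConfusionCounts` and `diagnostic_metrics` are hypothetical and not part of the protocol, FDR is taken here as FP / (FP + TP), and AUC is omitted because it cannot be recovered from a single 2×2 table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a 2x2 table (hypothetical container, for illustration only)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a study reports an empty margin.
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Compute the primary and secondary outcomes from one 2x2 table."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
        "fpr": _ratio(c.fp, c.fp + c.tn),
        # Assumption: FDR is defined as FP / (FP + TP), i.e. 1 - PPV.
        "fdr": _ratio(c.fp, c.fp + c.tp),
    }
```

For example, `diagnostic_metrics(ConfusionCounts(tp=80, fp=10, fn=20, tn=90))` gives a sensitivity of 0.80 and a specificity of 0.90 for that invented table.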

Exclusion criteria
We will exclude studies if: (a) the subject or outcome is not relevant, (b) the method is not machine learning, (c) the information provided in the results is insufficient for data extraction, or (d) they are duplicate studies, commentaries, summaries, editorials, letters, or case reports.

Study selection
Two authors (ZL and MT) will independently screen titles and abstracts based on the inclusion and exclusion criteria and record the reasons for excluding ineligible studies. They will subsequently retrieve all relevant full-text articles for suitability assessment. If necessary, any discrepancies regarding inclusion will be resolved through discussion or by consulting a third member of the review team (ZY) until consensus is reached. We will record the selection process in sufficient detail to complete a PRISMA flow chart.

Data extraction
Data extraction will be informed by the TRIPOD adherence guidance, PROBAST, and the Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS). We will extract data for both qualitative and quantitative analysis from the included studies. For each included study that satisfies the inclusion criteria, two reviewers will extract the following information using a pre-designed Excel form: (1) basic characteristics of the study (authors, year of publication, country, patient ages, study design (prospective or retrospective), number of patients, body mass index); (2) general dataset information (index test, size of dataset/images, stage of clinical diagnosis, reference standard, type of machine learning); and (3) modelling details, namely the number of predictors (candidate and final), internal validation type, predictive performance measures (discrimination and calibration), number of models developed, and the details of the ML technique used to develop each model (e.g., technique, preprocessing, data cleaning, optimization algorithm, predictor selection, penalisation techniques, hyperparameters, code, and data availability). Moreover, the TP, TN, FP, and FN results of MRI and PET/CT for each included study will be extracted, and all included studies will be analyzed using a 2×2 contingency table. When the available data are insufficient, we plan to contact the authors of the included studies to obtain additional information. If we cannot obtain those data, we will analyze only the available data and discuss the potential impact. Disagreements will be discussed and resolved by consensus with the third review author.

Assessment of risk of bias in included studies
Risk of bias assessment for the systematic review will be performed independently by two reviewers (ZY and MT) using the Prediction model Risk of Bias Assessment Tool (PROBAST) 22, which evaluates four key domains: patient selection, predictors, outcome and analysis. Each domain is assessed for risk of bias, and the first three domains are also assessed for concerns about applicability to the review. Each question is answered with "yes", "no", or "unclear", and the level of risk of bias is judged correspondingly as "low risk", "high risk", or "unclear risk" 23.

Data synthesis and analysis
Although the included studies will fulfil all inclusion criteria, there may still be underlying differences with regard to stage of clinical diagnosis, target condition and reference standard. If the data provided in the papers are sufficient and sufficiently homogeneous for analysis, we will present a quantitative analysis. Included studies will be analyzed using a 2×2 contingency table; from these tables, numerical values for sensitivity and specificity will be obtained from the false negative (FN), false positive (FP), true negative (TN) and true positive (TP) counts. If a study does not directly report all the data in the 2×2 table, we will use the calculator in Review Manager 5.3 to calculate the missing data from the existing data in the text or the appendix of that study. A descriptive forest plot and summary receiver operating characteristic (SROC) curves will be derived with Review Manager 5.3, and the area under the curve (AUC) will be the final comparison indicator. The criteria for AUC classification are 0.90-1 (excellent), 0.80-0.90 (good), 0.70-0.80 (fair), 0.60-0.70 (poor) and 0.50-0.60 (fail).
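The AUC bands listed above can be expressed as a small lookup, sketched here in Python. The function name `grade_auc` is hypothetical. Because the stated bands share their end points, the sketch assumes that each lower bound is inclusive (so 0.90 counts as the top band), and it also assigns values below 0.50 to the lowest band, which the protocol does not specify.

```python
def grade_auc(auc: float) -> str:
    """Map an AUC value to the qualitative band used for comparison."""
    if not 0.0 <= auc <= 1.0:
        raise ValueError(f"AUC must lie in [0, 1], got {auc!r}")
    # Assumption: lower bounds are inclusive, checked from the top band down.
    bands = [
        (0.90, "excellent"),
        (0.80, "good"),
        (0.70, "fair"),
        (0.60, "poor"),
    ]
    for lower_bound, label in bands:
        if auc >= lower_bound:
            return label
    # Assumption: anything under 0.60, including values below 0.50, fails.
    return "fail"
```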
We will combine data using a fixed-effect or a random-effects model, depending on the clinical or methodological diversity and the resulting heterogeneity 24. If no significant statistical heterogeneity is found among the included trials, or no data can be combined, we will use a fixed-effect model to analyze the data. If we find substantial heterogeneity (I² ≥ 50%), we will use a random-effects model. In addition, because different diagnostic thresholds among the included studies may lead to heterogeneity, we will use the Spearman correlation coefficient to test whether there is a threshold effect. When there is a threshold effect, sensitivity and specificity will be negatively correlated, and the results will present a "shoulder-arm" point distribution on the SROC curve.
For outcomes where we cannot provide a quantitative analysis, we will present the results of individual studies in a narrative synthesis (qualitative analysis). We will resolve any disagreement through discussion among the review authors.

Assessment of heterogeneity
Heterogeneity among the studies will be evaluated using the I² statistic. I² > 50% indicates significant heterogeneity; in that case, subgroup analysis, influence analysis, and meta-regression analysis will be performed to ascertain the potential causes of clinical or methodological heterogeneity 25. The choice of a random-effects or fixed-effect model will depend on whether heterogeneity exists. If heterogeneity exists but cannot be explained reasonably, so that a meta-analysis cannot be conducted, we will describe the data narratively.
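For orientation, the I² statistic can be obtained from Cochran's Q as I² = max(0, (Q − df) / Q) × 100%, where df is the number of studies minus one. The Python sketch below illustrates this under stated assumptions: the function names are hypothetical, each study contributes one effect estimate with a known within-study variance, inverse-variance weights are used, and the 50% cut-off for switching to a random-effects model follows the threshold stated in this protocol.

```python
def i_squared(effects, variances):
    """Return I² (in percent) from per-study effects and their variances."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    degrees_of_freedom = len(effects) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - degrees_of_freedom) / q) * 100.0


def choose_model(i2_percent, cutoff=50.0):
    """Pick the pooling model; the cut-off value is taken from the protocol."""
    return "random-effects" if i2_percent >= cutoff else "fixed-effect"
```

With two invented studies, `i_squared([0.0, 1.0], [0.1, 0.1])` yields Q = 5 on one degree of freedom and therefore I² = 80%, which `choose_model` would route to a random-effects model.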

Subgroup analysis
Owing to differences in age, gender, disease condition, type of machine learning method and other unpredictable factors, we will conduct subgroup analyses based on the data to detect the sources of heterogeneity.

Quality of evidence rating
The Grading of Recommendations Assessment, Development and Evaluation (GRADE) tool will be used to evaluate the overall strength of the evidence 26-28. The results will be summarized in Summary of Findings tables.

Discussion
This review is the first to apply a systematic approach to evaluating the diagnostic performance of machine learning methods in patients with knee OA as compared with existing reference standards. As opposed to previous diagnostic methods (such as assessment by clinical experts and X-ray), an advantage of machine learning methods is a substantial reduction in the manpower required for feature engineering, since machine learning algorithms learn to extract features by themselves. Furthermore, they can also find some features of knee lesions early, features that are not readily identified by doctors. This may give clinicians more ways to help patients relieve knee joint pain and improve their quality of life. However, due to language barriers, only trials in two languages can be included, and other related studies may be missed. Also, differences in machine learning methods and in methodological quality may increase the risk of heterogeneity.
In summary, this study will generate current evidence on machine learning methods for the diagnosis of patients with knee OA, and will help to reduce the uncertainty about the accuracy of diagnosis and prediction. The findings of this study will inform further recommendations for clinicians and guidelines, and will draw wide attention from both patients and researchers.

Availability of data and materials
Not applicable.

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.