Feasibility of Using Common Data Model for Orthopedic Research: Analysis of Risk Factors for Periprosthetic Joint Infection after Total Joint Arthroplasty

Background: The common data model (CDM) is a standardized data structure defined to make efficient use of different data sources in hospitals. Studies using the CDM are scarce in orthopedic outcome research because of the complexity of the variables. We aimed to test the feasibility of applying the CDM in the orthopedic field and analyzed risk factors for periprosthetic joint infection (PJI) after total joint arthroplasty (TJA) using the CDM. Methods: We undertook a retrospective cohort study of all primary and revision hip and knee TJAs at our institution from January 2003 to October 2017. We identified potential risk factors for PJI after TJA in the literature, which included preoperative demographic/social factors, previous medical history, intraoperative factors, laboratory results, and others. The data sourced from the electronic medical records (EMR) were extracted, transformed, and loaded into the CDM. Results: Variables such as demographic/social factors, medical history, and laboratory results could be converted into the CDM, but the other known risk factors could not. In total, 12,320 primary hip and knee TJAs and 120 revision arthroplasties were identified. Among them, 34 revisions were performed because of PJI. Risk factors for PJI were hypertension and urinary tract infection after total hip arthroplasty, and age (70-79 years), male sex, anemia, steroid use, and urinary tract infection after total knee arthroplasty. Conclusions: This study demonstrates that orthopedic outcome research using the CDM is feasible, although data conversion to the CDM was possible for only a limited set of factors. Further data transformation technologies need to be developed to analyze more factors relevant to orthopedics, such as intraoperative factors and imaging findings.


Background
Research using medical information has been actively carried out with the development of information technology (IT). Electronic medical records (EMR) and administrative claims databases have been widely used for observational studies of clinical data. However, inconsistent data formats make large-scale clinical research collaboration between hospitals difficult and demand considerable time and effort. Thus, standardization of EMR data is considered important in the medical field.
The development of standard clinical information models is an attempt to address the storage and exchange of clinical data. Some researchers have shown that analyzing EMR data using standards-based methods is economical and improves efficiency [21,25]. The Common Data Model (CDM) allows for the systematic analysis of disparate observational databases. The concept is to transform different data into a standardized common data format by means of coding schemes and terminologies.
Total joint arthroplasty (TJA) is a commonly performed orthopedic procedure that can improve quality of life in patients with advanced arthritis. Over the past two decades, the number of TJAs has increased exponentially [7]. However, periprosthetic joint infection (PJI), which is the most serious complication of TJA, can result in severely limited joint function and increased mortality. Many studies have attempted to identify risk factors for PJI, which include rheumatoid arthritis, diabetes, renal disease, depression, hypercholesterolemia, anemia, urinary tract infection, hypertension, age, male sex, obesity, smoking, steroid use, blood transfusion, prolonged operative time, wound problems, and malnutrition [1, 2, 4, 6, 8-10, 15-17, 20, 24, 26-28, 30, 34]. However, only a few of them have considered multiple risk factors [2,4,6,8,27,34]. Furthermore, different studies examining the same risk factor have reported conflicting results [29].
Although several orthopedic outcome studies have been performed using EMR or administrative claims databases, studies using the CDM are scarce because of the complexity of variables in the orthopedic field. Therefore, we wanted to test the feasibility of applying the CDM in orthopedic research, especially for the evaluation of risk factors for PJI. The purposes of this study were 1) to apply standard CDM methods and algorithms to an observational orthopedic study and to identify problems in converting EMR parameters into the CDM, and 2) to evaluate risk factors for PJI when analyzed with the CDM.

Patients
We obtained approval from the Institutional Review Board (IRB) for this retrospective review of medical records. We included patients who underwent primary TJAs (hip and knee) from January 2003 to October 2017 at our institution, which is a referral and training hospital located in an urban area in South Korea. We first identified cases of revision after TJA by using the International Statistical Classification of Diseases and Related Health Problems 10th Revision (ICD-10) diagnosis codes in the EMR. The reasons for revision were PJI, loosening, prosthesis failure, periprosthetic fracture, osteolysis, and dislocation. We then identified cases of PJI among the revision cases and focused on the risk factors for PJI (Fig. 1). Patients with prior primary or revision surgery at outside hospitals were excluded. Staged revision procedures for infection were counted only once.

Risk factors for PJI
We searched the literature for all possible risk factors for PJI [2,3,13,29,34] and collected the following clinical data: preoperative demographic/social factors (age and sex), previous medical history (comorbidities and drug history), intraoperative factors (operative time, oxygenation, preparation, skin closure, and blood transfusion), laboratory results (albumin, cholesterol, blood cell counts, inflammatory markers, etc.), and others (admission days, observation duration, and venous thromboembolism prophylaxis). Among these data, we identified the items that could be converted to the CDM in order to test feasibility.
All clinical data were abstracted as of the time of the primary TJA.

Conversion of EMR parameters to CDM
The algorithm by which EMR data were converted to the CDM is as follows: mapping of EMR codes to standard concepts, extraction-transformation-loading (ETL) of patient data into the CDM, and evaluation of the CDM-based results [11] (Fig. 2). The coding system used for diagnoses in the EMR is ICD-10, whereas the standard concept of the CDM for diagnoses is based on the Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT) [31]. Most codes were mapped to the CDM through SNOMED-CT, whereas the standard concepts of the CDM for drug exposure were based on RxNorm from the US National Library of Medicine [19]. However, not all codes in the EMR had a corresponding mapping in the CDM version available at the time of this research. Therefore, we searched for the corresponding CDM code via the mapped code, which was then regrouped to obtain the desired value from the code in the CDM. To minimize grouping errors and information loss, four authors reviewed the concept mappings and reached agreement by consensus.
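The mapping and regrouping step can be illustrated with a minimal Python sketch. It is not the mapping used in the study: the concept identifiers are hypothetical placeholders rather than real vocabulary identifiers, the grouping into risk factors is an assumption made for illustration, and `LOCAL-123` stands for any local EMR code that has no standard concept.

```python
# Hypothetical source-to-standard mapping (ICD-10 code -> standard concept id).
# The numeric ids are placeholders, not real SNOMED-CT or CDM identifiers.
SOURCE_TO_STANDARD = {
    "I10": 1001,    # essential hypertension
    "N39.0": 1002,  # urinary tract infection, site not specified
    "D50.9": 1003,  # iron deficiency anemia, unspecified
    "D64.9": 1004,  # anemia, unspecified
}

# Hypothetical regrouping of standard concepts into study-level risk factors.
CONCEPT_TO_RISK_FACTOR = {
    1001: "hypertension",
    1002: "urinary_tract_infection",
    1003: "anemia",
    1004: "anemia",
}


def map_diagnoses(emr_codes):
    """Return (set of risk factors found, list of unmapped source codes)."""
    factors, unmapped = set(), []
    for code in emr_codes:
        concept_id = SOURCE_TO_STANDARD.get(code)
        if concept_id is None:
            # Codes without a standard concept are reported, not silently dropped.
            unmapped.append(code)
            continue
        factors.add(CONCEPT_TO_RISK_FACTOR[concept_id])
    return factors, unmapped


# One illustrative patient with two anemia codes and one unmappable local code.
factors, unmapped = map_diagnoses(["I10", "D50.9", "D64.9", "LOCAL-123"])
```

Returning the unmapped codes alongside the grouped factors makes the information loss visible, so that reviewers can inspect it during consensus review.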
Loading the mapped patient data into the CDM requires an ETL process. We performed the ETL process on five core tables (Fig. 3).
To load patient data into the CDM, we developed ETL scripts in Structured Query Language (SQL) according to the study design and ran them as the actual ETL process (Fig. 4).
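A minimal sketch of such an ETL script is shown here, run from Python against an in-memory SQLite database so that it is self-contained. Everything in it is an assumption made for illustration: the EMR table and column names are hypothetical, the two target tables are simplified CDM-style tables that are not necessarily among the five used in the study, and the concept identifiers are placeholders. The study's own scripts are those shown in Fig. 4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical EMR source tables (names and columns are illustrative).
CREATE TABLE emr_patient (patient_no TEXT PRIMARY KEY, sex TEXT, birth_year INTEGER);
CREATE TABLE emr_diagnosis (patient_no TEXT, icd10 TEXT, diag_date TEXT);
-- Hypothetical output of the concept-mapping step.
CREATE TABLE code_map (icd10 TEXT PRIMARY KEY, concept_id INTEGER);
-- Simplified CDM-style target tables.
CREATE TABLE person (person_id INTEGER PRIMARY KEY, gender TEXT, year_of_birth INTEGER);
CREATE TABLE condition_occurrence (
    person_id INTEGER, condition_concept_id INTEGER, condition_start_date TEXT);
-- Crosswalk that replaces the hospital identifier with a surrogate key.
CREATE TABLE id_map (person_id INTEGER PRIMARY KEY AUTOINCREMENT, patient_no TEXT UNIQUE);
""")

# Extract: illustrative rows standing in for the EMR export.
conn.executemany("INSERT INTO emr_patient VALUES (?, ?, ?)",
                 [("A001", "M", 1940), ("A002", "F", 1951)])
conn.executemany("INSERT INTO emr_diagnosis VALUES (?, ?, ?)",
                 [("A001", "I10", "2010-03-01"),
                  ("A001", "LOCAL-1", "2010-03-02"),   # no standard concept
                  ("A002", "N39.0", "2012-07-15")])
conn.executemany("INSERT INTO code_map VALUES (?, ?)",
                 [("I10", 1001), ("N39.0", 1002)])    # placeholder concept ids

# Transform and load: de-identify, then keep only diagnoses with a mapping.
conn.executescript("""
INSERT INTO id_map (patient_no)
SELECT patient_no FROM emr_patient ORDER BY patient_no;

INSERT INTO person (person_id, gender, year_of_birth)
SELECT m.person_id, p.sex, p.birth_year
FROM emr_patient p JOIN id_map m ON m.patient_no = p.patient_no;

INSERT INTO condition_occurrence (person_id, condition_concept_id, condition_start_date)
SELECT m.person_id, c.concept_id, d.diag_date
FROM emr_diagnosis d
JOIN id_map m ON m.patient_no = d.patient_no
JOIN code_map c ON c.icd10 = d.icd10;
""")

n_person = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
loaded = conn.execute(
    "SELECT person_id, condition_concept_id, condition_start_date "
    "FROM condition_occurrence ORDER BY person_id").fetchall()
# Source codes that could not be loaded because no mapping exists.
unmapped = conn.execute(
    "SELECT d.icd10 FROM emr_diagnosis d "
    "LEFT JOIN code_map c ON c.icd10 = d.icd10 "
    "WHERE c.icd10 IS NULL").fetchall()
```

The inner join against the mapping table is what drops unmapped diagnoses, and the final query lists them so that the extent of the loss can be checked after each load.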

Statistical analysis
For categorical variables, chi-squared and Fisher's exact tests were performed with PJI as the dependent variable and each parameter as an independent variable. Blood results were dichotomized (low vs. high) based on the reference ranges used by our hospital laboratory and analyzed as categorical variables. Continuous variables were analyzed using t-tests. We performed logistic regression analysis to identify risk factors associated with PJI.
Odds ratios (OR) and 95% confidence intervals (CI) were calculated using R for Windows, with the level of significance set at P < 0.05. We also calculated adjusted ORs using propensity score matching for age and sex to reduce selection bias. We used the MatchIt package to conduct propensity score matching at a ratio of 1:5.
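For a single binary risk factor, both tests operate on a 2x2 table of exposure against PJI. The sketch below computes them in plain Python for illustration only; the study itself used R, the counts are invented, and the chi-squared statistic is computed without continuity correction, which is an assumption because the text does not say whether a correction was applied.

```python
from math import comb, erfc, sqrt


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (1 df, no continuity correction).

    Table layout: a = exposed with PJI, b = exposed without PJI,
                  c = unexposed with PJI, d = unexposed without PJI.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper tail probability is erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test: sum the probabilities of all tables
    with the same margins that are no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x exposed patients among the PJI cases.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Invented counts for illustration; these are not data from the study.
chi2_stat, chi2_p = chi_squared_2x2(6, 94, 4, 196)
fisher_p = fisher_exact_2x2(6, 94, 4, 196)
```

Fisher's exact test is the usual choice when any expected cell count is small, which is likely for a rare outcome such as PJI.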
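As a worked illustration of how an OR and its 95% CI are obtained, the sketch below uses the cross-product ratio of a 2x2 table with a Wald interval on the log scale. This is an unadjusted calculation with invented counts, not the study's analysis, which used logistic regression and propensity score matching in R; for a single binary predictor, however, the OR from logistic regression equals this cross-product ratio.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a = exposed with PJI, b = exposed without PJI,
    c = unexposed with PJI, d = unexposed without PJI.
    z = 1.96 gives an approximate 95% interval. All cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se)
    upper = exp(log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Invented counts for illustration; these are not data from the study.
or_value, ci_lower, ci_upper = odds_ratio_ci(10, 90, 5, 95)
# The association is significant at the 0.05 level if the interval excludes 1.
significant = ci_lower > 1 or ci_upper < 1
```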

Rationale, Summary, Significance
We aimed to test the feasibility of applying the common data model (CDM) in the orthopedic field and analyzed risk factors for periprosthetic joint infection (PJI) after total joint arthroplasty. Variables such as demographic/social factors, medical history, laboratory results, and admission days could be converted into the CDM, but others, such as intraoperative factors, observation duration, and venous thromboembolism prophylaxis, could not. When analyzed using the CDM, hypertension and urinary tract infection were risk factors for PJI after THA, and age of 70 to 79 years, male sex, anemia, steroid use, and urinary tract infection were risk factors for PJI after TKA. This study demonstrates that orthopedic research using the CDM is feasible, although data conversion to the CDM was possible for only a limited set of factors.

Conversion of EMR parameters to CDM
The CDM is designed to include all observational data derived from the EMR to support the generation of reliable evidence [11,25]. To obtain the desired results from a study, it is important to design the algorithm properly with the parameters currently available in the CDM. Mapping the variable EMR data to the target CDM concepts is also crucial for improving patient data standardization [14,22]. Previous CDM-based cohort studies have mainly focused on pharmacoepidemiological research, such as the treatment of diseases, and on epidemiological analyses of deaths from certain diseases [11,12,19,23,32,33]. In our case, we focused on parameters related to risk factors for PJI after TJA and constructed the algorithm directly in SQL, rather than through programs already available within the CDM, to achieve the desired results. The code mapping process was not easy. Four authors reviewed the code mapping, but because of incomplete concept matching and differences between the coding systems, some information loss was inevitable. In addition, because data in the EMR are typically expressed in non-standard terms and textual values are often free text with local expressions, we could not standardize these terms and textual values into standard concepts in this study.
The main advantage of research using the CDM is that such studies can be conducted on a larger scale, at lower cost, and within shorter time frames than traditional studies [5]. The CDM also protects the privacy and security of patients in research because the CDM tool uses result-level information rather than the information of an individual patient [25]. In our study, to maintain patient confidentiality, privacy, and security, the original patient identifiers were removed when the patient data were converted to the CDM. The CDM is also an important part of multi-organization collaborative research [19,22]. Because each hospital structures patient information differently, multiple hospitals need to cooperate to standardize patient information through the CDM tool. However, differences in data structures and coding systems are still major barriers to standardizing data in the CDM tool [31].

Risk factors of PJI
In this study, hypertension was identified as a significant risk factor for PJI after THA, which is consistent with some studies [1,2,6,30]. Those studies demonstrated that hypertension is associated with delayed wound healing following TJA.
Urinary tract infection (UTI) was a significant risk factor for PJI after both THA and TKA in this study.
UTI is usually more common in women than in men, and the reported prevalence of UTI in women undergoing primary TJA ranges from 5.1% to 36% [4,6,9,26]. Therefore, symptomatic UTI should be treated before proceeding with TJA.
We found an association between age and risk of revision, which is consistent with previous findings [5,7,17-21]. Although older patient age would seem to coincide with poorer nutritional status and thus elevated infection risk, some studies reported an increased risk of revision for relatively younger patients [7,17,18,22].
This study found that male sex was a significant risk factor for PJI after TKA, which is consistent with some studies [3,6,15,27,29]. One study suggested that men may sustain a greater degree of surgical trauma and tissue necrosis than women [27]. Men may also have a more active lifestyle than women after TJA. Differences in exercise volume can therefore lead to differences in overuse after TJA, which may result in revision surgery.
In this study, preoperative anemia was also a risk factor for PJI after TKA. Anemia is usually associated with a patient's poor nutritional status. Previous studies have shown that primary TJA patients who have preoperative anemia are more likely to receive blood transfusions, which are associated with an increased risk of postoperative infection [2,4,8-10]. Therefore, patients should be evaluated preoperatively for causes of anemia, such as iron deficiency, and considered for recombinant human erythropoietin treatment in order to decrease the risk of PJI [8,10].
We also found steroid use to be a risk factor for PJI after TKA, which is consistent with previous studies [2,6,18,28]. The association between steroid use and PJI is likely to be mediated, at least in part, by impaired wound healing due to the anti-inflammatory and immunosuppressive effects of steroids [20]. In addition, steroid use can cause problems with calcium and vitamin D metabolism, zinc deficiency, and, most importantly, accelerated bone mineral loss [16].

Limitations of study
There are several limitations to our study that must be noted. First, although the study objective was to use a CDM to identify risk factors for PJI after TJA, we could not analyze all of the risk factors that have been reported in the literature, because EMR codes without a matching CDM concept could not be used. In further studies, we will continue to improve the scalability of converting variable data to the CDM. Further data transformation technologies need to be developed to analyze more factors relevant to orthopedics, such as intraoperative factors and imaging findings. Second, the subjects were from a single institution and our methodology has not been tested for other uses. A CDM-based study designed for a single use might lack methodological credibility. Therefore, the generalizability still needs to be confirmed. We will conduct subsequent research using multi-center data for large-scale analysis and to further validate our methods.

Conclusions
This study presents an approach to achieving semantic standardization among different clinical data sources by using the CDM in the orthopedic field. Although data conversion to the CDM was possible for only a limited set of factors, we propose a reusable data transformation method. The method may differ for other uses and associated data element sets, but we consider that the methodology reported here can be applied to other research in the orthopedic field.

Declarations
Ethics approval and consent to participate
This study was approved by the Institutional Review Board of the Seoul National University Bundang Hospital. We have obtained the written informed consent for participation in the study from all participants.

Consent for publication
Not applicable.

Availability of data and materials
All relevant data are included in this manuscript. Additional data may be requested by contacting the corresponding author.

Competing interests
The authors declare that they have no competing interests.

Funding
This study was supported by the Seoul National University Bundang Hospital Research Fund. The funding source had no role in study design, data collection, analysis or interpretation, or in writing the manuscript. The article processing fee will be paid from the funding received.

Authors' contributions
YJC and HSG participated in the design of the study. YJC and JHS measured the data. YJC, SJJ, and HSG were responsible for the statistical analysis of the study. All authors contributed to the writing of the manuscript.

Figure 1
The patient selection flow chart.

Figure 2
The algorithm by which electronic medical records (EMR) data are converted to the common data model (CDM).

Figure 4
ETL scripts in the form of Structured Query Language (SQL).