This report follows the TRIPOD Statement[8].
Source of Data
The data came from a retrospective cohort study designed specifically for the development of this model. The study was conducted at Fenyang Hospital, a general university hospital in Shanxi Province, China. Participants were patients consecutively admitted to the hospital's orthopedics department with hip fractures between January 2015 and December 2020. Follow-up started on April 13, 2021 and ended on May 28, 2021.
Participants
Our hospital is a municipal tertiary hospital whose patients come mainly from the surrounding counties and cities. Inclusion criteria were: ⒈Age ≥50 years. We chose 50 years as the cut-off in order to include as many fragility hip fracture patients as possible, especially women, in whom fragility fractures may occur earlier than in others. ⒉Fragility fracture. Fragility fractures and low-energy fractures (treated as synonyms here) are hip fractures caused by a fall from standing height or lower, for example a fall while walking, standing up, or sitting down. High-energy fractures, such as those caused by falls from a height, traffic accidents, crush injuries from heavy objects, or fights, were excluded. Hip fractures here include femoral neck, intertrochanteric, and subtrochanteric fractures. Periprosthetic and pathological fractures were also excluded.
In our hospital, the routine process after admission is as follows. Each patient is initially considered for surgical treatment and undergoes routine preoperative tests, such as the necessary laboratory tests and X-rays of the fracture site and chest. If the patient has a comorbid medical condition, the relevant departments are consulted and symptomatic treatment is given. With the patient's consent, surgery is performed as soon as possible once the patient's general condition has been stabilized. As a result of this approach, very few patients undergo surgery within 24 hours of admission. Conservative treatment is adopted if the doctor judges that the patient cannot tolerate surgery, or if the patient refuses surgery. The usual surgical choices are: for femoral neck fractures, fixation with three cannulated screws (<65 years) or hemiarthroplasty or total hip arthroplasty (≥65 years); for intertrochanteric or subtrochanteric fractures, proximal femoral intramedullary nailing.
Outcome:
The outcome of interest was all-cause death within 1 year after hip fracture. It was ascertained through telephone interviews, all of which were conducted after patient data had been collected.
To ensure a high follow-up success rate, we made the following efforts. ⒈We set rules for the telephone interviews that kept their content to a minimum, so that the interviews would be more readily accepted by the family. The content varied with the patient's situation: in addition to whether and when the patient died, we asked about the cause of death and the patient's self-care ability before the fracture and/or after recovery. According to the treatment received in our hospital, patients were divided into two categories, surgical and conservative. For surgical patients who had died, we asked about the time and cause of death and the self-care ability before the fracture. For surviving surgical patients, we asked about current self-care ability; if the patient could now take care of himself/herself, the interview ended, otherwise we also asked about self-care ability before the fracture. Conservatively treated patients could not simply be classified as such, because they might have undergone surgery elsewhere after leaving our hospital. We therefore first determined whether they had remained on conservative treatment. If so, they were asked the same questions as patients operated on in our hospital; if not, they were reclassified as surgical patients and asked the same questions plus one more, the time of the operation. In this way, at least one question was asked in each interview: for example, for patients who had regained their self-care ability after surgery, the interview ended after a single question about their current status. At most 5 questions were asked.
For example, for patients who died after undergoing surgery at another hospital, we asked ⑴whether they had surgery after discharge, ⑵the time of surgery, ⑶the time of death, ⑷the cause of death, and ⑸their self-care ability before the fracture (Figure 1). ⒉We set up time reference points to help the patient's family recall when the death occurred. For example, family members of patients who had surgery after discharge were asked, "How many days after discharge (or after the fracture) was the surgery performed?" When family members could not recall the exact date of death, we narrowed the time range step by step, for example: "In which season did the death occur? Before or after the Spring Festival? Had the Lantern Festival already passed?" ⒊If the telephone number was wrong, the family could not determine the time of death, or the family was unwilling to cooperate, we returned to the original medical record to check the telephone number or look for other recorded numbers, and then re-interviewed. ⒋If the interview was still unsuccessful after the original medical records had been checked, we turned to the community medical system, which registers the telephone numbers of all families living in the community; if a new telephone number was obtained, we interviewed again.
With these methods, most patients who were lost to follow-up at the first attempt were eventually reached. The remainder were treated as missing data.
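The question-selection rules in point ⒈ amount to a small decision procedure. The sketch below encodes them as a hypothetical Python function (the function name, parameter names, and question labels are our own, not part of the study protocol); enumerating every combination of its inputs reproduces the stated bounds of one to five questions per interview.

```python
def interview_questions(conservative_in_hospital, operated_elsewhere=False,
                        died=False, self_care_now=False):
    """Return the ordered questions for one follow-up call (hypothetical encoding)."""
    questions = []
    if conservative_in_hospital:
        # Conservatively treated patients may have had surgery elsewhere later.
        questions.append("surgery after discharge?")
        if operated_elsewhere:
            # Reclassified as surgical: one extra question about timing.
            questions.append("time of surgery")
    if died:
        questions += ["time of death", "cause of death",
                      "self-care before fracture"]
    else:
        questions.append("current self-care")
        if not self_care_now:
            # Asked only if the patient cannot look after himself/herself now.
            questions.append("self-care before fracture")
    return questions
```

For instance, `interview_questions(False, self_care_now=True)` returns `["current self-care"]`, the one-question interview described above, while a conservatively treated patient who later had surgery elsewhere and died receives all five questions in the order listed in the example.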
Predictors:
Based on our modeling principle (practicality is preferred over accuracy) and purpose (to provide a reference for formulating a treatment plan), we set the following principles for selecting candidate predictors: ⒈Simplicity. Predictors are taken from the patient's medical history and routine preoperative preparation. ⒉Stability. Predictors are measured relatively stably across different testers and patients. ⒊Independence. Predictors can be determined by an orthopedic surgeon alone, without consulting other departments. ⒋Rapidity. Predictor results are available within 24-48 hours after admission, without long waits. ⒌Quantitative indicators take priority over qualitative ones. ⒍Relevance. Predictors are factors that affect the implementation of surgical treatment. ⒎Subject-matter knowledge. Predictors have been shown to be related to survival.
Initially, 24 indicators were extracted for each patient to provide readers with detailed sample characteristics. These included general characteristics: medical insurance, age (Age), and sex (SEX); disease characteristics: fracture site, fracture type, time from fracture to hospitalization, time from hospitalization to surgery, and length of stay (LOS); medical history: diabetes, hypertension (HYP), malignancy (MAL), kidney disease (KD), lung disease on admission (LD), ability to live independently before the fracture (ALI), and cardiovascular and cerebrovascular diseases (CCD); test results (all using the first value after admission): partial pressure of oxygen (PaO2), fasting blood sugar (BS), serum creatinine (SC), hemoglobin (Hb), total protein (TP), and albumin (ALB); mean arterial pressure (MAP); and treatment: skeletal traction. To understand the impact of surgery (SUR) on survival, SUR was deliberately included as a predictor even though it is not a preoperative indicator.
Some of the indicators were defined as follows: ⒈Medical insurance: employee or non-employee medical insurance. China has had universal medical insurance coverage since 2011. Non-employee medical insurance mainly comprises the new rural cooperative medical insurance, a small number of other commercial insurance types, and self-payment. Overall, the type of medical insurance indirectly reflects a patient's economic situation. ⒉Fracture site: femoral neck, intertrochanteric, or subtrochanteric. ⒊Fracture type: primary or secondary. A fracture was classified as secondary if the patient had had a previous fragility fracture, such as the common osteoporotic vertebral compression fracture, hip fracture, wrist fracture, or proximal humeral fracture; otherwise it was primary. ⒋Medical history: whether a patient had been diagnosed with diabetes, HYP, MAL, or KD was obtained from the medical records, based on information provided by the patient and/or family at admission. ⒌LD on admission: positive or negative. Positive findings include various types of pneumonia, tuberculosis, pleural effusion, and/or structural changes such as pulmonary bullae and fibrosis, but not small stable local lesions such as stable calcifications; these were determined by chest X-ray or chest CT. All other cases were negative. Notably, this indicator did not include lung tumors, which were classified as MAL. ⒍ALI: positive means that, before the fracture, the patient could not walk independently or complete the basic activities of daily living without help. This may result from various causes, including sequelae of cerebrovascular events, severe hip or knee arthritis, Alzheimer's disease, or severe depression. Otherwise, negative.
⒎CCD: positive means that the patient had been diagnosed with diseases such as myocardial infarction, cerebral infarction, or cerebral hemorrhage, or that head CT and/or MRI after admission showed old local infarcts. It also includes intravascular thrombosis shown by vascular ultrasound of the extremities, but not atherosclerotic plaque, stenosis, or similar changes. These examinations were not routine after admission; a patient who did not undergo them and had no history of the above-mentioned cardiovascular or cerebrovascular diseases was regarded as negative. ⒏PaO2, BS, SC, Hb, TP, ALB: these 6 indicators were routine laboratory tests after admission. Specimens were generally collected at 7 a.m. on the second working day after admission, and all results were reported that afternoon. ⒐MAP: calculated as 1/3 of the systolic blood pressure plus 2/3 of the diastolic blood pressure, using the vital signs recorded immediately after admission. ⒑Skeletal traction: skeletal traction was generally considered on the day of admission. It was omitted if the patient refused or if surgery was planned for the next day. Almost all patients who received this treatment had tibial tuberosity traction.
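Item ⒐ defines MAP with the usual one-third/two-thirds weighting of systolic and diastolic pressure. A one-line Python helper (the name is hypothetical) makes the arithmetic explicit: for a reading of 150/90 mmHg it gives 50 + 60 = 110 mmHg.

```python
def mean_arterial_pressure(sbp, dbp):
    """MAP in mmHg: one third of systolic plus two thirds of diastolic pressure."""
    return sbp / 3 + 2 * dbp / 3
```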
Based on the predictor screening principles above, we reduced the number of candidate predictors. Medical insurance, fracture site, fracture type, time from fracture to hospitalization, time from hospitalization to surgery, and skeletal traction were excluded, because whatever their values, they do not affect the choice of surgical treatment. Following the priority of quantitative indicators, diabetes and kidney disease were excluded in favor of BS and SC. Although MAP is quantitative, it is easily affected by many factors and is unstable, so HYP was chosen instead. Serum TP and ALB overlap to some extent; because studies[9, 10] have shown low ALB to be a prognostic factor for death after fracture, TP was excluded. On the principle of stability, PaO2 was excluded: many elderly patients were given oxygen immediately after admission, and it was difficult to strictly follow the instruction to stop oxygen two hours before measurement, which made PaO2 very unstable across patients. As a result, 12 candidate predictors were retained: Age, SEX, MAL, LD, ALI, CCD, SUR, BS, SC, Hb, ALB, and HYP.
Sample Size
We did not formally calculate the required sample size. First, sample size is affected by many factors, and there is no well-established calculation method[11-13]. Second, in practice we could not easily expand a single-center study into a multi-center one, or arbitrarily extend the study period, to increase the sample size.
Following widely accepted rules of thumb, we aimed for at least 10 events per variable (EPV), or at least 100 events in total. Fortunately, a number of empirical studies provide guidance on developing prediction models in small samples[14, 15].
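The EPV rule can be made concrete by counting what the final coding of the 12 candidate predictors spends. Counting variables gives 12, so EPV ≥10 calls for at least 120 events; counting regression degrees of freedom instead (an assumed convention here: 1 df per binary or linear term and 2 df per 3-knot spline, as used for BS and Hb) gives 14, i.e. about 140 events. The sketch below, with hypothetical names, leaves the number of events as an input because it is not stated in this section.

```python
# Degrees of freedom used by each candidate predictor under the final coding:
# binary or linear terms use 1 df, a restricted cubic spline with 3 knots uses 2.
PREDICTOR_DF = {
    "Age": 1, "SEX": 1, "MAL": 1, "LD": 1, "ALI": 1, "CCD": 1,
    "SUR": 1, "HYP": 1, "BS": 2, "SC": 1, "Hb": 2, "ALB": 1,
}


def _denominator(count_df):
    # Either total df (default) or simply the number of candidate variables.
    return sum(PREDICTOR_DF.values()) if count_df else len(PREDICTOR_DF)


def events_per_variable(n_events, count_df=True):
    """EPV for the candidate set; count_df=False counts variables instead of df."""
    return n_events / _denominator(count_df)


def events_needed(target_epv=10, count_df=True):
    """Minimum number of events needed to reach the target EPV."""
    return target_epv * _denominator(count_df)
```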
Missing Data
The characteristics of missing data are shown in Figure 2.
Among the 14 variables (2 outcome variables, death (Y) and time, and 12 covariates), 8 had missing values: SUR (9 missing values), BS (9), SC (9), Hb (9), ALB (13), LD (58), Y (11), and time (11). In total, 660 observations had no missing values, and the maximum number of missing variables in a single observation was 5, which occurred in 5 observations.
Missing values in both covariates and outcomes were handled by single imputation (SI) using the mice package in R. Although multiple imputation (MI) could have been used, we chose SI for the following reasons. First, the amount of missing data was small: LD had the highest missing rate (7.9%), and the missing rates of all other variables were below 2%. Second, an SI data set can easily be created as the first of a series of MI data sets, which avoids the complexity of combining results across multiple MI data sets[16]. Third, no method has yet been established for combining LASSO models derived from multiple MI data sets[17]. Finally, empirical research shows that estimates of model regression coefficients are highly consistent between an SI data set and a stacked MI data set[18].
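The missing-data figures quoted above (per-variable missing rates, number of complete observations, and maximum number of missing values in one observation) can be recomputed from the raw data with a few lines of NumPy. The function below is a hypothetical helper, assuming each variable is stored as a float array with `np.nan` marking missing values and factor variables coded numerically.

```python
import numpy as np


def missing_summary(data):
    """Summarize missingness for a dict of equal-length arrays (np.nan = missing).

    Returns per-variable missing rates, the number of complete observations,
    and the largest number of missing variables in any single observation.
    """
    names = list(data)
    # Boolean matrix: rows are observations, columns are variables.
    missing = np.column_stack([np.isnan(np.asarray(data[name], dtype=float))
                               for name in names])
    rates = {name: float(missing[:, j].mean()) for j, name in enumerate(names)}
    n_complete = int((~missing.any(axis=1)).sum())
    max_missing_per_row = int(missing.sum(axis=1).max())
    return rates, n_complete, max_missing_per_row
```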
Because our SI data set was the first of a series of MI data sets, we also analyzed the missing-data mechanism and describe how the MI series was generated. Missingness was mainly related to the choice of treatment. When patients and their families were not keen on, or were even skeptical about, surgical treatment, they often refused routine preoperative preparation, in which case several variable values were missing together. Missing data were therefore seen mostly in patients who received conservative treatment. This is a "missing at random" (MAR) situation, which MI can handle effectively[19-21]. The 12 candidate predictors and both outcome variables (event and time) were all included in the imputation procedure, and the 8 variables with missing values (LD, SUR, BS, SC, Hb, ALB, Y, and time) were imputed. The two factor variables, LD and SUR, were imputed using logistic regression (logreg), and the remaining 6 variables using predictive mean matching (pmm). No interaction terms were included in the imputation models.
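Predictive mean matching imputes each missing value by borrowing an observed value from a "donor" whose predicted mean is close to that of the incomplete case, so imputed values are always real, observed values. The sketch below is a simplified single-variable version in NumPy, assuming fully observed predictors `X` and an assumed donor pool of `k=5`. Unlike mice's `pmm` method, it does not draw the regression coefficients from their posterior distribution and does not cycle through variables by chained equations; it only illustrates the matching step.

```python
import numpy as np


def pmm_impute(y, X, k=5, rng=None):
    """Simplified predictive mean matching for one numeric variable.

    y: 1-D array with np.nan marking missing values.
    X: 2-D array (n x p) of fully observed predictors.
    """
    rng = np.random.default_rng(rng)
    y = np.asarray(y, dtype=float).copy()
    X1 = np.column_stack([np.ones(len(y)), X])       # add an intercept column
    obs = ~np.isnan(y)
    # Linear regression fitted on the observed cases only.
    beta, *_ = np.linalg.lstsq(X1[obs], y[obs], rcond=None)
    y_hat = X1 @ beta                                 # predicted means for everyone
    donor_values, donor_hat = y[obs], y_hat[obs]
    for i in np.flatnonzero(~obs):
        # k observed cases whose predicted means are closest to case i's.
        nearest = np.argsort(np.abs(donor_hat - y_hat[i]))[:k]
        y[i] = donor_values[rng.choice(nearest)]      # borrow one observed value
    return y
```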
Statistical Analysis Methods
The distributions of the 5 continuous candidate predictors (Age, BS, SC, Hb, and ALB) were checked for extreme values. After data-entry errors were excluded, only SC showed obvious extreme values. These were winsorized to avoid excessive leverage: values above the 99th centile (8 values, the largest being 860 μmol/L) were set to the 99th centile (190.75 μmol/L).
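Upper winsorizing caps extreme values at a chosen centile instead of deleting them. A minimal NumPy sketch (function name hypothetical) follows. Note that `np.percentile` uses linear interpolation by default, which matches R's default `quantile` type 7; other centile definitions can give a slightly different truncation point from the reported 190.75 μmol/L.

```python
import numpy as np


def winsorize_upper(x, q=99):
    """Cap values above the q-th percentile at that percentile.

    Returns the capped array and the truncation point.
    """
    x = np.asarray(x, dtype=float)
    cap = np.percentile(x, q)          # linear interpolation between order statistics
    return np.minimum(x, cap), cap
```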
To make full use of the information, no continuous variable was categorized; all kept their original scales. For each continuous variable, both linear and non-linear relationships with the outcome were fitted. Non-linear relationships were modeled with restricted cubic splines (RCS), comparing 3, 4, and 5 knots for each variable; for SC, a log transformation was also examined. We additionally plotted the fitted relationship between each variable and the outcome to check its biological plausibility. The final coding of each continuous variable was chosen on the basis of a higher Wald χ2 value with fewer degrees of freedom (df), together with biological plausibility (Table 1). The optimal codings were: Age, linear; BS, non-linear (RCS, 3 knots); SC, linear; Hb, non-linear (RCS, 3 knots); ALB, linear.
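A restricted cubic spline with k knots is cubic between knots and constrained to be linear beyond the outer knots, so it costs k - 1 df; this is why the df column of Table 1 runs from 2 to 4 for the spline codings. The sketch below builds the basis in Harrell's parameterization with NumPy. The knot locations are an assumption: the text does not say where the knots were placed, so we use the default quantiles commonly recommended by Harrell (for 3 knots, the 10th, 50th, and 90th percentiles).

```python
import numpy as np

# Knot quantiles commonly recommended by Harrell (assumed; not stated in the text).
DEFAULT_KNOT_QUANTILES = {
    3: (0.10, 0.50, 0.90),
    4: (0.05, 0.35, 0.65, 0.95),
    5: (0.05, 0.275, 0.50, 0.725, 0.95),
}


def rcs_basis(x, n_knots=3, knots=None):
    """Restricted cubic spline basis with n_knots - 1 columns (linear term first)."""
    x = np.asarray(x, dtype=float)
    if knots is None:
        knots = np.quantile(x, DEFAULT_KNOT_QUANTILES[n_knots])
    t = np.asarray(knots, dtype=float)
    k = len(t)
    scale = (t[-1] - t[0]) ** 2        # normalization by the squared outer-knot spread

    def pos3(u):
        # Truncated cube: (u)_+^3
        return np.maximum(u, 0.0) ** 3

    cols = [x]
    for j in range(k - 2):
        # Terms chosen so the cubic and quadratic parts cancel beyond the last knot.
        term = (pos3(x - t[j])
                - pos3(x - t[k - 2]) * (t[k - 1] - t[j]) / (t[k - 1] - t[k - 2])
                + pos3(x - t[k - 1]) * (t[k - 2] - t[j]) / (t[k - 1] - t[k - 2]))
        cols.append(term / scale)
    return np.column_stack(cols), t
```

Each extra column enters the Cox model as one coefficient, so a 3-knot spline for BS or Hb adds exactly one parameter beyond the linear term.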
In keeping with the principle of simplicity, all seven categorical variables (SEX, MAL, LD, ALI, CCD, HYP, and SUR) were coded as binary variables.

Table 1
Optimal coding exploration for continuous predictors (complete case analysis)

| Predictor | Coding | Wald χ2 | df | p-value |
|---|---|---|---|---|
| Age | Linear* | 24.96 | 1 | <.0001 |
| | RCS (3) | 22.03 | 2 | <.0001 |
| | RCS (4) | 23.90 | 3 | <.0001 |
| | RCS (5) | 23.62 | 4 | 0.0001 |
| BS | Linear | 0.61 | 1 | 0.4345 |
| | RCS (3)* | 0.77 | 2 | 0.6798 |
| | RCS (4) | 5.66 | 3 | 0.1291 |
| | RCS (5) | 6.39 | 4 | 0.1719 |
| SC | Linear* | 13.27 | 1 | 0.0003 |
| | RCS (3) | 12.77 | 2 | 0.0017 |
| | RCS (4) | 13.94 | 3 | 0.0030 |
| | RCS (5) | 13.96 | 4 | 0.0074 |
| | Log | 11.25 | 1 | 0.0008 |
| Hb | Linear | 10.19 | 1 | 0.0014 |
| | RCS (3)* | 15.14 | 2 | 0.0005 |
| | RCS (4) | 14.65 | 3 | 0.0021 |
| | RCS (5) | 15.36 | 4 | 0.0040 |
| ALB | Linear* | 33.97 | 1 | <.0001 |
| | RCS (3) | 31.76 | 2 | <.0001 |
| | RCS (4) | 31.30 | 3 | <.0001 |
| | RCS (5) | 32.07 | 4 | <.0001 |

Note: * marks the optimal coding; RCS (k) denotes a restricted cubic spline with k knots.
1. Type of model
Because the outcome, 1-year survival, is a time-to-event outcome, we chose the Cox proportional hazards model.
2. Predictor selection during modeling
Because our sample was small, and empirical studies[14, 15, 22, 23] have shown that stepwise selection degrades the predictive performance of models developed in small data sets, we built a full model including all 12 candidate predictors. We then refined the full model using the LASSO, which performs selection by shrinking some regression coefficients to exactly zero[24, 25]. The final model was therefore a LASSO model.
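The LASSO adds an L1 penalty λΣ|βj| to the negative Cox log partial likelihood; the kink of this penalty at zero is what sets some coefficients exactly to zero. The sketch below fits this objective by proximal gradient descent (soft-thresholding) in NumPy, using the Breslow partial likelihood. It is an illustration under stated assumptions, not the authors' software: standard packages such as R's glmnet use coordinate descent instead, the predictors are assumed standardized, the fixed step size is assumed small enough for convergence, and λ would normally be chosen by cross-validation, which the text does not specify.

```python
import numpy as np


def risk_set_matrix(time):
    """R[i, j] = 1 if subject j is still at risk at subject i's time (t_j >= t_i)."""
    time = np.asarray(time, float)
    return (time[None, :] >= time[:, None]).astype(float)


def cox_loss_grad(beta, X, event, at_risk):
    """Negative Breslow log partial likelihood divided by n, and its gradient."""
    eta = X @ beta
    m = eta.max()
    w = np.exp(eta - m)                      # exp(eta), rescaled to avoid overflow
    s0 = at_risk @ w                         # sum of w over each risk set
    s1 = at_risk @ (w[:, None] * X)          # w-weighted sum of x over each risk set
    ev = np.asarray(event, bool)
    n = X.shape[0]
    loss = -np.sum(eta[ev] - m - np.log(s0[ev])) / n
    grad = -np.sum(X[ev] - s1[ev] / s0[ev][:, None], axis=0) / n
    return loss, grad


def soft_threshold(z, t):
    """Proximal operator of t * |z|: shrink toward zero, exactly zero inside [-t, t]."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cox(X, time, event, lam, step=0.1, n_iter=2000):
    """L1-penalized Cox regression by proximal gradient descent (ISTA).

    Assumes standardized columns of X; lam is the penalty weight.
    """
    at_risk = risk_set_matrix(time)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        _, grad = cox_loss_grad(beta, X, event, at_risk)
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta
```

With a large enough λ every coefficient stays at zero; as λ decreases, predictors enter the model one by one, which is the selection behavior referred to above.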
3. Interaction terms
To avoid inflating the type I error through repeated testing of individual interactions[26], we tested the overall interaction of Age with all other predictors[27]; the same test was done for SEX. Only if the overall test was statistically significant were the individually significant interaction terms introduced; otherwise, interactions were excluded. We also tested the proportional hazards assumption.
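An overall (chunk) test of interactions compares the model containing all Age × predictor product terms with the model without them in a single test, whose df equals the number of added interaction parameters. The helper below sketches this as a likelihood ratio test, assuming the two models' log partial likelihoods are already available; it uses SciPy's `chi2.sf` for the p-value. Whether the authors used a likelihood ratio or a Wald test for this step is not stated.

```python
from scipy.stats import chi2


def likelihood_ratio_test(loglik_reduced, loglik_full, df):
    """LR test of nested models: 2 * (LL_full - LL_reduced) ~ chi-square(df) under H0."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    p_value = chi2.sf(statistic, df)
    return statistic, p_value
```

For example, `likelihood_ratio_test(-100.0, -95.0, 2)` gives a statistic of 10.0 and a p-value of exp(-5) ≈ 0.0067, since the chi-square survival function with 2 df is exp(-x/2).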
4. Model performance
As a measure of overall performance, Nagelkerke's R2 is reported. The concordance (c) statistic was used as the LASSO model's measure of discrimination; discrimination was further illustrated by dividing the predictions into 3 groups and plotting the Kaplan-Meier curve of each group.
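The three performance summaries named above can be sketched in NumPy. Nagelkerke's R2 rescales the Cox-Snell R2 by its maximum attainable value; Harrell's c is the proportion of usable pairs in which the patient who died earlier had the higher predicted risk; and the risk groups are formed by cutting the predictions, after which a Kaplan-Meier curve is computed per group. Cutting at tertiles is our assumption, since the text only says "3 groups", and this simplified c statistic skips pairs with tied survival times. All function names are hypothetical.

```python
import numpy as np


def nagelkerke_r2(loglik_null, loglik_model, n):
    """Nagelkerke's R2 from null and fitted (partial) log-likelihoods and sample size n."""
    r2_cox_snell = 1.0 - np.exp(-2.0 * (loglik_model - loglik_null) / n)
    r2_max = 1.0 - np.exp(2.0 * loglik_null / n)
    return r2_cox_snell / r2_max


def harrell_c(time, event, risk):
    """Harrell's c: share of usable pairs where the earlier death had the higher risk."""
    time, risk = np.asarray(time, float), np.asarray(risk, float)
    event = np.asarray(event, bool)
    # Pair (i, j) is usable if i died and j was still under follow-up later.
    usable = event[:, None] & (time[:, None] < time[None, :])
    concordant = usable & (risk[:, None] > risk[None, :])
    tied = usable & (risk[:, None] == risk[None, :])
    return (concordant.sum() + 0.5 * tied.sum()) / usable.sum()


def risk_tertiles(risk):
    """Assign each prediction to group 0 (low), 1 (middle) or 2 (high) by tertile."""
    cuts = np.quantile(risk, [1 / 3, 2 / 3])
    return np.digitize(risk, cuts)


def kaplan_meier(time, event):
    """Kaplan-Meier survival estimate at each distinct death time."""
    time, event = np.asarray(time, float), np.asarray(event, bool)
    death_times = np.unique(time[event])
    surv, s = [], 1.0
    for t in death_times:
        n_at_risk = np.sum(time >= t)
        n_deaths = np.sum((time == t) & event)
        s *= 1.0 - n_deaths / n_at_risk
        surv.append(s)
    return death_times, np.array(surv)
```

Calling `kaplan_meier` separately on the patients of each tertile group gives the three curves used to visualize discrimination.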
We did not evaluate calibration, because “assessment of calibration makes little sense in the development data, while it is essential at external validation”[28].
5. Internal Validation
Finally, we assessed internal validity with a bootstrap procedure of 1000 repetitions to obtain a realistic estimate of how the LASSO and full models would perform in similar future patients.
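A common form of bootstrap internal validation is Harrell's optimism correction: refit the model in each bootstrap sample, measure how much better it performs in that sample than in the original data, and subtract the average of this optimism from the apparent performance. The generic sketch below assumes user-supplied `fit` and `score` callables (hypothetical names). In this study they would correspond to the full or LASSO Cox model and a measure such as the c statistic, and for the LASSO the penalty selection would ideally be repeated inside each bootstrap sample. The text does not state which bootstrap variant was used.

```python
import numpy as np


def optimism_corrected(fit, score, data, n_boot=1000, seed=0):
    """Harrell-style bootstrap optimism correction.

    fit(data) -> model; score(model, data) -> performance (higher is better).
    data is a NumPy array whose rows are patients.
    Returns (apparent performance, optimism-corrected performance).
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    apparent = score(fit(data), data)
    optimism = np.empty(n_boot)
    for b in range(n_boot):
        boot = data[rng.integers(0, n, size=n)]   # resample patients with replacement
        model = fit(boot)                         # repeat the whole modeling step
        # Optimism: performance in the bootstrap sample minus performance
        # of the same model in the original data.
        optimism[b] = score(model, boot) - score(model, data)
    return apparent, apparent - optimism.mean()
```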