We identified 14,351 patients who underwent surgery from April 1, 2010 to March 31, 2015 and were enrolled into NSQIP at our hospital. An SSI was identified in 795 (5.5%) of these patients. Of these, 540 (68%) had superficial SSIs and 255 (32%) had deep or organ space SSIs. Descriptive statistics for patients in the study sample are reported in Additional file 4. The derivation and validation datasets were similar in terms of baseline covariates (Additional file 5).
Predictive modeling for hospitalization diagnostic codes (ICD-10)
We identified 3,085 hospitalization diagnostic (ICD-10) codes recorded within 30 days following the surgery date. These codes were then clustered into 994 three-digit hospitalization diagnostic codes, which were used in the further analyses.
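The clustering step described above can be sketched as follows. This is a minimal illustration, not the code used in the study: it assumes that "three-digit" means the leading letter plus two digits of each ICD-10 code, and the function names and patient records are hypothetical.

```python
from collections import Counter


def to_three_char_category(code: str) -> str:
    """Collapse a full ICD-10 code (e.g. 'T81.4') to its category ('T81')."""
    cleaned = code.strip().upper().replace(".", "")
    return cleaned[:3]


def patient_category_flags(codes_by_patient):
    """Map each patient ID to the set of categories recorded for that patient."""
    return {
        pid: {to_three_char_category(c) for c in codes}
        for pid, codes in codes_by_patient.items()
    }


# Hypothetical records: codes recorded within 30 days of surgery.
records = {
    "p1": ["T81.4", "T81.0", "K65.0"],
    "p2": ["k65.9", "B96.2"],
    "p3": [],
}

flags = patient_category_flags(records)
# Number of patients carrying each category (each patient counted once).
category_counts = Counter(cat for cats in flags.values() for cat in cats)
```

Each category then becomes one binary predictor (present or absent) per patient, which is the form assumed by the sketches of the later stages.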
Stage 1: Given the large number of diagnostic codes (possible predictors), the Random Forests approach was used to identify the subset of the 30 most important hospitalization diagnostic codes for classifying SSIs. We used 800 classification trees, with 46 variables available for splitting at each tree node. The accuracy of the Random Forests model was 95.3%. The resulting SSI prediction model demonstrated a positive predictive value (PPV) of 98%, a negative predictive value (NPV) of 97%, and an area under the receiver operating characteristic curve (AUC) of 0.78 (95% CI 0.77–0.79). The accuracy of the Random Forests model after 10-fold cross-validation was 94.3%. Figure 1 presents the top 30 hospitalization diagnostic (ICD-10) codes for classification of SSIs, identified using the permutation VIM.
T81 – Operative complication (infection, hemorrhage, etc.); C54 – Malignant neoplasm of specified part of uterus; K65 – Peritonitis; B96 - Other bacterial agents as the cause of diseases classified elsewhere; K83 – Biliary duct infection, obstruction, perforation, or fistulation; Y83 - Surgical operation/procedures as the cause of abnormal reaction of the patient/or later complication; C51 - Malignant neoplasms of female genital organs; Y83 - Surgical operation/procedures as the cause of abnormal reaction of the patient/complication; C51 - Malignant neoplasms of female genital organs; K75 - Abscess of liver; L27 - Dermatitis and eczema; B95 - Streptococcus and staphylococcus as the cause of diseases; K42 - Umbilical hernia; A04 - Other bacterial intestinal infections; M71 – Bursal abscess, cyst, infection; N39 - Other disorders of urinary system; D05 - Carcinoma in situ of breast; C21 - Malignant neoplasm of anus and anal canal; T85 - Complications of internal prosthetic devices, implants and grafts; K26 - Duodenal ulcer; N43 - Other disorders of prostate; C25 - Malignant neoplasm of pancreas; A49 - Bacterial infection of unspecified site; K35 - Acute appendicitis; K92 - Other diseases of digestive system; K63 – Other diseases of intestine; K55 - Vascular disorders of intestine; G00 - Bacterial meningitis, unspecified; Y60 - Unintentional cut, puncture, perforation or haemorrhage during surgical and medical care; D62 - Acute posthaemorrhagic anemia; J80 - Acute respiratory distress syndrome
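To illustrate how a permutation-based importance ranking works, the sketch below shuffles one predictor at a time and records the mean drop in accuracy. It is a simplified, model-agnostic version: a Random Forests implementation typically computes this on out-of-bag samples per tree, and the toy data and `toy_predict` rule are hypothetical stand-ins for the fitted model.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the observed label."""
    hits = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return hits / len(labels)


def permutation_importance(predict, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy when each column is shuffled across patients."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = {}
    for col in rows[0]:
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in rows]
            rng.shuffle(shuffled)
            # Replace only this column; all other predictors stay intact.
            permuted = [{**row, col: v} for row, v in zip(rows, shuffled)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances[col] = sum(drops) / n_repeats
    return importances


# Hypothetical toy data: the outcome depends on T81 only, never on K42.
rows = [{"T81": i % 2, "K42": (i // 2) % 2} for i in range(40)]
labels = [row["T81"] for row in rows]


def toy_predict(row):
    # Stand-in for a fitted classifier (assumption, not the study model).
    return row["T81"]


vim = permutation_importance(toy_predict, rows, labels)
ranked = sorted(vim, key=vim.get, reverse=True)  # most important first
```

A code whose shuffling leaves accuracy unchanged receives an importance of zero, so sorting by importance and keeping the first 30 entries mirrors the selection described in Stage 1.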
Stage 2: The top 30 hospitalization diagnostic (ICD-10) codes were entered into a high-performance logistic regression with stepwise selection to identify the best parsimonious model for predicting SSIs. Table 1, Model 1 presents the final model of six hospitalization diagnostic codes to identify SSIs (AUC 0.87, 95% CI 0.86–0.89).
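For readers less familiar with the AUC, the following sketch computes it as the probability that a randomly chosen patient with an SSI receives a higher predicted score than a randomly chosen patient without one, with ties counted as one half. The scores and labels are hypothetical, and this pairwise formulation is only one of several equivalent ways to obtain the statistic.

```python
def auc_rank(scores, labels):
    """AUC as P(score of a case > score of a non-case), ties counting 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities and observed SSI status.
scores = [0.9, 0.8, 0.4, 0.4, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0]
example_auc = auc_rank(scores, labels)
```

The pairwise loop is quadratic in the number of patients, which is acceptable for a cohort of this size; a rank-based implementation would scale better.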
Table 1
The best parsimonious models for prediction of SSIs

Model 1. The best parsimonious model of hospitalization diagnostic (ICD-10) codes
| Effect | *AOR, 95% CI | Risk point |
| --- | --- | --- |
| T81 - Operative complication (infection, hemorrhage, etc.) | 6.40 (5.08–8.01) | 2 |
| K65 - Peritonitis | 5.87 (3.88–7.88) | 1 |
| B96 - Other bacterial agents causing infections | 2.56 (1.84–3.47) | 1 |
| K83 - Biliary duct infection, obstruction, perforation | 6.32 (4.42–8.01) | 3 |
| Y83 - Surgical operation/procedures as the cause of abnormal reaction of the patient/or later complication | 2.46 (1.97–3.07) | 1 |
| B95 - Streptococcus/staphylococcus as the cause of diseases | 3.25 (2.17–4.87) | 1 |

Model 2. The best parsimonious model of physician diagnostic (ICD-9) codes
| Effect | AOR, 95% CI | Risk point |
| --- | --- | --- |
| 686 - Pyoderma, pyogenic granuloma, other local infections | 8.13 (6.50–9.20) | 3 |
| 682 - Cellulitis, abscess | 4.70 (3.57–6.10) | 2 |
| 998 - Other complications of procedures | 5.68 (4.77–6.78) | 2 |
| 556 - Ulcerative colitis | 8.60 (6.31–9.18) | 3 |
| 685 - Pilonidal cyst with fistula, abscess | 2.69 (1.52–3.76) | 2 |
| 560 - Intestinal obstruction without mention of hernia | 2.97 (2.19–4.01) | 2 |
| 154 - Malignant neoplasm of rectum, rectosigmoid junction | 4.37 (3.29–5.17) | 2 |
| 599 - Other disorders of urethra and urinary tract | 2.04 (1.55–2.62) | 1 |
| 153 - Malignant neoplasm of colon | 2.71 (2.02–3.22) | 1 |

Model 3. The best parsimonious model of physician procedure claims
| Effect | AOR, 95% CI | Risk point |
| --- | --- | --- |
| Z59 - Digestive system surgical procedure: colon/biliary tract | 7.38 (6.08–9.09) | 4 |
| C46 - Infectious disease: hospital consult/assessment | 5.77 (4.66–7.43) | 3 |
| Z10 - Skin/subcutaneous tissue: incision of abscess or hematoma | 7.88 (6.04–8.67) | 3 |
| C03 - General surgery: hospital consult/assessment | 3.45 (2.86–4.19) | 2 |
| H15 - Family practice: assessment on weekend | 2.33 (1.80–3.01) | 2 |
| S16 - Digestive system surgical procedures: intestine | 1.98 (1.48–2.52) | 1 |
| C20 - Obstetrics and gynecology assessment/consult | 2.25 (1.66–3.05) | 2 |
| Z08 - Debridement of wound(s) and/or ulcer(s) | 4.01 (2.83–5.56) | 3 |
| S21 - Digestive system surgical procedures: colon/rectum | 2.65 (1.91–3.62) | 2 |
| R06 - Skin/subcutaneous tissue: free island flaps | 4.64 (2.58–6.36) | 3 |
| C13 - Internal medicine: hospital assessment/consult | 1.96 (1.52–2.36) | 1 |
| H13 - Family practice: assessment/consult on weekdays | 2.85 (2.18–3.52) | 2 |
| C21 - Pain management: limited consultations | 1.84 (1.55–2.10) | 1 |
| R11 - Operations of the breast: incision, excision, repair | 2.81 (1.02–3.41) | 3 |

*AOR, 95% CI = Adjusted Odds Ratio, 95% Confidence Interval
Stage 3: Risk scores for the final model of hospitalization diagnostic (ICD-10) codes are presented in Table 1, Model 1 (25). Among the entire cohort, 80.3% of patients had a score of 0, 11.8% had a score of 1, and 7.9% had a score of 2 or greater.
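As an illustration of how a patient-level score can be derived from the risk points in Table 1, Model 1, the sketch below sums the points of the codes present for one patient. It assumes the total is a simple additive sum with each code counted once per patient; the function names and score bands are illustrative, not taken from the scoring method cited as (25).

```python
# Risk points copied from Table 1, Model 1.
MODEL_1_POINTS = {"T81": 2, "K65": 1, "B96": 1, "K83": 3, "Y83": 1, "B95": 1}


def hospitalization_score(categories, points=MODEL_1_POINTS):
    """Sum the risk points of the Model 1 codes present for one patient.

    Assumes an additive score; repeated codes are counted once.
    """
    return sum(points.get(cat, 0) for cat in set(categories))


def score_band(score):
    """Bands used when reporting the score distribution: 0, 1, or >=2."""
    if score == 0:
        return "0"
    if score == 1:
        return "1"
    return ">=2"


# Hypothetical patient with T81 recorded twice and K65 once.
example_score = hospitalization_score(["T81", "K65", "T81"])
example_band = score_band(example_score)
```

The same pattern would apply to Models 2 and 3 by swapping in their respective point tables.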
Predictive modeling for physician diagnostic (ICD-9) codes
We identified 442 physician diagnostic 3-digit codes (using ICD-9-CA) recorded within 30 days following the surgery date.
Stage 1: Given the large number of diagnostic codes (possible predictors), the Random Forests approach was used to identify the subset of 30 physician diagnostic codes that best predicted SSIs. The best misclassification rate was achieved using 800 classification trees, with 31 variables available for splitting at each tree node. The accuracy of the Random Forests model was 94.7%. The resulting SSI prediction model demonstrated a PPV of 98%, an NPV of 96%, and an AUC of 0.82 (95% CI 0.81–0.83). The accuracy of the model after 10-fold cross-validation was 94.1%. Figure 2 presents the top 30 physician diagnostic (ICD-9) codes for prediction of SSIs, identified using the VIM.
686 - Pyoderma, pyogenic granuloma, other local skin infections; 682 - Cellulitis, abscess; 998 - Other complications of procedures, not elsewhere classified; 556 - Ulcerative colitis; 685 - Pilonidal cyst or abscess; 739 - Nonallopathic lesions, not elsewhere classified; 332 - Parkinson's disease; 599 - Other disorders of urethra and urinary tract; 192 - Malignant neoplasm of other and unspecified parts of nervous system; 257 - Testicular dysfunction; 603 – Hydrocele; 560 - Intestinal obstruction without mention of hernia; 608 - Other disorders of male genital organs; 170 - Malignant neoplasm of bone and articular cartilage; 154 - Malignant neoplasm of rectum, rectosigmoid junction and anus; 821 - Fracture of femur; 075- Infectious mononucleosis, glandular fever; 917- Superficial injury of foot and toe(s); 788 - Symptoms involving urinary system; 153 – Malignant neoplasm of large intestine - excluding rectum; 372 - Conjunctiva disorders (e.g., conjunctivitis, pterygium); 845 – Sprains and strains of ankle and foot; 591 – Hydronephrosis; 184 - Malignant neoplasm of vagina, vulva, other female genital organs; 156 - Malignant neoplasm of gallbladder and extra hepatic bile ducts; 290 - Senile dementia, presenile dementia; 569- Other disorders of intestine; 646 - Other complications of pregnancy (e.g., vulvitis, vaginitis, cervicitis, pyelitis, cystitis); 437- Other and ill-defined cerebrovascular disease; 346 - Other diseases of central nervous system (e.g., brain abscess, narcolepsy, motor neuron disease, syringomyelia)
Stage 2: The top 30 physician diagnostic codes were entered into the high-performance logistic regression model with stepwise selection to identify the best parsimonious model for predicting SSIs. Table 1, Model 2 presents the final model of nine physician diagnostic codes to identify SSIs (AUC 0.85, 95% CI 0.84–0.86).
Stage 3: Risk scores for the final model of physician diagnostic codes are presented in Table 1, Model 2 (25). Among the entire cohort, 77.8% of patients had a score of 0, 7.7% had a score of 1, and 14.5% had a score of 2 or greater.
Predictive modeling for physician procedure claims
We identified 2,543 physician procedure claims recorded within 30 days following the surgery date. These codes were then clustered into 610 three-digit codes, which were used in the further analyses.
Stage 1: Given the large number of physician procedure codes (possible predictors), the Random Forests approach was used to identify the subset of 30 physician procedure claims that best predicted SSIs. The best misclassification rate was achieved using 1,000 classification trees, with 37 variables available for splitting at each tree node. The accuracy of the Random Forests model was 94.8%. The resulting SSI prediction model demonstrated a PPV of 99%, an NPV of 97%, and an AUC of 0.82 (95% CI 0.81–0.83). The accuracy of the model after 10-fold cross-validation was 94.4%. Figure 3 presents the top 30 physician procedure claims, identified using the permutation VIM.
Z59 - Digestive system surgical procedure; C46 - Infectious disease - non-emergency hospital in-patient services: assessment/consultation; Z10 - Integumentary system surgical procedures: incision of abscess/haematoma; K07 - Family practice/geriatrics acute and chronic home care supervision; K99 - Emergency department – special visit premium; C03 - General surgery, non-emergency hospital in-patient services - assessment, visits, consultations; A35 - Urology - consultations/assessment; S16 - Digestive system surgical procedures; H15 - Family practice & practice in general - weekend and holidays: assessment/care; C64 - General thoracic surgery - non-emergency hospital in-patient services: consultation assessment; H12 - Family practice & practice in general - nights assessment and care; C12 - Non-emergency hospital in-patient services: subsequent visits by the MRP; R11 - Integumentary system surgical procedures: operations of the breast; E08 - Hospital and institutional consultations/assessments by MRP; C20 - Obstetrics and gynecology - non-emergency hospital in-patient services; Z08 - Debridement of wound(s) and/or ulcer(s) extending into subcutaneous tissue, tendon, ligament, bursa and/or bone; G55 - Diagnostic and therapeutic procedures, critical care; S21 - Digestive system surgical procedures: rectum; S65 - Male genital surgical procedures; Z74 - Respiratory surgical procedures; R62 - Musculoskeletal system surgical procedures – amputation; A20 - Obstetrics and gynecology - assessment or consultation; Z22 - Musculoskeletal system surgical procedures; R06 - Myocutaneous, myogenous or fascia-cutaneous flaps, neurovascular island transfer, transplantation of free island skin and subcutaneous flap; A24 - Otolaryngology – assessment/consultation; C13 - Internal and occupational medicine: non-emergency hospital in-patient services; C01 - Non-emergency hospital in-patient services, subsequent visits by the MRP; H13 - Family practice & practice in general – weekdays, evenings: assessment/care; C21 - Consultations/visits anaesthesia - non-emergency hospital in-patient services
Stage 2: The top 30 physician procedure claims were entered into the high-performance logistic regression model with stepwise variable selection to identify the best parsimonious model for predicting SSIs. Table 1, Model 3 presents the final model of 14 physician procedure claims to identify SSIs (AUC 0.84, 95% CI 0.83–0.85).
Stage 3: Risk scores for the final model of physician procedure claims are presented in Table 1, Model 3 (25). Among the entire cohort, 55.4% of patients had a score of 0, 11.9% had a score of 1, and 44.6% had a score of 2 or greater.
Full model with total risk score of diagnostic and procedure codes
In the derivation cohort, the total scores of hospitalization diagnostic (ICD-10) codes, physician diagnostic (ICD-9) codes and physician procedure claims were included in the logistic regression model and adjusted for potential confounding factors, including surgical specialties, age, sex, duration of surgery, emergency case, ASA class and concurrent surgical procedures (Table 2).
Table 2
Full model of total risk scores for hospitalization diagnostic (ICD-10) codes, physician diagnostic (ICD-9) codes and physician procedure claims, adjusted for the study covariates
| Effect | Adjusted Odds Ratio | 95% Confidence interval |
| --- | --- | --- |
| Hospitalization diagnostic score | 2.12 | 1.91–2.20 |
| Physician diagnostic score | 1.88 | 1.75–2.02 |
| Physician procedure score | 1.45 | 1.31–1.56 |
| Age < 65 years | 1.74 | 1.40–2.16 |
| Log-operation duration, min | 1.52 | 1.30–1.72 |
| Surgical specialty | | |
| General surgery | 1.60 | 1.20–2.15 |
| Gynecology | 1.19 | 0.80–1.76 |
| Orthopedics | 0.77 | 0.53–1.11 |
| Plastics | 2.37 | 1.59–3.51 |
| Vascular | 1.75 | 1.12–2.68 |
| Other | Reference | Reference |
| Female | 1.18 | 0.96–1.47 |
| Concurrent surgical procedures | | |
| 1 | 1.05 | 0.67–1.63 |
| 2+ | 1.09 | 0.67–1.75 |
| 0 | Reference | Reference |
| ASA class | | |
| I | 0.87 | 0.75–1.33 |
| II | 1.21 | 0.80–1.80 |
| III | 1.10 | 0.66–1.76 |
| IV | 0.32 | 0.04–1.03 |
| V | Reference | Reference |
| Emergent case | 0.99 | 0.79–1.20 |
The full model had excellent discrimination (AUC 0.91; 95% CI 0.90–0.92) and calibration (H-L statistic 4.53, p = 0.402). The predicted probability threshold with the optimal operating characteristics (27) (i.e., the point on the ROC curve with the smallest squared distance to the point (0, 1) in the upper left-hand corner of ROC space) was a predicted risk of 4% (sensitivity 83.4%; specificity 89.2%; PPV 34.2%; NPV 99.1%). In the internal validation cohort, the full model remained strongly discriminative (AUC 0.89, 95% CI 0.88–0.90) and well calibrated (H-L statistic 6.47, p = 0.487) (Fig. 4).
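The threshold criterion described above can be sketched as follows: for each candidate cut-off, compute sensitivity and specificity, then keep the cut-off whose ROC point lies closest to (0, 1). The predicted probabilities and outcomes below are hypothetical, and the function names are illustrative rather than those of the software used in the study.

```python
def roc_points(probs, labels, thresholds):
    """Sensitivity and specificity when patients at or above t are flagged."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for t in thresholds:
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
        tn = sum(1 for p, y in zip(probs, labels) if p < t and y == 0)
        points.append((t, tp / n_pos, tn / n_neg))
    return points


def closest_to_corner(points):
    """Threshold minimizing (1 - sensitivity)^2 + (1 - specificity)^2."""
    return min(points, key=lambda pt: (1 - pt[1]) ** 2 + (1 - pt[2]) ** 2)


# Hypothetical predicted SSI risks and observed outcomes.
probs = [0.01, 0.02, 0.03, 0.05, 0.08, 0.20, 0.40, 0.70]
labels = [0, 0, 0, 0, 1, 0, 1, 1]

candidates = sorted(set(probs))
best_threshold, best_sens, best_spec = closest_to_corner(
    roc_points(probs, labels, candidates)
)
```

In ROC space the horizontal axis is 1 − specificity and the vertical axis is sensitivity, so the squared distance to (0, 1) is (1 − specificity)² + (1 − sensitivity)², which is the quantity minimized here.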