A cohort of 17 284 patients discharged from an urban academic medical center (Boston Medical Center) between January 1, 2004 and December 31, 2012 was selected, and the 44 203 discharges from this cohort comprised the complete dataset. The inclusion criterion for index discharges was diabetes, defined by an International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) code of 250.xx associated with the hospital discharge or by the presence of a diabetes-specific medication on the pre-admission medication list. Index discharges were excluded for patient age <18 years, discharge by transfer to another hospital, discharge from an obstetric service (indicating pregnancy), inpatient death, outpatient death within 30 days of discharge, or incomplete data. A readmission documented within 8 hours of a discharge was merged with the index admission to avoid counting an in-hospital transfer as a readmission.
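
The 8-hour merging rule can be illustrated with a minimal sketch. The function name, the tuple layout of a stay, and the choice to treat a gap of exactly 8 hours as "within 8 hours" are assumptions made for illustration; the source does not describe how the rule was implemented.

```python
from datetime import datetime, timedelta

def merge_transfers(stays, gap=timedelta(hours=8)):
    """Merge consecutive stays of ONE patient when the next admission
    starts within `gap` of the previous discharge.

    `stays` is a list of (admit, discharge) datetime pairs (hypothetical layout).
    """
    merged = []
    for admit, discharge in sorted(stays):
        if merged and admit - merged[-1][1] <= gap:
            # Assumed continuation of the same hospitalization:
            # extend the previous stay instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], discharge))
        else:
            merged.append((admit, discharge))
    return merged

example_stays = [
    (datetime(2010, 1, 1, 8), datetime(2010, 1, 5, 12)),
    (datetime(2010, 1, 5, 15), datetime(2010, 1, 9, 10)),  # 3 hours after discharge
    (datetime(2010, 2, 1, 9), datetime(2010, 2, 3, 9)),
]
merged_stays = merge_transfers(example_stays)
```

In this toy example the first two stays collapse into one hospitalization, so the second admission is not counted as a readmission.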

The primary outcome was all-cause readmission within 30 days of discharge. The same 46 variables previously used to develop a readmission risk prediction model were evaluated as predictors of the primary outcome to construct and validate all prediction models (see Table, Supplemental Digital Content 1, which presents patient characteristics on the variables analyzed) 7.
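
As a rough illustration of the outcome definition, the sketch below flags each discharge of a single patient according to whether any later admission begins within 30 days. The function name and data layout are hypothetical, and the exclusion criteria applied to index discharges are deliberately ignored here.

```python
from datetime import datetime, timedelta

def label_readmissions(stays, window=timedelta(days=30)):
    """Return a 0/1 label per stay of ONE patient: 1 if the next admission
    starts within `window` of this stay's discharge, else 0.

    `stays` is a list of (admit, discharge) datetime pairs (hypothetical layout).
    """
    ordered = sorted(stays)
    labels = []
    for k, (_, discharge) in enumerate(ordered):
        if k + 1 < len(ordered):
            next_admit = ordered[k + 1][0]
            labels.append(1 if next_admit - discharge <= window else 0)
        else:
            # No later admission was observed for this patient.
            labels.append(0)
    return labels

example_stays = [
    (datetime(2010, 1, 1), datetime(2010, 1, 5)),
    (datetime(2010, 1, 20), datetime(2010, 1, 25)),  # 15 days after first discharge
    (datetime(2010, 4, 1), datetime(2010, 4, 3)),    # more than 30 days later
]
labels = label_readmissions(example_stays)
```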

Five different measures of performance were assessed for each model.

1) Diagnostic discrimination (*C* statistic): the area under the receiver operating characteristic curve (AUC), for which higher values represent better discrimination 14. Discrimination is the ability of a model to distinguish high-risk individuals from low-risk individuals 15. The *C* statistic is the most commonly used performance measure of generalized linear regression models.16

2) Correlation: the correlation between the observed outcome (readmission) and the value predicted by the model 17. Unlike the *C* statistic, correlation represents a summary measure of the predictive power of a generalized linear model.

3) Coefficient of discrimination (*D*): the absolute value of the difference between model successes (the mean predicted probability of readmission, $\hat{p}$, for readmitted patients) and model failures (the mean $\hat{p}$ for non-readmitted patients) 18. This is a measure of overall model performance with a more intuitive interpretation for binary outcomes than the more familiar coefficient of determination (*R*²).

4) Brier score: the mean squared deviation between the predicted probability of readmission and the observed outcome (1 for readmitted, 0 for not readmitted). An overall score that captures both calibration and discrimination, the Brier score can range from 0 for a perfect model to 0.25 for a noninformative model with a 50% incidence of the outcome. When the outcome incidence is lower, the maximum score for a noninformative model is lower 16,19.

5) Scaled Brier score: the Brier score scaled by its maximum score ($\text{Brier}_{\max}$) according to the equation $1 - \text{Brier score} / \text{Brier}_{\max}$ 16,20. Unlike the Brier score, the scaled Brier score is not dependent on the incidence of the outcome. For the scaled Brier score, a higher score represents greater accuracy. $\text{Brier}_{\max}$ is defined as $\operatorname{mean}(p) \times (1 - \operatorname{mean}(p))$, where $\operatorname{mean}(p)$ is the average probability of a positive outcome. The scaled Brier score is similar to Pearson’s *R*² statistic 16. The Brier score and scaled Brier score were chosen as measures to highlight potential differences seen when the incidence of readmission varies due to the sampling methods described below.
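
The five measures listed above can be sketched in plain Python as follows. This is an illustrative implementation, not the code used in the study (which relied on SAS and Stata). All function names are hypothetical, and the scaled Brier score here takes mean(p) to be the observed outcome incidence, which is an assumption about how "average probability of a positive outcome" is computed.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def c_statistic(y, p):
    """AUC as the share of (readmitted, non-readmitted) pairs in which the
    readmitted patient has the higher predicted risk; ties count as 0.5."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    score = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return score / (len(pos) * len(neg))

def correlation(y, p):
    """Pearson correlation between observed outcomes and predictions."""
    my, mp = mean(y), mean(p)
    cov = sum((yi - my) * (pi - mp) for yi, pi in zip(y, p))
    return cov / sqrt(sum((yi - my) ** 2 for yi in y) * sum((pi - mp) ** 2 for pi in p))

def coefficient_of_discrimination(y, p):
    """|mean prediction among readmitted - mean prediction among non-readmitted|."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    return abs(mean(pos) - mean(neg))

def brier_score(y, p):
    """Mean squared difference between prediction and 0/1 outcome."""
    return mean([(pi - yi) ** 2 for yi, pi in zip(y, p)])

def scaled_brier_score(y, p):
    """1 - Brier / Brier_max, with Brier_max = mean(p) * (1 - mean(p)).
    Assumption: mean(p) is taken as the observed incidence, mean(y)."""
    incidence = mean(y)
    return 1.0 - brier_score(y, p) / (incidence * (1.0 - incidence))

# Toy data: two readmitted and two non-readmitted discharges.
y = [1, 0, 1, 0]
p = [0.8, 0.2, 0.6, 0.4]
```

On this toy data every readmitted discharge has a higher predicted risk than every non-readmitted one, so discrimination is perfect even though the Brier score is not zero. That contrast is why the overall measures are reported alongside the *C* statistic.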

Sampling was performed by two methodologies. The first method included only the first index discharge per patient during the study period (first discharges). The second method included all index discharges per patient (all discharges), regardless of whether the hospitalization was a readmission relative to a prior discharge. The study sample was then divided randomly into a training sample and a validation sample 15. The training sample, which comprised 60% of the patients in the study cohort, was used to develop the statistical prediction models. The validation sample contained the remaining 40% of the patients and was used to evaluate the performance of the prediction models.
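
A minimal sketch of the two sampling methods and the patient-level 60/40 split is given below. The record layout (dictionaries with `patient_id` and `discharge_date` keys), the function names, and the seed are hypothetical. Splitting by patient rather than by discharge follows the description above and keeps all discharges of one patient in the same sample.

```python
import random

def first_discharges(discharges):
    """Keep only the earliest index discharge per patient."""
    first = {}
    for d in discharges:
        pid = d["patient_id"]
        if pid not in first or d["discharge_date"] < first[pid]["discharge_date"]:
            first[pid] = d
    return list(first.values())

def split_patients(patient_ids, train_fraction=0.6, seed=0):
    """Randomly assign patients (not individual discharges) to the
    training and validation samples."""
    ids = sorted(set(patient_ids))
    rng = random.Random(seed)  # fixed seed only for reproducibility of the sketch
    rng.shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return set(ids[:n_train]), set(ids[n_train:])

# Toy data: patient 1 has two discharges, patient 2 has one.
records = [
    {"patient_id": 1, "discharge_date": "2005-03-01"},
    {"patient_id": 1, "discharge_date": "2004-06-15"},
    {"patient_id": 2, "discharge_date": "2010-01-10"},
]
firsts = first_discharges(records)
train_ids, valid_ids = split_patients(range(10))
```

For the all discharges method, the same split would be applied to the full list of records by filtering on `patient_id`.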

Characteristics of the study population were described and compared between the training and validation samples. Categorical variables were presented as number (%), while continuous variables were presented as mean (standard deviation) or median (interquartile range). For the first discharges dataset, the validation sample was compared to the training sample by Chi-square tests for categorical variables and two-sample t-tests or Wilcoxon rank-sum tests for continuous variables. For the all discharges dataset, the validation sample was compared to the training sample using a univariate generalized linear model for each variable. When analyzing all discharges, only current and prior observations available at the time of each index discharge were used for modeling.

The models can be described in mathematical terms as follows. Suppose the $i$th patient has $n_i$ discharges, where $i = 1, 2, \ldots, N$, and let $j = 1, 2, \ldots, n_i$ index the discharges of that patient. Let $X_{ij}$ be the 46-vector of covariates and $Y_{ij}$ the readmission indicator, where $Y_{ij} = 1$ if the $i$th patient was readmitted within 30 days of the $j$th discharge and $Y_{ij} = 0$ otherwise. Components of $X_{ij}$ can be constant over time, such as gender, or time-varying, such as age. We further define a general class of models that specify the potential relation between readmission $Y$ and covariates $X$ as $f(E(Y \mid X)) = h$, where $f(\cdot)$ is a link function, such as the logit function, that determines the relationship between $Y$ and $X$; $E(Y \mid X)$ denotes the conditional mean of $Y$ given $X$; and $h$ is a function of covariates, usually a linear function such that $h = \alpha + \beta X$, where $\alpha$ and $\beta$ are coefficients on the log odds scale. We fit the readmission prevalence model assuming a logit link to estimate the effect of covariates as $\operatorname{logit} P(Y = 1 \mid X) = \alpha + \beta X$. Then the probability of readmission within 30 days is $P(Y = 1 \mid X) = \exp(\alpha + \beta X) / [1 + \exp(\alpha + \beta X)]$. This probability is estimated and used to compare the performance of the four statistical approaches described below.
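
To make the logit link concrete, the sketch below converts a linear predictor into a predicted probability of readmission. The names `alpha` (intercept) and `beta` (coefficient vector), as well as the example coefficient values, are illustrative assumptions and are not estimates from the study.

```python
from math import exp

def readmission_probability(alpha, beta, x):
    """Inverse logit of the linear predictor h = alpha + beta . x.

    1 / (1 + exp(-h)) is algebraically equal to exp(h) / (1 + exp(h)).
    """
    h = alpha + sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + exp(-h))

# Hypothetical coefficients for two covariates (e.g. a binary flag and age).
prob = readmission_probability(alpha=-1.5, beta=[0.8, 0.02], x=[1, 60])
```

Here the linear predictor is -1.5 + 0.8 + 1.2 = 0.5, so the predicted probability is a little above one half; a linear predictor of 0 corresponds to a probability of exactly 0.5.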

The four approaches used to predict readmission were: 1) logistic regression using the first discharges, 2) logistic regression using all discharges, 3) GEE logistic regression with an exchangeable correlation structure using all discharges, and 4) CWGEE logistic regression with an exchangeable correlation structure using all discharges. CWGEE logistic regression is an extension of GEE logistic regression that accounts for cluster size when the outcome among observations in a cluster depends on the cluster size (i.e., when cluster size is informative) 11. For each approach, univariate analyses were performed for all variables to identify those associated with 30-day readmission (*P*<0.1). Multivariable models with best subset selection were then fit to determine the adjusted associations of the variables with all-cause 30-day readmission 21,22. Variables associated with 30-day readmission at the *P*<0.05 level in the multivariable models were retained.
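
One common way to describe cluster weighting is that each observation is weighted by the inverse of its cluster size, so that every patient contributes equally to the estimating equations regardless of how many discharges they have. The sketch below computes such weights under that assumption. The source does not specify how the weights were computed in its software, and the function name is hypothetical.

```python
from collections import Counter

def inverse_cluster_size_weights(patient_ids):
    """Return one weight per discharge equal to 1 / (number of discharges
    for that discharge's patient)."""
    sizes = Counter(patient_ids)
    return [1.0 / sizes[pid] for pid in patient_ids]

# Toy data: patient "a" has 2 discharges, "b" has 1, "c" has 4.
ids = ["a", "a", "b", "c", "c", "c", "c"]
weights = inverse_cluster_size_weights(ids)
```

With these weights the total weight per patient is 1, so a patient with many discharges does not dominate the fit. This matters when frequent readmission itself increases the number of discharges a patient contributes.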

To examine the effects of sample size on model performance, we conducted resampling studies across a range of sample sizes from 2 000 to 17 000 patients by intervals of 1 000. We randomly sampled each subset from the complete cohort of 17 284 patients. Each subset was then randomly divided 60% for model development and 40% for validation. For each dataset, the models were developed and compared as described above. Changes in model performance measures over sample size are displayed by line charts and compared by analysis of covariance. Lastly, we examined the relationship between the number of discharges per patient (cluster size) and the number of readmissions per patient by Pearson correlation. In addition, correlations between the number of discharges per patient and predicted readmission rates were assessed. All statistical analyses were conducted using SAS 9.4 (SAS Institute, Cary, NC) and Stata 14.0 (StataCorp, College Station, TX). Institutional Review Board approval was obtained from Boston Medical Center and Temple University.
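
The resampling design can be sketched as follows. The grid of sample sizes and the 60/40 division come from the description above; the function name, the seed, and the use of integer patient identifiers are assumptions for illustration, and the model fitting step is omitted.

```python
import random

def resampling_plan(patient_ids, sizes, train_fraction=0.6, seed=0):
    """For each sample size, draw a random subset of patients from the full
    cohort and divide it into development and validation patients."""
    rng = random.Random(seed)  # fixed seed only for reproducibility of the sketch
    ids = sorted(set(patient_ids))
    for n in sizes:
        subset = rng.sample(ids, n)  # drawn without replacement, in random order
        n_train = round(train_fraction * n)
        yield n, subset[:n_train], subset[n_train:]

# 2 000 to 17 000 patients in steps of 1 000, drawn from 17 284 patients.
sizes = list(range(2000, 17001, 1000))
plans = list(resampling_plan(range(17284), sizes))
```

Each element of `plans` holds a sample size together with the patients assigned to model development and validation, on which the four approaches would then be fit and compared.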