County matching and baseline covariate controls increase the rigor of this longitudinal natural experiment, which is described in accordance with STROBE guidelines25 (see Supplemental Materials). The study will measure target mechanisms and key outcomes in 475 Stepping Up counties and 475 matched counties at three waves: study baseline, 18 months, and 36 months. Because counties self-select into Stepping Up, we cannot randomize counties to the initiative. Instead, we use a case-matched design based on the Centers for Disease Control and Prevention (CDC) peer counties methodology to identify non-Stepping Up comparison counties with similar size, demographics, and health, economic, and justice indicators.26-28 Stepping Up began in 2015 and has been registering counties over time; at study baseline, the 475 Stepping Up counties had therefore been participating in Stepping Up for one to five years. Wave 1 (i.e., baseline) values and months since each county began Stepping Up will serve as covariates, and analyses will examine differences between groups in within-county rates of change across study waves, controlling for Wave 1. Our quantitative survey results will be augmented with qualitative interviews with 60 counties at each wave to enrich our understanding of how the implementation mechanisms work.
The Stepping Up Initiative
The goal of Stepping Up is to reduce the number of individuals with mental illness in jails and to improve access to community mental health services for currently or potentially justice-involved individuals. To join Stepping Up,14 a county passes a resolution to address behavioral health disorders (i.e., reduce unnecessary use of jail, increase access to behavioral health services) using a broad, locally adaptable six-step action plan: 1) convene a diverse team of leaders; 2) collect and review data on individuals in the justice system; 3) examine treatment and service capacity; 4) develop a plan with measurable outcomes; 5) implement the plan; and 6) track ongoing progress with data. Table 1 and Figure 2 show how we mapped Stepping Up’s six steps onto hypothesized CJ-IIM mechanisms. Within the six broad steps/strategies outlined by the national Stepping Up Initiative, counties can use their own approaches based on their own priorities.
Stepping Up outlines steps to help counties become more data-driven. Leaders of agencies within each county are asked to agree upon a mutual definition of terms such as “mental illness,” “connection to community-based care,” and “recidivism.” Stepping Up then encourages counties to identify and use a universal, validated mental health screening instrument for new intakes into the jail and other agencies. The screening tool identifies individuals in need of a full clinical assessment. Stepping Up offers toolkits to help counties examine how to capture screening and assessment results electronically and engage in information-sharing agreements. Counties are encouraged to track data on four key measures to assess the impact of their efforts over time: 1) number of people with mental illnesses who are booked into jail, 2) average length of stay, 3) percentage of people connected to community-based treatment after release from jail, and 4) rate of return to jail. With this data infrastructure in place, counties can assess the effects of their efforts to address patient needs (e.g., substance use, mental illness, family discord), continuity of care, and reach of services. As Stepping Up counties work to iteratively improve services, they can track progress, focus county leaders on key outcome measures, and make the budgetary and programmatic case for needed resources. This study will assess counties’ progress on these steps and will categorize local approaches.
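For concreteness, the minimal sketch below shows how a county data team might compute these four core measures from booking-level records; the table layout and column names are illustrative assumptions, not part of Stepping Up’s specification.

```python
import pandas as pd

# Hypothetical booking-level records; table and column names are assumptions.
bookings = pd.DataFrame({
    "person_id":           [1, 2, 3, 1],
    "has_mental_illness":  [True, False, True, True],
    "days_in_jail":        [12, 3, 45, 7],
    "connected_to_care":   [True, False, False, False],  # linked to treatment post-release
    "returned_within_1yr": [True, False, False, True],
})

mi = bookings[bookings["has_mental_illness"]]
metrics = {
    # 1) number of people with mental illnesses booked into jail
    "mi_people_booked": mi["person_id"].nunique(),
    # 2) average length of stay
    "avg_length_of_stay_days": mi["days_in_jail"].mean(),
    # 3) percentage connected to community-based treatment after release
    "pct_connected_to_care": 100 * mi["connected_to_care"].mean(),
    # 4) rate of return to jail
    "return_rate": mi["returned_within_1yr"].mean(),
}
print(metrics)
```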
Stepping Up counties. When this project was submitted for funding, there were 475 counties designated as Stepping Up counties. Although new counties continue to join Stepping Up, for feasibility, this study will assess the 475 counties originally proposed.
Comparison counties. A peer group of 475 matched comparison counties was created using a county grouping methodology developed by CDC26-28 and updated by our team.29 Matching variables were drawn from three primary data sources: (1) Vera Institute’s incarceration trends database30 for county pretrial and jail populations; (2) County Health Rankings & Roadmaps31 for health, economic, social, and demographic information; and (3) the Uniform Crime Report32,33 for crime and policing data.
Counties are nested within states and clustered on health and social indicators, so a hierarchical matching approach was developed to accommodate state- and county-level covariates. First, the study principal investigators chose 34 of the most potentially relevant variables from the datasets based on expert knowledge. These variables included demographic factors (e.g., median household income, unemployment, total population, high school graduation rate, percent African American, percent Hispanic), inequality indicators (e.g., income inequality, residential segregation), health factors (e.g., poor mental health days, poor physical health days, HIV incidence), healthcare (e.g., mental health providers per capita, primary care physicians per capita, percent of drug treatment paid by Medicaid), crime, and criminal justice (e.g., per capita numbers of police officers, jail population, jail pretrial population, juvenile criminal cases). Second, based on random forest models and team feedback, these were reduced to 29 variables: 22 predictors and 7 variables reflecting jail populations and mental health providers in the area (factors central to Stepping Up activities). Third, shrinkage-based variable selection techniques were applied to select variables that best predicted jail population per capita, pretrial population per capita, and per capita rate of mental health providers, without collinearity. Finally, logistic models (which included both the predictors and the dependent variables from the previous models) were fitted to define variable weights and estimate the likelihood of each county being classified as a Stepping Up or non-Stepping Up county. Using these weights, matching scores were calculated for each county and used in an algorithm to find the best-matched comparison county for each Stepping Up county among potential comparison counties within the same state.
The final variables used for county matching scores included per capita rates of mental health providers, daily jail population, daily jail pretrial population, primary care providers, police, licensed psychologists, and community mental health centers. Final variables also included average number of physically unhealthy days (of 30), high school graduation rate, income inequality, total county healthcare expenses, percent African American population, percent Hispanic population, percent of drug treatment paid by Medicaid, county population, and an indicator reflecting the presence of a medical school in the county. In states where the number of Stepping Up counties exceeded the number of potential comparison counties, state location, Medicaid expansion status, and justice/mental health policy were used to pair comparable states and then algorithmically match at the county level. If a county from the comparison group joins Stepping Up during the first year of the study, we will find a new matched county; if this occurs after the first year of the study, the pair will be removed from analyses.
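A simplified sketch of this matching pipeline is given below, assuming it reduces to a logistic (propensity-style) score followed by greedy within-state nearest-neighbor pairing; the shrinkage-based selection step is omitted, and all function and column names are our assumptions, so the study’s actual weighting and algorithm may differ.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

def match_counties(counties: pd.DataFrame, features: list[str]) -> pd.DataFrame:
    """Greedy within-state pairing on a logistic matching score.
    `counties` needs 'county', 'state', and boolean 'is_stepping_up' columns
    plus the final matching variables (names are assumptions)."""
    X = StandardScaler().fit_transform(counties[features])
    y = counties["is_stepping_up"].to_numpy()
    # The logistic model defines variable weights; the predicted probability
    # of Stepping Up membership serves as the matching score.
    score = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]
    counties = counties.assign(score=score)

    pairs = []
    for state, grp in counties.groupby("state"):
        controls = grp[~grp["is_stepping_up"]].copy()
        for _, t in grp[grp["is_stepping_up"]].iterrows():
            if controls.empty:
                break  # handled via paired comparable states instead
            # Nearest neighbor on the score, matched without replacement.
            j = (controls["score"] - t["score"]).abs().idxmin()
            pairs.append((t["county"], controls.loc[j, "county"], state))
            controls = controls.drop(index=j)
    return pd.DataFrame(pairs, columns=["stepping_up", "comparison", "state"])
```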
Survey respondents. The overall sample is 475 Stepping Up counties and 475 comparison counties. In each county we will survey the administrators of community mental health, jail, probation, and community substance use treatment agencies (i.e., up to 4 respondents per county and ~3,800 total; see Table 2). These respondents were selected because the jail and probation systems supervise the majority of individuals under justice control in a county, and mental health and substance use treatment administrators are responsible for the provision of behavioral health services for justice-involved individuals in the community.
To compile the respondent list, the research team developed a database of all Stepping Up and matched comparison counties. NACO-CSG-APAF provided a list of county contacts for Stepping Up counties, and we asked these individuals to provide contact information for the jail, probation, mental health, and substance use administrators in their county. We also conducted web-based searches. For comparison counties, we identified county-level experts through web-based searching. When contact information was not publicly available, we called individual agencies to identify the correct respondents. We also used a snowball technique, asking experts we had already identified to help us identify other possible respondents in their county.
Survey administration. The web-based survey is administered using Qualtrics. Using a procedure described by Dillman,34 respondents receive an introductory email that includes a NACO-CSG-APAF endorsement letter of support as well as key information to collect prior to beginning the survey (e.g., budget and staffing data). One week after the introductory email, an invitation to participate in the survey is sent using Qualtrics. The research team sends follow-up emails once a week for three weeks following the initial invitation with a reminder to participate. If the survey is not completed by the end of week four, research team members make follow-up phone calls. During these calls, the research team offers multiple options for completing the survey, including completion via telephone and receipt of a paper copy. Following the phone call, the research team continues to follow up with respondents biweekly. Given the current context (i.e., COVID-19), we anticipate encouraging survey participation for six months before closing the survey. We will also provide county-specific feedback reports on county-level CJ behavioral health indicators as an incentive for study participation.
Survey validation. We use existing, validated measures where possible. When we needed to tailor items to CJ or mental health contexts, we used strategies described by Cook35 for item development, testing new items using cognitive interviews.36 Interviews covered question comprehension, decision processes, and response options.
Ten cognitive interviews were conducted in May 2020 with volunteers from Stepping Up counties representing jail, probation/parole, community mental health, and community substance use treatment. Interviews were conducted via videoconference. Team members met to iteratively review interview results and revise the survey. At these meetings each interviewer presented the responses and reflections from their interviews. Volunteer comments and interviewer/notetaker feedback, along with expert review by team members, were used to revise the survey. Changes were made to simplify and clarify survey questions and to remove redundancies. The amended survey was again reviewed by all team members.
All measures will be collected at all 3 time points, and all respondents will receive the same assessments. We refer to measures as “agency-level measures” when analyses account for nesting within counties but the primary focus is on agencies. We refer to measures as “agencies nested within counties” when we use nested analyses and our primary focus is on the county level. We refer to measures as “county-level” measures when they produce a single value for the county to be analyzed at the county level.
Descriptors, predictors, and moderators (agency-level). A series of measures will be used to describe the inner context of each agency. Type of agency will be characterized using the National Criminal Justice Treatment Practices (NCJTP) survey About Your Organization scale.37 Staffing, including type, number, and turnover, will be measured using adapted NCJTP Staffing scales.37 Organizational Culture Support for Innovations (a proposed moderator) will be assessed using an adapted version of the NCJTP Assess Your Organizational Culture scale.37
Aim 1: Target mechanisms. Use of and capacity for performance monitoring (agencies nested within counties). We created a Performance Monitoring measure that awards one point for: (1) each of the 4 Stepping Up core metrics (number of people with mental illnesses booked into jail, average length of jail stay, percent connected to community-based treatment upon release from jail, and rate of return to jail) the county is able to measure (up to 4 points), (2) each metric regularly reported (up to 4 points), and (3) each metric used for ongoing decision making (up to 4 points), for up to 12 points total. A secondary measure of performance monitoring identifies 7 kinds of decisions (e.g., budget preparation, medicine supply) and asks whether each was guided by each of the 4 Stepping Up core metrics (0=no, 1=yes), for up to 28 points possible. This measure was adapted from the Routine Decision-Making scale of the Performance of Routine Information System Management (PRISM) Toolkit.38
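As an illustration, the 0-12 scoring rule might be implemented as below; metric and field names are assumptions.

```python
# The four Stepping Up core metrics (labels are illustrative assumptions).
CORE_METRICS = {"mi_bookings", "avg_length_of_stay", "pct_connected", "return_rate"}

def performance_monitoring_score(can_measure: set, reports: set, uses_in_decisions: set) -> int:
    """One point per core metric the county can measure, regularly reports,
    and uses for ongoing decision making (0-12 total)."""
    return sum(len(s & CORE_METRICS) for s in (can_measure, reports, uses_in_decisions))

# Example: measures all four, regularly reports two, uses one -> 4 + 2 + 1 = 7.
assert performance_monitoring_score(
    CORE_METRICS, {"mi_bookings", "return_rate"}, {"return_rate"}
) == 7
```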
Use and functioning of interagency teams (agencies nested within counties). To examine the activities and functioning of interagency teams, we integrated the NCJTP Relationship Assessment Inventory37 with additional items based on the goals and priorities of Stepping Up. This integrated scale contains 18 items such as “we share general information about populations in need of treatment services” (0=no, 1=yes), with one point assigned for each collaborative activity with each of the other 3 agencies. The total score (up to 54) reflects joint activities among agencies.
Common goals and mission across agencies (agencies nested within counties). The primary measure (an adapted NCJTP Goals/Mission scale)37 assesses each respondent’s perception of the degree to which their agency’s goals and overall county goals align. Respondents are given a list of goals (e.g., public safety/protection, provide mental health services) and are asked to rank them according to: (1) their agency’s priorities, and (2) county priorities. A kappa score reflects the degree of consistency between the two lists. The secondary, county-level measure will be agreement (kappa) among respondents within a county on ratings of the importance of providing mental health treatment services for justice-involved individuals in jail and in the community (rated separately) on a scale of 1 (unimportant) to 10 (important).
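A minimal sketch of the primary alignment measure, treating the rank each goal receives in the two lists as paired labels and computing Cohen’s kappa; the goal names and data are illustrative, and the study’s exact kappa variant may differ.

```python
from sklearn.metrics import cohen_kappa_score

# Rank each goal receives under agency vs. county priorities (illustrative).
goals = ["public_safety", "mh_services", "reduce_jail_population", "cost_control"]
agency_rank = [1, 2, 3, 4]   # respondent's ranking for their agency
county_rank = [1, 3, 2, 4]   # the same respondent's ranking for the county

for g, a, c in zip(goals, agency_rank, county_rank):
    print(f"{g}: agency rank {a}, county rank {c}")

# Treating ranks as paired categorical labels, kappa measures agreement
# between the two priority lists beyond chance.
print("kappa =", round(cohen_kappa_score(agency_rank, county_rank), 2))  # 0.33 here
```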
System integration (agencies nested within counties) is a dichotomization of the NCJTP Relationship Assessment Inventory37 total score (i.e., excluding the additional items); counties with scores of 18 or more are considered to have achieved “system integration.” A secondary (county-level) measure will reflect the degree to which each of 12 listed behavioral health screening and assessment instruments is used by and/or shared among multiple responding agencies within a county. For each of the 12 instruments, counties will receive a score (0 = no agencies use the same instrument, 1 = two agencies use the same instrument, 2 = three agencies use the same instrument, and 3 = all four agencies use the same instrument).
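One plausible implementation of the secondary instrument-sharing score, summed across instruments; the agency and instrument names here are illustrative assumptions.

```python
from collections import Counter

def instrument_sharing_score(usage_by_agency: dict[str, set[str]]) -> int:
    """Per instrument: 0 if no two agencies share it, 1 if two do, 2 if three,
    3 if all four; summed across instruments as a county total."""
    counts = Counter(i for tools in usage_by_agency.values() for i in tools)
    return sum(max(0, n - 1) for n in counts.values())

# Illustrative usage (agency and instrument names are assumptions):
score = instrument_sharing_score({
    "jail": {"BJMHS", "TCU-D"},
    "probation": {"BJMHS"},
    "mental_health": {"BJMHS", "CSSRS"},
    "substance_use": {"TCU-D"},
})
assert score == 3  # BJMHS shared by three agencies (2) + TCU-D by two (1)
```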
Aim 2: Implementation outcomes. Number of justice-involved adult clients receiving behavioral health services (agencies nested within counties; primary). After defining “justice-involved” and asking whether each of the EBPPs described below is available in the county, we ask respondents how many justice-involved individuals received any mental health service and how many received any substance use service in their agencies in the past year.
Number of behavioral health EBPP available to justice-involved individuals (county level). Mental health EBPPs were taken from treatment recommendations for justice-involved individuals39-45 and from community standards for treatment of serious mental illness, posttraumatic stress disorder, borderline personality disorder, suicidal thoughts or behaviors, anxiety, insomnia, and pain.46-52 Substance use EBPPs were taken from the U.S. National Institute on Drug Abuse’s consensus list.53 Using the EBPP list described above, we ask whether each EBPP is available to justice-involved individuals in the county. If any respondent answers “yes,” we count that EBPP as available to justice-involved individuals in the county.
Resources for behavioral health EBPP for justice-involved individuals (agency-level). Respondents will be asked to report whether their agency has experienced an increase (+1), no change (0), or decrease (-1) in funding from the prior year in 13 different areas (e.g., “screening and assessment”). We will cluster these 13 areas using factor analysis and then create total scores for each factor, which will serve as the primary outcomes. Initially, we planned to assess the total dollar amount of resources devoted to behavioral health services for justice-involved individuals but found that most agencies could not report this number. Secondary measures relate to capacity and training: (1) the proportion of staff in clinical roles, (2) the number of staff who participated in behavioral health-related training in the past year, and (3) the number of staff hired minus the number who left in the prior year. Lastly, we will use the NCJTP Assess Your Resources scale,37 which uses Likert scale items, to measure respondent perceptions of the adequacy of resources available in their agency.
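A sketch of the planned clustering of the 13 funding-change items, using factor analysis and summing raw item codes within each factor; the number of factors and the data below are assumptions, and the study’s factor-analytic choices may differ.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# X: respondents x 13 funding areas, coded -1 (decrease), 0 (no change), +1 (increase).
rng = np.random.default_rng(0)
X = rng.integers(-1, 2, size=(200, 13)).astype(float)  # illustrative data only

fa = FactorAnalysis(n_components=3).fit(X)   # number of factors is an assumption
loadings = fa.components_.T                  # 13 areas x 3 factors

# Assign each funding area to its dominant factor, then sum the raw -1/0/+1
# codes within each factor to form the total scores used as outcomes.
assignment = np.abs(loadings).argmax(axis=1)
factor_totals = np.column_stack([X[:, assignment == k].sum(axis=1) for k in range(3)])
print(factor_totals.shape)  # (200, 3)
```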
Aim 3: Characterize implementation processes and critical incidents. Qualitative. We will use qualitative data to triangulate quantitative findings, enrich our understanding of how the target mechanisms work and lead to outcomes, and identify critical incidents contributing to EBPP implementation success or failure. Qualitative data will include interviews with 30 of the 475 county pairs (60 paired counties total). County pairs were randomly selected at Wave 1 (stratified by small, medium, and large county population) and will be followed longitudinally at Waves 2 and 3. We anticipate 180 qualitative interviews (60 respondents at 3 time points). We will alternate CJ and behavioral health respondents to obtain multiple perspectives on each county’s progress. Respondents will be invited for interviews regardless of their survey status (i.e., completed, not yet completed, declined) for that wave.
Fidelity to Stepping Up/Quantitative characterization of implementation strategies and sub-strategies used to improve mental health or substance use services for justice-involved individuals and/or to reduce the number of people with mental illness in jail (agencies nested within counties). We will use a checklist of strategies and their descriptions constructed from the six main Stepping Up strategies as well as categories conceptualized by Powell54 and the CJ-IIM.17 For each strategy, respondents will indicate whether anyone in the county is “planning to address this,” has made “some progress,” or has made “significant progress.”
Power Analyses. The sampling frame (475 county pairs, i.e., 950 counties, with up to 4 respondents per county, or ~3,800 potential respondents) and an expected response rate of 50% yield an expected sample of 1,900 respondents. Given that anywhere from 0 to 4 respondents may complete the survey in any given county, with a 50% overall response rate we anticipate that 712 counties will have at least one respondent who completes the survey. We used a conservative (higher than expected) intraclass correlation coefficient of 0.1 to address clustering of agencies within counties.
For county-level analyses, an effect size of 0.2, power of 0.8, confidence level of 95%, and statistical significance level of 0.05 were used to calculate minimum sample sizes. Repeated measures analysis required a minimum total sample size of 304 counties. For logistic regression and other non-linear predictive models, depending on the type and number of variables in the model, the minimum required total sample size varied between 156 and 489 counties. Even allowing for attrition, our calculations showed power of 0.9 or more for most county-level analyses.
For agency-level analyses, a conservative agency-level effect size estimate (d = 0.1), power of 0.8, confidence level of 95%, and significance level of 0.05 were used to calculate minimum required sample sizes. Repeated measures analysis comparing respondents from Stepping Up and comparison counties required a minimum sample size of 524 respondents. Comparing agency-level response measures over time requires a minimum sample size of 1,200 respondents. For logistic regression and other non-linear predictive models, the minimum sample size varied between 673 and 1,100 respondents. Given the study’s larger expected sample, our calculations showed power above 0.9 for agency-level analyses.
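For illustration, the sketch below gives a simplified cross-sectional analogue of the agency-level calculation, inflating a two-group t-test sample size by a design effect for clustering; the protocol’s repeated-measures models gain power and yield the smaller minimums reported above.

```python
from statsmodels.stats.power import TTestIndPower

# Cross-sectional two-group analogue of the agency-level calculation.
n_per_group = TTestIndPower().solve_power(effect_size=0.1, alpha=0.05, power=0.8)

# Inflate for clustering: up to m = 4 agencies per county, ICC = 0.1.
m, icc = 4, 0.1
design_effect = 1 + (m - 1) * icc            # = 1.3
print(round(n_per_group * design_effect))    # respondents needed per group
```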
General approaches. Primary tests will be 2-sided with α=0.05. Analysis approaches accommodate nested and repeated measures data. We will examine predictive associations between Stepping Up membership, hypothesized target mechanisms, and implementation outcomes over time. We will use general linear models and generalized linear mixed models (GLMM) when the dependent variables are continuous and non-continuous, respectively; generalized estimating equations (GEE) will be used instead when distributional assumptions are not met. For non-aggregated dependent variables reported at the agency level (i.e., hierarchical data), a random intercept growth hierarchical linear model (GHLM) will be fitted. All analyses will covary: (1) Wave 1 (baseline) values of dependent variables, (2) months since the county joined Stepping Up, (3) the matching score, (4) an indicator representing whether the county shares its mental health administrator with other counties, and (5) a similar indicator for shared justice roles across counties.
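A minimal sketch of one such model, a GEE with an exchangeable working correlation in which the group-by-wave interaction carries the slope-difference test; all variable names and the synthetic data are assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Synthetic long-format data: one row per county per wave (names are assumptions).
rng = np.random.default_rng(1)
n, waves = 100, 3
df = pd.DataFrame({
    "county_id": np.repeat(np.arange(n), waves),
    "wave": np.tile([0, 1, 2], n),
    "stepping_up": np.repeat(rng.integers(0, 2, n), waves),
    "baseline_outcome": np.repeat(rng.normal(50, 10, n), waves),
    "months_in_stepping_up": np.repeat(rng.integers(0, 60, n), waves),
    "matching_score": np.repeat(rng.uniform(0, 1, n), waves),
    "shared_mh_admin": np.repeat(rng.integers(0, 2, n), waves),
    "shared_justice_roles": np.repeat(rng.integers(0, 2, n), waves),
})
df["outcome"] = df["baseline_outcome"] + 2 * df["stepping_up"] * df["wave"] + rng.normal(0, 5, len(df))

# The stepping_up:wave interaction tests for group differences in rates of change.
model = smf.gee(
    "outcome ~ stepping_up * wave + baseline_outcome + months_in_stepping_up"
    " + matching_score + shared_mh_admin + shared_justice_roles",
    groups="county_id", data=df,
    cov_struct=sm.cov_struct.Exchangeable(), family=sm.families.Gaussian(),
)
print(model.fit().summary())
```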
Missing data. We will review survey completeness and recontact respondents to address quality issues and increase response rates. Logistic regression will be used to determine the type of missingness. Within waves, multiple imputation techniques will be applied. To address missing data across waves (i.e., over time), we will use GEE or weighted GEE, depending on the type of missing data.
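A sketch of both steps under stated assumptions: a logistic model of missingness on observed covariates (significant predictors suggest missing-at-random rather than missing-completely-at-random data), followed by a within-wave imputation pass in which a single sklearn chain stands in for full multiple imputation.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Illustrative within-wave data with missingness on one survey measure;
# all variable names are assumptions.
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "county_pop": rng.normal(0, 1, 300),
    "stepping_up": rng.integers(0, 2, 300),
    "perf_monitoring": rng.normal(6, 2, 300),
})
df.loc[rng.random(300) < 0.2, "perf_monitoring"] = np.nan

# Step 1: model missingness on observed covariates; significant predictors
# suggest data missing at random (MAR) rather than completely at random.
df["missing"] = df["perf_monitoring"].isna().astype(int)
print(smf.logit("missing ~ county_pop + stepping_up", data=df).fit(disp=0).summary())

# Step 2: within-wave imputation (one sklearn chain shown; full multiple
# imputation would repeat this and pool estimates across imputed datasets).
imputed = IterativeImputer(random_state=0).fit_transform(
    df[["county_pop", "stepping_up", "perf_monitoring"]]
)
```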
Aim 1a: Comparison of target mechanisms between Stepping Up and non-Stepping Up counties. Primary. We will test the hypothesis that Stepping Up counties will show a faster rate of improvement in use of and capacity for performance monitoring (i.e., total scores on our Performance Monitoring measure) than comparison counties, using GLMM or GEE. Analyses will test for differences in slopes (rates of change). A separate secondary analysis will compare rates of change in the adapted Routine Decision-Making scale total score between Stepping Up and comparison counties. Secondary. We will test the hypothesis that Stepping Up counties will show a faster rate of improvement in the use/functioning of interagency teams (i.e., total scores on the integrated NCJTP Relationship Assessment Inventory-IOR measure) than comparison counties using the same statistical techniques. We will test the hypothesis that Stepping Up counties will show a faster rate of improvement in common goals and mission across agencies (i.e., agreement between perceived agency and county priorities) than comparison counties using GLMM or GEE. We will conduct similar analyses of agreement among respondents within each county on the importance of mental health treatment for justice-involved individuals. We will test the hypothesis that Stepping Up counties will show a faster rate of improvement in system integration (a score of 6 or more on the Relationship Assessment Inventory) using GEE. A separate secondary analysis will compare rates of change in use of the same screening and assessment instruments by multiple agencies in Stepping Up and comparison counties.
Aim 1b: Tests of mediation. Primary. We will test the hypothesis that changes in use of performance measures (i.e., scores on the Performance Monitoring measure and the adapted Routine Decision-Making scale) will mediate any differences found in rates of change in the primary measures of justice-involved clients receiving behavioral health services, number of EBPPs, and resources available. These primary mediator analyses will use structural equation models and path analyses. Secondary. We will conduct a series of analyses examining changes in interagency teams, common goals and missions, and integrated systems of care (using scores from the respective measures identified above) as mediators of the number of justice-involved individuals receiving services, number of EBPPs available, and resources, using appropriate baseline measures and months since joining Stepping Up as controls.
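For intuition, a minimal product-of-coefficients sketch of the primary mediation test with a percentile bootstrap is shown below; the study will use full structural equation/path models, and all variable names and data here are assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def indirect_effect(df: pd.DataFrame) -> float:
    # a-path: Stepping Up membership -> change in the mediator
    a = smf.ols("mediator_change ~ stepping_up + baseline", data=df).fit().params["stepping_up"]
    # b-path: mediator change -> change in the outcome, holding group constant
    b = smf.ols("outcome_change ~ mediator_change + stepping_up + baseline",
                data=df).fit().params["mediator_change"]
    return a * b  # product-of-coefficients estimate of the mediated effect

rng = np.random.default_rng(3)
n = 400
su = rng.integers(0, 2, n)
med = 0.5 * su + rng.normal(size=n)
df = pd.DataFrame({
    "stepping_up": su,
    "baseline": rng.normal(size=n),
    "mediator_change": med,
    "outcome_change": 0.4 * med + rng.normal(size=n),
})

# Percentile bootstrap confidence interval for the indirect effect.
boot = [indirect_effect(df.sample(frac=1, replace=True)) for _ in range(500)]
print(indirect_effect(df), np.percentile(boot, [2.5, 97.5]))
```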
Aim 2: Comparison of implementation outcomes between Stepping Up and non-Stepping Up counties. Primary. We will test the hypothesis that Stepping Up counties will show a faster rate of improvement in the number of justice-involved clients receiving behavioral health services than comparison counties, using GLMM and GEE. Analyses will test for differences in slopes (rates of change) between the two sets of counties. Secondary. We will separately test the hypotheses that Stepping Up counties will show faster rates of improvement in the number of behavioral health EBPPs available to justice-involved individuals and in resources for behavioral health EBPP for justice-involved individuals, using GLMM, GEE, and GHLM.
We will examine moderators of the effects of Stepping Up participation on our primary outcome (justice-involved clients receiving behavioral health services) using structural equation models. A priori moderators include months between a county joining Stepping Up and study baseline, levels of implementation outcomes at study baseline, type of agency, organizational culture support for innovations (i.e., score on the NCJTP Assess Your Organizational Culture scale37), whether jails have their own behavioral health staff, presence of legislative reforms (yes/no), whether the county is in a state with mental health diversion funding, and whether the county has divisions that provide cross-system trainings.
Aim 3a: Characterize implementation processes and critical incidents (quantitative). We will examine the relationships between use of the implementation strategies identified in the Implementation Strategy Checklist and faster rates of change in implementation outcomes, using GLMM and GEE while controlling for baseline measures and months since joining Stepping Up. We will use Bonferroni correction to control for multiple comparisons of implementation strategies (using the Checklist) for each of the three implementation outcomes. Fidelity. We will compare Stepping Up and comparison counties on rates of use of Stepping Up strategies as a measure of fidelity to the national Stepping Up program, and we will compare counties on rates of use of other strategies to explore whether Stepping Up affects related strategies.
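The Bonferroni step might look like the following, applied to the p-values from the per-strategy tests for one implementation outcome; the p-values shown are illustrative.

```python
from statsmodels.stats.multitest import multipletests

# Illustrative p-values from per-strategy tests against one implementation outcome.
pvals = [0.001, 0.012, 0.030, 0.048, 0.200, 0.750]
reject, p_adjusted, _, _ = multipletests(pvals, alpha=0.05, method="bonferroni")
print(list(zip(p_adjusted.round(3), reject)))  # corrected p-values and decisions
```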
Aim 3b: Characterize implementation processes and critical incidents (qualitative). Qualitative data will be analyzed in line with study aims and key research questions using a two-stage analysis plan. In Stage 1, after each interview, interviewers will summarize key topics in a framework matrix,55 which allows key topics to be reviewed quickly. In Stage 2, recordings will be transcribed by a professional transcription service and anonymized before coding. Deductive codes will be drawn from interview question topics using the CJ-IIM, the 6 main Stepping Up strategies, and critical incidents. Inductive codes capturing emergent themes will arise from team-level review of transcripts. Coding team members will independently code transcripts; 20% will be double coded and reviewed for fidelity. Codes will be entered into NVivo,56 using thematic analyses;57 an audit trail will be maintained through code development and analysis. We will compare patterns found in qualitative data to patterns found in our quantitative data; this side-by-side comparison can identify signposts for additional exploration and analyses.