Sample
The present study includes teacher and student participants from the comparison conditions of three federally funded research studies testing an intervention designed to address the needs of young students who demonstrate persistent and intensive challenging behaviors in classroom settings (Sutherland et al., 2018; Sutherland et al., 2020; IES R305A180182). Teacher and student participants were recruited from early childhood programs and elementary schools in a Mid-Atlantic state and a Southern state. All study activities were approved by the associated human participants protection boards.
Teachers
Teachers were eligible to participate if they: (a) taught in an early childhood or kindergarten through third-grade classroom, (b) served at least one child identified as being at risk of emotional/behavioral disorders (EBD), and (c) consented to participate. The present study includes 100 predominantly female (97%) teachers (n = 84 early childhood teachers; n = 16 kindergarten through third-grade teachers). Forty-nine percent self-identified as African American/Black, 47% as White, 1% as Asian/Pacific Islander, and 3% as another race; 3% identified as Hispanic/Latino. Most teachers had a bachelor's degree or higher (63%), and half (50%) reported having a teaching license. Teachers' reported age ranges were 18 to 25 (9%), 26 to 35 (21%), 36 to 45 (32%), 46 to 55 (21%), and over 55 (15%); 2% preferred not to report their age. Teachers had an average of 12.01 years of teaching experience (SD = 10.06) and received $100-$400 for their participation, with the amount varying by study.
Students
Teachers selected one to three focal students in their classrooms who exhibited externalizing problem behavior. Students were eligible if they: (a) were enrolled in a participating teacher's classroom, (b) exhibited externalizing behaviors that interfered with participation in the classroom, as indicated by systematic screening (Walker et al., 2014), and (c) had parent/guardian consent to participate. This study included 196 students who had data on all profile measures; 91 students were excluded from analysis because they were missing data on one or more profile measures. The student sample was 68.9% African American/Black, 16.8% White, 11.7% another race/ethnicity, and 2.6% unreported; 5% of students were Hispanic/Latino. Most participating students were male (66.8%).
Procedures, Measures, and Data Harmonization
Our overarching approach to harmonizing data sources across samples was to identify the subset of scale items (either identical or comparable) held in common across the three studies in the study sample. All three studies used identical procedures for screening in eligible students, which we document below, and very similar processes were used to train observers for observational coding of student-teacher interactions. We then used measurement models (such as multigroup confirmatory factor analysis) to further inspect the similarity of response patterns for all data collected via questionnaire.
Student Screening
Across all three studies, screening began approximately one month after the start of school. Teachers nominated up to five students who engaged in externalizing problem behavior, and caregiver consent was obtained. Systematic screening for risk of EBD was then conducted using the Early Screening Project (ESP; Feil et al., 1995) in early childhood classrooms and the Systematic Screening for Behavior Disorders (SSBD; Walker et al., 2014) in kindergarten through third-grade classrooms. The ESP and SSBD are both multigate screening systems designed to identify students at risk of negative developmental outcomes associated with their behavior patterns. The first two stages (used given the scope of the intervention) combine teacher ratings of the frequency and intensity of student adjustment problems in the classroom. Risk was assessed using raw scores and by applying risk criteria to those scores (see Walker et al., 2014, for SSBD scoring criteria and Feil et al., 1995, for ESP criteria). Students were screened for critical events (M = 1.14, SD = 1.85), aggressive behavior (M = 19.43, SD = 6.52), adaptive behavior (M = 23.74, SD = 6.05), and maladaptive behavior (M = 28.06, SD = 7.57). After screening, up to three students per classroom were selected to participate. Study measures were then collected at Time 1 (October-December) and again at Time 2 (April-June).
Student Behavior
Student problem behavior and social skills were assessed with the Social Skills Improvement System-Rating Scales (SSIS-RS; Gresham & Elliott, 2007). The SSIS-RS is a 76-item teacher-report measure evaluating social skills and problem behaviors in young students. The Social Skills scale consists of seven subscales: communication, cooperation, assertion, responsibility, empathy, engagement, and self-control. The Problem Behaviors scale consists of five subscales: externalizing, bullying, hyperactivity/inattention, internalizing, and Autism Spectrum. Teachers rate how frequently students exhibit each behavior on a four-point Likert scale (0 = never to 3 = almost always); higher scores indicate more social skills or more problem behavior. For the current sample, Cronbach's alphas at pretest and posttest, respectively, were .97 and .95 for Social Skills and .92 and .94 for Problem Behavior. The present study used standardized scores (standardized by child age and gender). Given the large number of items in the SSIS-RS, and because we did not have item-level scores for all three studies, we were unable to perform the same item-level measurement analysis with the SSIS-RS that we describe for the STRS below. However, we did find consistently high reliability for the SSIS-RS subscales across all three samples.
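For reference, coefficient alpha is a simple function of the item variances and the total-score variance. The sketch below (Python/NumPy) shows the computation on simulated item responses; the data and function name are illustrative, not from the studies.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for an (n_respondents x n_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # per-item variances
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Illustrative example: 100 respondents x 10 items rated 0-3, built so the
# items share a common factor (simulated, not study data).
rng = np.random.default_rng(42)
trait = rng.normal(size=(100, 1))
items = np.clip(np.round(1.5 + trait + rng.normal(scale=0.8, size=(100, 10))), 0, 3)
print(f"alpha = {cronbach_alpha(items):.2f}")
```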
Student-Teacher Relationships
The Student-Teacher Relationship Scale-Short Form (STRS; Pianta, 2001) was used to assess teachers' perceptions of their relationships with students. The 15-item measure assesses the degree of closeness and conflict a teacher perceives in their relationship with a given child. Teachers indicate their degree of agreement with statements such as "This child and I always seem to be struggling with each other" and "I share an affectionate, warm relationship with this child" on a five-point Likert scale (0 = definitely does not apply to 4 = definitely applies). For the current sample, internal consistency was acceptable, with Cronbach's alpha equal to .81 for Closeness and .84 and .86 for Conflict at pretest and posttest, respectively.
Teacher Practice Delivery
The thoroughness and frequency of teachers' delivery of practices to the focal student (or to a group including the focal student) was measured in early childhood classrooms with the BEST in CLASS Adherence and Competence Scale (BiCACS; Sutherland et al., 2014) and in elementary school classrooms with the Treatment Integrity Instrument for Elementary School Classrooms (TIES; Sutherland et al., 2015). The BiCACS and TIES are observational measures in which raters assess teachers' extensiveness (i.e., adherence) and quality of delivery (i.e., competence) of evidence-based practices on a 7-point Likert-type scale. The present study used the adherence dimension of each measure, with anchors ranging from not at all to very extensive. The BiCACS and TIES include items representing key evidence-based instructional practices (e.g., Rules, Precorrection, Opportunities to Respond, Behavior Specific Praise, Instructive Feedback, Corrective Feedback; see Sutherland et al., 2014, 2015, for detailed descriptions of the measures). The present study included observations of teacher-delivered practices for the four practices common to the BiCACS and TIES: Rules, Praise, Opportunities to Respond, and Precorrection.
Reliability was assessed using secondary observers for approximately 20% of observations within each of the three studies, and ICCs were computed for each item on each scale. Cicchetti (1994) indicated that ICCs below .40 reflect "poor" agreement, ICCs from .40 to .59 "fair" agreement, ICCs from .60 to .74 "good" agreement, and ICCs of .75 and higher "excellent" agreement. Across the three studies, the mean ICC for the adherence scale ranged from .74 to .82, with all items reflecting "good" to "excellent" agreement.
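As a reference for the computation, a two-way ICC for a targets-by-raters score matrix can be derived from the ANOVA mean squares. The sketch below implements ICC(2,1) (two-way random effects, absolute agreement, single rater, per Shrout & Fleiss); this is one common ICC variant, and the exact ICC specification used in the three studies is not detailed here, so treat it as illustrative.

```python
import numpy as np

def icc_2_1(scores: np.ndarray) -> float:
    """ICC(2,1) for an (n_targets x k_raters) matrix with no missing cells."""
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)   # per-target means
    col_means = scores.mean(axis=0)   # per-rater means
    # Two-way ANOVA mean squares (targets = rows, raters = columns)
    ss_rows = k * ((row_means - grand) ** 2).sum()
    ss_cols = n * ((col_means - grand) ** 2).sum()
    ss_err = ((scores - grand) ** 2).sum() - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

# Illustrative example: 30 observations scored by 2 observers (simulated)
rng = np.random.default_rng(7)
truth = rng.normal(loc=4, scale=1.5, size=(30, 1))
ratings = truth + rng.normal(scale=0.7, size=(30, 2))
print(f"ICC(2,1) = {icc_2_1(ratings):.2f}")
```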
Data Analysis
Confirmatory Factor Analysis and Measurement Invariance Testing. First, multigroup confirmatory factor analysis (CFA) was used to test the fit of the Closeness and Conflict scales from the STRS across the three samples and two timepoints. This approach, recommended by Brown (2014), allowed us to test the relative fit of models with increasing constraints, corresponding to configural, metric, and scalar invariance. When results indicated that an item did not function equivalently between groups, we freed model constraints to reach partial scalar invariance for each scale across groups.
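In standard notation (the conventional formulation, not taken from the source), the nested invariance levels for item i in group g are:

```latex
\begin{aligned}
\text{Measurement model:} \quad & x_{ig} = \tau_{ig} + \lambda_{ig}\,\eta_{g} + \varepsilon_{ig} \\
\text{Configural:} \quad & \text{same pattern of free and fixed loadings in every group } g \\
\text{Metric:} \quad & \lambda_{ig} = \lambda_{i} \ \text{for all } g \\
\text{Scalar:} \quad & \lambda_{ig} = \lambda_{i} \ \text{and} \ \tau_{ig} = \tau_{i} \ \text{for all } g
\end{aligned}
```

Partial scalar invariance corresponds to freeing the intercept \(\tau_{ig}\) (or loading) only for the items flagged as noninvariant.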
Latent Profile Analysis. We used latent profile analysis (LPA) to identify distinct subgroups, or clusters, within a larger population given a series of indicator variables (Lubke, 2018). We took an exploratory approach, testing models that assume a range of possible numbers of latent profiles (e.g., a 1-profile model, a 2-profile model, etc.) and then choosing the model that demonstrated the best fit as indicated by the fit statistics described below.
Five variables were used as indicators of the latent profiles we estimated: (a) teacher delivery of practices, (b) teacher-student closeness, (c) teacher-student conflict, (d) student social skills, and (e) student problem behavior. These indicators were the basis for determining the optimal number of subgroups to answer RQ 1. To answer RQ 2, several additional measures were used as distal outcomes to test the validity of the LPA solution, including Time 2 (spring) measures of all five indicator variables.
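With continuous indicators, an LPA is a finite Gaussian mixture model. The analyses here were run in Mplus, but the enumeration step can be sketched with scikit-learn's GaussianMixture, comparing BIC (one of the fit statistics described next) across candidate models. The data below are simulated, and the diagonal-covariance setup (conditional independence of indicators within profile) is an assumption for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# X: (n_students x 5) matrix of Time-1 indicators. Here the columns are
# simulated stand-ins for practice delivery, closeness, conflict, social
# skills, and problem behavior; three synthetic subgroups are built in.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=m, scale=1.0, size=(65, 5))
               for m in (-1.0, 0.0, 1.0)])

for k in range(1, 7):
    # covariance_type='diag' mirrors the standard LPA assumption of
    # conditional independence of indicators within each profile
    gm = GaussianMixture(n_components=k, covariance_type="diag",
                         n_init=20, random_state=0).fit(X)
    print(f"{k} profiles: BIC = {gm.bic(X):.1f}")
```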
The LPA modeling approach we used was the parametric procedure outlined by Finch and French (2014) and implemented in Mplus. The fit statistics used to determine the appropriate number of profiles included the Bayesian Information Criterion (BIC; Schwarz, 1978), the sample-size-adjusted Bayesian Information Criterion (SABIC; Sclove, 1987), the Bootstrap Likelihood Ratio Test (BLRT; McLachlan & Peel, 2000), the Lo-Mendell-Rubin test (LMR; Lo et al., 2001), the consistent AIC (CAIC; Bozdogan, 1987), the Bayes factor (BF; Wagenmakers, 2007; Wasserman, 1997), the approximate weight of evidence (AWE; Banfield & Raftery, 1993), and the approximate correct model probability (cmP; Schwarz, 1978). To answer RQ 2, the optimal number of subgroups was carried forward from the previous step, and the distal outcomes described above were then included, following the recommended process for modeling outcomes in an LPA outlined by Nylund-Gibson et al. (2019). This approach, referred to as the manual 3-step BCH process (Bakk et al., 2013; Bolck et al., 2004), involves using the posterior probabilities and modal class assignments from an unconditional model (without outcomes) to calculate classification errors for each participant and then including the inverse logits of those errors as weights in the eventual prediction model.
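The LMR and BLRT require refitting the mixture models themselves (Mplus produces them via the TECH11 and TECH14 output options), but the information-based indices are simple functions of the maximized log-likelihood, the number of free parameters, and the sample size. A sketch under common definitions follows; exact formulas vary slightly across sources, so treat these as illustrative.

```python
import numpy as np

def enumeration_indices(logL: float, p: int, n: int) -> dict:
    """Common mixture-enumeration indices from the maximized log-likelihood
    (logL), number of free parameters (p), and sample size (n)."""
    return {
        "BIC":   -2 * logL + p * np.log(n),
        "SABIC": -2 * logL + p * np.log((n + 2) / 24),   # sample-size adjusted
        "CAIC":  -2 * logL + p * (np.log(n) + 1),        # consistent AIC
        "AWE":   -2 * logL + 2 * p * (np.log(n) + 1.5),  # approx. weight of evidence
    }

def bayes_factor(bic_a: float, bic_b: float) -> float:
    """Approximate Bayes factor for model A vs. model B (BF > 1 favors A)."""
    return float(np.exp(-0.5 * (bic_a - bic_b)))

def approx_correct_model_prob(bics) -> np.ndarray:
    """cmP: each candidate model's approximate probability of being the
    correct model among the set of models fit."""
    sic = -0.5 * np.asarray(bics, dtype=float)
    w = np.exp(sic - sic.max())   # subtract max for numerical stability
    return w / w.sum()
```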
Several methodological issues were addressed with the following approaches. To adjust for possible violations of the multivariate normality assumption, a robust maximum likelihood estimator ("MLR" in Mplus) was used. To assess model fit, the scaled Satorra-Bentler chi-square statistic (Satorra & Bentler, 2001) was used in place of the traditional chi-square test. Finally, standard errors were adjusted for clustering using the sandwich estimator (Huber, 1967; White, 1980) to account for the nesting of students within teachers.
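The clustering adjustment can be illustrated outside Mplus. The sketch below applies statsmodels' cluster-robust (sandwich) covariance to a simple OLS stand-in; the data and variable names are simulated and hypothetical, and the example shows only the standard error correction, not the mixture model itself.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data: students (rows) nested within teachers, with shared
# teacher-level noise that induces within-cluster correlation.
rng = np.random.default_rng(1)
n_teachers, per_teacher = 40, 5
teacher = np.repeat(np.arange(n_teachers), per_teacher)
u = rng.normal(size=n_teachers)[teacher]   # teacher-level random component
x = rng.normal(size=teacher.size)
y = 0.5 * x + u + rng.normal(size=teacher.size)
df = pd.DataFrame({"y": y, "x": x, "teacher": teacher})

# Cluster-robust (sandwich) standard errors, clustering on teacher
fit = smf.ols("y ~ x", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["teacher"]})
print(fit.summary().tables[1])
```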