The study of derived relational responding may be traced back to the seminal work of Murray Sidman (1971), and, in particular, his research on stimulus equivalence relations (Sidman, 1994). The basic phenomenon involves training a series of overlapping matching-to-sample responses using arbitrary stimuli, and then observing novel or emergent matching responses in the absence of differential reinforcement. For example, a participant might first be trained in two overlapping stimulus relations (e.g., A1–B1, B1–C1). Subsequently, the participant may demonstrate symmetrical relational responding (i.e., B1–A1, C1–B1), transitive relational responding (i.e., A1–C1), and combined symmetrical and transitive relational responding (i.e., C1–A1). When each of these derived relational responses emerges, the three stimuli are said to participate in an equivalence class. Research on stimulus equivalence, and on derived relational responding more generally, has attracted increasing attention within the behaviour analytic literature (e.g., Barnes-Holmes et al., 2017; Dougher, 2020; Tonneau, 2001). One reason for this interest is that equivalence responding emerges in the absence of differential reinforcement, and thus presents a challenge to an explanation of the behaviour in terms of the three-term contingency. A second reason is that there appears to be a relationship between equivalence responding and human language (Barnes-Holmes et al., 2005; Bortoloti et al., 2014; Devany et al., 1986; Dias et al., 2020; Sidman, 2018).
One line of research in the study of equivalence and derived relational responding has involved using some of the concepts and procedures employed in the area to develop methods that may be sensitive to specific verbal histories that occurred outside of the experimental laboratory. The basic approach involves training participants in a series of matching tasks that should generate specific equivalence relations, but these relations likely conflict with previously established verbal relations. In one of the earliest studies in this area, participants resident in Northern Ireland were trained to match stimuli that, in principle, would lead to the formation of equivalence relations between Catholic names and Protestant symbols (Watt et al., 1991). In the social context of Northern Ireland, however, these two sets of verbal stimuli would typically be seen as very different or opposite, rather than equivalent. Consistent with their wider social context, the majority of Northern Ireland participants failed the equivalence test, but all of the participants from an English background (the control group) successfully matched the Catholic names with the Protestant symbols. This basic effect, in which pre-experimental history appears to impact derived relational responding, has been replicated across a number of studies (e.g., Barnes et al., 1996; Dixon et al., 2006; Haydu et al., 2015, 2019; Leslie et al., 1993).
The conceptual basis of the foregoing studies, in which pre-experimental and experimental histories conflict, led to the development of a procedure designed to assess relational responding “in flight”. The method, known as the Implicit Relational Assessment Procedure (IRAP) (Barnes-Holmes & Harte, 2022; Barnes-Holmes et al., 2008; McKenna et al., 2007), emerged out of an account of equivalence, and of derived relational responding more generally, known as Relational Frame Theory (RFT) (Hayes et al., 2001). According to the theory, equivalence is but one class of relational responding, and the IRAP was designed to measure responding in accordance with networks of multiple relations, rather than equivalence alone.
The IRAP employs three sets of stimuli: label stimuli, target stimuli, and response option stimuli. For example, labels can be pictures of faces and targets can be adjectives (e.g., “happy”, “fearful”). There are two classes of labels and two classes of targets, usually contrasting with each other. On each trial, participants are presented with a label stimulus at the top of the screen and a target stimulus at the bottom of the screen. The label (L) and the target (T) stimuli are selected from classes 1 and 2, thus yielding four possible combinations: L1—T1, L1—T2, L2—T1, and L2—T2. The two response options (e.g., “true” and “false”) appear on every trial, and are used to indicate the relational coherence or incoherence between label and target stimuli. For example, assume L1 are pictures of happy faces and L2 are pictures of fearful faces, and T1 are happiness words (e.g., “cheerful”) and T2 are fear words (e.g., “fearful”). If the response options are “true” and “false”, then coherent relational responses (based on participants’ verbal histories) would be: L1—T1 is true, L1—T2 is false, L2—T1 is false, and L2—T2 is true; incoherent relational responses would be the opposite (e.g., L1—T1 is false). One of these four possible combinations of a label and a target stimulus is presented on each trial; we will therefore refer to them as trial-type 1: happy-face—happy-word; trial-type 2: happy-face—fear-word; trial-type 3: fear-face—happy-word; and trial-type 4: fear-face—fear-word.
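The four trial-types and the response required on each of them in the two block types can be made concrete with a short Python sketch. It assumes the happy/fear example above; the names `trial_types` and `correct_response` are hypothetical illustrations and are not part of any IRAP software.

```python
from itertools import product

# Hypothetical encoding of the example IRAP: label and target classes are
# each "happy" or "fear"; the response options are "true" and "false".
CLASSES = ("happy", "fear")

def trial_types():
    """Number the four label-target combinations 1 to 4, as in the text."""
    return {i + 1: pair for i, pair in enumerate(product(CLASSES, CLASSES))}

def correct_response(label, target, block):
    """Response required on a trial, given the block type.

    In coherent blocks, "true" is required when label and target belong to
    the same class and "false" otherwise; incoherent blocks reverse this.
    """
    history_consistent = "true" if label == target else "false"
    if block == "coherent":
        return history_consistent
    return "false" if history_consistent == "true" else "true"

for number, (label, target) in trial_types().items():
    print(number, f"{label}-face/{target}-word",
          correct_response(label, target, "coherent"),
          correct_response(label, target, "incoherent"))
```

Run as written, this prints one line per trial-type: trial-types 1 and 4 require “true” in coherent blocks and “false” in incoherent blocks, while trial-types 2 and 3 require the reverse.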
Within blocks of trials, participants are asked to respond to these relations under time and accuracy performance criteria. For example, participants may be asked to select a response option within two seconds and to respond correctly on at least 80% of the trials in a block. For each block of trials, participants are required to respond in one of two opposing patterns: one is deemed coherent with their history, and the other is deemed incoherent. These two types of blocks alternate, such that, if an IRAP starts with a coherent block, the following blocks will be incoherent, coherent, incoherent, and so on; if it starts with an incoherent block, the following sequence will be coherent, incoherent, coherent, and so on. Response latencies are recorded across coherent blocks and across incoherent blocks. DIRAP scores are calculated by subtracting the mean latency in the coherent blocks from the mean latency in the incoherent blocks, and dividing this difference by the standard deviation of the latencies across both types of block. Thus, if participants’ response latencies are on average shorter in coherent than in incoherent blocks, their DIRAP scores will be positive; otherwise, they will be negative. Four DIRAP scores, one for each trial-type, are typically calculated for each participant (see Barnes-Holmes et al., 2010a, p. 533).
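The core of the DIRAP calculation and the alternating block order can be sketched in Python as follows. This is a simplified illustration under stated assumptions: it computes a single score for one trial-type from one set of coherent and one set of incoherent latencies, and it uses the sample standard deviation of the pooled latencies. The published scoring procedure (Barnes-Holmes et al., 2010a) involves further steps that are omitted here, and the function names are hypothetical.

```python
from statistics import mean, stdev

def d_irap(coherent_ms, incoherent_ms):
    """Simplified DIRAP score for one trial-type.

    Mean incoherent latency minus mean coherent latency, divided by the
    sample standard deviation of all latencies pooled across both block
    types. Positive values indicate faster responding in coherent blocks.
    """
    pooled = list(coherent_ms) + list(incoherent_ms)
    return (mean(incoherent_ms) - mean(coherent_ms)) / stdev(pooled)

def block_sequence(n_blocks, start="coherent"):
    """Alternating block order beginning with the given block type."""
    other = "incoherent" if start == "coherent" else "coherent"
    return [start if i % 2 == 0 else other for i in range(n_blocks)]

# Illustrative (made-up) latencies in milliseconds for one trial-type.
score = d_irap([900, 1000, 1100], [1300, 1400, 1500])
print(round(score, 2))                     # 1.69
print(block_sequence(4, "incoherent"))
```

With these illustrative latencies, the mean difference is 400 ms and the pooled standard deviation is about 237 ms, giving a positive score of about 1.69; swapping the two lists would give the same magnitude with a negative sign.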
The basic assumption behind the IRAP is that history-consistent relational responding is more probable and quicker than history-inconsistent responding, and that this is reflected in differential response latencies across coherent versus incoherent blocks. This assumption has been supported across numerous empirical studies in which participants respond more quickly in blocks of trials that are coherent than in blocks that are incoherent with their verbal histories (e.g., Barnes-Holmes et al., 2009; Barnes-Holmes et al., 2010c; Kelly & Barnes-Holmes, 2013; Rabelo et al., 2014; Roddy et al., 2010; Sereno et al., 2021; Timmins et al., 2016). An early explanation for such IRAP performances was formalised in the Relational Elaboration and Coherence (REC) model (Barnes-Holmes et al., 2010a). The REC model focussed largely on the coherence of the relationship between the label and the target; thus, responses in the coherent blocks were assumed to be faster than those in the incoherent blocks (e.g., given a happy face and a happy word, participants would pick the response option “true” more quickly than “false”). This explanation focussed on performance differences between coherent and incoherent blocks, but not on differences between trial-types within the blocks. However, differential trial-type effects have been observed (e.g., Finn et al., 2018, 2019), and the REC model could not readily explain them.
Two specific differential trial-type effects have been observed with the IRAP. Kavanagh et al. (2019), for example, exposed participants to IRAPs that presented face words (i.e., “Face”, “Head”, and “Person”) and pen words (i.e., “Pen”, “Stylo”, and “Bic”) as label stimuli, and pictures of a face or a pen as target stimuli. During coherent blocks, participants were required to respond “Yes” when a face word was presented with a picture of a face (trial-type 1) and when a pen word was presented with a picture of a pen (trial-type 4); during incoherent blocks, they were required to respond “No” to these combinations of stimuli. Conversely, in coherent blocks, participants were required to respond “No” when presented with a face word and a picture of a pen (trial-type 2), and when presented with a pen word and a picture of a face (trial-type 3); during incoherent blocks, they were required to respond “Yes”.
Results from Kavanagh et al. (2019) indicated that all group effects were in the predicted direction (i.e., shorter latencies in the coherent relative to the incoherent blocks). However, two key differences emerged between the trial-types that could not be explained solely on the basis of the relations between label and target stimuli. Specifically, the difference in latencies for the face-face trial-type (i.e., trial-type 1) was significantly larger than for the pen-pen trial-type (i.e., trial-type 4). Furthermore, the difference in latencies for the face-pen trial-type (i.e., trial-type 2) was significantly larger than for the pen-face trial-type (i.e., trial-type 3). In both cases, the two trial-types shared the same response option within blocks. That is, the face-face and the pen-pen trial-types (i.e., 1 and 4) both required responding “Yes” during coherent blocks and “No” during incoherent blocks; likewise, the face-pen and the pen-face trial-types (i.e., 2 and 3) both required responding “No” during coherent blocks and “Yes” during incoherent blocks. Any difference between the face-face and pen-pen trial-types, relative to the face-pen and pen-face trial-types, might be explained by the fact that they required choosing different response options within blocks of trials (Barnes-Holmes et al., 2010b). However, differential response options could not explain the difference between the face-face and the pen-pen trial-types, or the difference between the face-pen and the pen-face trial-types. In order to explain these differences, Kavanagh et al. (2019) drew on a model of IRAP performances that had recently been proposed in the literature (Finn et al., 2018; Kavanagh et al., 2018).
The model is referred to as the Differential Arbitrarily Applicable Relational Responding Effects (DAARRE) model. This model is shown in Figure 1, which presents the stimuli used in the current study. Specifically, pictures of happy and fearful faces are presented as label stimuli, and words denoting happiness and fearfulness are presented as target stimuli, with two response options: the words “true” and “false”. Similar to the REC model, the DAARRE model incorporates the relationship between the label and the target (e.g., whether the relationship between the face and the word is one of coordination or distinction, defined in RFT as the Crel property). However, the DAARRE model also incorporates the functional properties (e.g., orienting and evoking, defined in RFT as the Cfunc property) of all of the events, including the response options. In Figure 1, the Cfunc properties of the happy faces, the happy words, and the “True” response option are all labelled with a plus sign to indicate a generally positive valence relative to the fearful faces, fear words, and the “False” response option (for this reason, the latter are marked with a minus sign).
The DAARRE model explains the two differential trial-type effects mentioned above by appealing to the level of coherence among the Crel and Cfunc properties contained within each of the trial-types. Consider, first, the predicted differential effect between trial-types 1 and 4; that is, the DIRAP scores for the happy-happy trial-type will be larger than for the fear-fear trial-type. According to the DAARRE model, for the happy-happy trial-type, there is maximal coherence among the two Crel and the two Cfunc properties during coherent blocks (i.e., four plus signs). In contrast, for the fear-fear trial-type, there is reduced coherence in that the Cfunc properties for the label and target are both negative, but the Crel and the Cfunc properties for the response option are both positive. This difference in coherence between the two trial-types explains the dominance of trial-type 1 (maximal coherence) over trial-type 4 (reduced coherence). Informally, the prediction is that participants will find it easier, all things being equal, to respond on a trial-type in which all of the controlling elements cohere with each other than on one in which they do not. We will refer to this effect as a Single Trial-Type Dominance Effect (STTDE).
Now, consider the predicted difference between trial-types 2 and 3; that is, the DIRAP scores for the happy-fear trial-type will be larger than for the fear-happy trial-type. It is difficult to explain this difference based solely on the coherence or incoherence among all of the elements in the trial-types because both trial-types contain the same number of +Cfunc and −Cfunc properties, as well as the same −Crel property. However, the two trial-types may be distinguished based on the Cfunc properties of the label and the target. In trial-type 2, the label is positive and the target is negative, whereas in trial-type 3, the label is negative and the target is positive. Critically, during history-consistent blocks, the −Cfunc property of the target in trial-type 2 (i.e., fearful words) coheres with the −Cfunc property of the correct response option (i.e., “false”); in trial-type 3, however, the Cfunc properties of the target and the correct response option are incoherent (again, during the history-consistent blocks). According to the DAARRE model, the coherence between the spatially contiguous target and response option in trial-type 2 may facilitate more rapid responding during history-consistent blocks relative to trial-type 3, where the spatially contiguous target and response option possess incoherent Cfunc properties. More informally, participants may find it easier to respond negatively when the target is negative than when the target is positive. Consistent with Kavanagh et al. (2019), we will call this a Dissonant Target Trial-Type Effect (DTTTE).
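The sign logic behind the STTDE and DTTTE predictions can be illustrated with a short Python sketch. This is a simplified, hypothetical coding rather than a formal implementation of the DAARRE model: for history-consistent blocks only, each trial-type is described by the Cfunc signs of the label, the target, and the correct response option, plus a single Crel sign for the label-target relation. Figure 1 may represent the elements differently, and all names in the code are illustrative.

```python
# Hypothetical sign coding for the happy/fear IRAP, history-consistent blocks
# only. Simplification: one Crel sign for the label-target relation, and the
# response option is coded by its Cfunc (valence) alone.
VALENCE = {"happy": +1, "fear": -1}         # Cfunc of label/target stimuli
RESPONSE_CFUNC = {"true": +1, "false": -1}  # Cfunc of the response options

def daarre_elements(label, target):
    """Return the signs of the elements in one trial-type."""
    crel = +1 if label == target else -1    # coordination (+) vs distinction (-)
    response = "true" if crel > 0 else "false"
    return {
        "label_cfunc": VALENCE[label],
        "target_cfunc": VALENCE[target],
        "crel": crel,
        "response_cfunc": RESPONSE_CFUNC[response],
    }

def maximally_coherent(elements):
    """Every element carries a plus sign (basis of the STTDE prediction)."""
    return all(sign > 0 for sign in elements.values())

def target_matches_response(elements):
    """Target and correct response share a Cfunc sign (basis of the DTTTE)."""
    return elements["target_cfunc"] == elements["response_cfunc"]

TRIAL_TYPES = {1: ("happy", "happy"), 2: ("happy", "fear"),
               3: ("fear", "happy"), 4: ("fear", "fear")}

for number, (label, target) in TRIAL_TYPES.items():
    e = daarre_elements(label, target)
    print(number, maximally_coherent(e), target_matches_response(e))
```

Under this coding, only trial-type 1 is maximally coherent, which is the contrast with trial-type 4 underlying the STTDE; and of the two trial-types that require “false”, only trial-type 2 has a target whose Cfunc sign matches that of the response option, which is the contrast with trial-type 3 underlying the DTTTE.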
Previously published studies (Finn et al., 2018; Gomes et al., 2019; Kavanagh et al., 2018; Pidgeon et al., 2021; Pinto et al., 2020) that have considered IRAP performances in light of the DAARRE model have tended to employ it in a post hoc manner (i.e., the DAARRE model has been used to interpret effects that were not explicitly predicted). One recent exception was a study that successfully generated the STTDE by establishing a “true” function for a stimulus that was subsequently presented in an IRAP (Finn et al., 2019). However, this study did not focus on the other DAARRE model effect, referred to above as the DTTTE (see Schmidt et al., 2021). The primary purpose of the current study was to present participants with an IRAP for which the DAARRE model would predict both the STTDE and the DTTTE. Specifically, happy faces and fearful faces were presented with semantically related words (i.e., happiness and fear words). We chose faces because previous research had reported differential trial-type effects using such stimuli (Bortoloti et al., 2019; Kavanagh et al., 2019; Perez et al., 2019; Pinto et al., 2020; Schmidt et al., 2021; see also Bortoloti et al., 2020).
In addition to testing the predictions of the DAARRE model (the STTDE and the DTTTE), we also sought to explore the potential impact of two other variables in the IRAP. The first of these was the order in which the IRAP blocks are presented (coherent-first versus incoherent-first). Specifically, we sought to determine if block order affected the STTDE and/or DTTTE effects. The second variable we sought to explore was whether or not participants maintained criteria at the trial-type level. Research employing the IRAP typically requires participants to maintain performance criteria at the level of the block (e.g., median latency ≤ 2000ms, accuracy ≥ 80%). Given the focus on the explicitly predicted differences between individual trial-type scores in the current study (i.e., the STTDE and the DTTTE), we analysed the data in terms of whether or not participants maintained the performance criteria at the level of the individual trial-type, rather than at the level of the overall block. Given that this was a performance-based variable, it is best considered as an attribute variable rather than a directly manipulated variable (i.e., it is an independent variable based on an attribute of the participants). Again, we sought to determine if this particular performance variable affected the STTDE and/or the DTTTE. Notwithstanding, we should emphasise that analysing the impact of these two explanatory variables (block order and performance criteria) was largely exploratory and thus we made no specific predictions about the nature of the impact that these variables could exert on the two DAARRE model effects. These variables are, nevertheless, crucial in research employing the IRAP. Block order influences the history participants are exposed to during practice blocks before starting test blocks. 
Trial-type level performance is important because the DAARRE model focuses on performance at the level of individual trial-types, but IRAP research has not yet examined performance criteria at the trial-type level. The present article introduces, for the first time, a data-analytic algorithm to identify performance at the trial-type level.
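The difference between block-level and trial-type-level criteria can be illustrated with a short Python sketch. It is a hypothetical illustration, not the algorithm introduced in this article: it simply applies the example criteria mentioned above (median latency ≤ 2000 ms, accuracy ≥ 80%) once to a whole block and once to each trial-type within that block. The data format (tuples of trial-type, latency in ms, and correctness) and the function names are assumptions.

```python
from collections import defaultdict
from statistics import median

def block_criteria(trials, max_median_ms=2000, min_accuracy=0.80):
    """Apply the latency and accuracy criteria to a whole block."""
    rows = list(trials)
    latencies = [latency for _, latency, _ in rows]
    accuracy = sum(1 for _, _, correct in rows if correct) / len(rows)
    return median(latencies) <= max_median_ms and accuracy >= min_accuracy

def trial_type_criteria(trials, max_median_ms=2000, min_accuracy=0.80):
    """Apply the same criteria separately to each trial-type in a block.

    `trials` holds (trial_type, latency_ms, correct) tuples from one block.
    Returns {trial_type: True/False}.
    """
    by_type = defaultdict(list)
    for trial_type, latency, correct in trials:
        by_type[trial_type].append((trial_type, latency, correct))
    return {tt: block_criteria(rows, max_median_ms, min_accuracy)
            for tt, rows in sorted(by_type.items())}

# Made-up block: trial-type 3 is slow, although the block as a whole passes.
block = ([(1, 1200, True)] * 5 + [(2, 1300, True)] * 5
         + [(3, 1800, True), (3, 1900, True), (3, 2500, False),
            (3, 2600, True), (3, 2700, True)]
         + [(4, 1400, True)] * 5)

print(block_criteria(block))       # True
print(trial_type_criteria(block))  # {1: True, 2: True, 3: False, 4: True}
```

In this made-up block, the overall median latency (1350 ms) and accuracy (95%) meet the criteria, but trial-type 3 has a median latency of 2500 ms, so a block-level check alone would not reveal that performance on that trial-type fell outside the criteria.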