Research Article
The Validity of Content Mapping: Let’s Call a Spade a Spade
https://doi.org/10.21203/rs.3.rs-4353956/v1
This work is licensed under a CC BY 4.0 License
Keywords: concept mapping, multidimensional scaling, cluster analysis, inter-disciplinary research
Content Mapping is an exploratory research method that is “inherently integrative” in its use of both qualitative and quantitative procedures in a structured conceptualization process (see Dixon, 2009, p. 87; Burke et al., 2005). The method was originally developed and introduced by Trochim (1989a) and was designed to yield a representation of reality or an interesting suggestive map meant for planning and evaluation purposes. By the year 1989, projects in which Trochim had used concept mapping ranged from the identification of four staff members’ multicultural awareness goals for a day camp, to the production of a map as an organizing device for long-range planning efforts of the University Health Services at Cornell University (with between 50 and 75 participants), to the development of a framework for designing a training program for volunteers to work with mental patients (number of participants not given) (Trochim, 1989b).
Since the introduction of concept mapping (CM) in the 1980s, its participatory character has been recognized as an important and attractive characteristic (Burke et al., 2005) and as useful in the development of a community-based participatory research program (Windsor, 2013). CM has made use not only of written statements but also of photo-elicitation (Shannon et al., 2020). CM has also been suggested as an alternative approach to the analysis of open-ended items in questionnaires (Jackson et al., 2002).
Concept mapping appears to have been used mainly in public health research, human services, biomedical research, social science research, and business or human resources research (see Rosas & Kane, 2012). Because concept mapping is used in such a wide range of academic disciplines, we were eager to learn more about its usefulness in interdisciplinary settings. After all, CM is being promoted as suitable for interdisciplinary and international research: Trochim & Kane (2005) state that concept mapping is “purposefully designed to integrate input from multiple sources with differing content expertise or interest.”
The aim of our study is to investigate the validity of concept mapping by means of a critical review of the procedure and a series of experiments simulating an interdisciplinary research setting. In the following sections, we first present the CM procedure based on Kane & Trochim (2007). The described CM procedure raises a number of questions about how exactly the technical procedures are carried out, why they are done that way, and what the consequences of these choices are. In Section 3 we present a series of five computer experiments to investigate the validity of CM in simulated interdisciplinary research. For each of the experiments the methods are described, followed by the results and a discussion of these results as input for further experimentation and future research. We conclude with a discussion of the validity of CM in interdisciplinary settings and possible alterations. Our study did not require ethical board approval because it did not directly involve humans or animals.
The first phase in CM covers preparation. In this stage, the issue to be examined, the goals and desired outcomes are discussed and defined, based on which the facilitator and participants are selected and invited to form a panel. Research questions Kane & Trochim (2007, p. 4) deem appropriate for addressing by CM include, among others, “What are the issues in a planning or evaluation project;” “Do the stakeholders have a common vision of what they are trying to achieve that enables them to stay on track throughout the life cycle of a project;” and “Can stakeholders link program outcomes to original expectations or intentions to see if they are achieving what they set out to achieve?”
The panel is asked a question and participants formulate answers to that question in an individual or group ‘brainstorm session’. The resulting primary statement set is reduced and edited to ensure uniqueness, relevance, clarity and comprehension of the final statement set (Kane & Trochim, 2007, Chap. 3). The question posed to the participants is crucial for the statements that will be generated and, thereby, for the rest of the procedure. The editing by the researchers to reach the final statement set gives plenty of room for interpretation, as is commonly the case in qualitative data reduction.
Next, each participant is asked to sort the Q statements in the final statement set into piles on the basis of similarity between the statements. The sorting task gives the panel members no guidance other than that each pile must contain a minimum of two statements and that the statements may not all be placed in a single pile. No information is supplied about the attributes to sort on, nor about the number of piles to aim for. After the sorting task, participants rate the Q statements on importance or priority on an ordinal scale (Kane & Trochim, 2007, Chap. 4). As the sorting task does not steer in any way, each participant can make a different number of piles based on a different set of attributes and (perceived) meaningful commonalities and differences on these attributes. Therefore, one would expect the data to contain plenty of variation. Intuitively, we would assume that two statements j and k are more similar (i.e. conceptually close) to one another than to statement h when more panel members put statements j and k in the same pile than either statements j and h or statements k and h. This principle brings us to the core of CM.
First, a Q x Q similarity matrix is constructed on the basis of how frequently a statement ended up in the same pile as another statement. The similarity matrix is a symmetrical co-occurrence matrix (Leydesdorff & Vaughan, 2006, citing Kruskal & Wish, 1978). With Q the number of statements and N the number of participants, the cells of the Q x Q similarity matrix can take on values between 0 and N. The more frequently statements j and k are put into the same pile, the larger their perceived similarity and the smaller the distance between the statements. More formally, for each participant i we may define s_ijk = 1 if participant i placed statements j and k in the same pile, and s_ijk = 0 otherwise.
Not all participants are assumed to have sorted each statement. When in effect n_jk participants have sorted both statements j and k, summing over all participants who sorted both statements gives the elements of the Q × Q similarity or co-occurrence matrix X (Leydesdorff & Vaughan, 2006): x_jk = Σ_i s_ijk, with 0 ≤ x_jk ≤ n_jk.
These similarities (or co-occurrences) are transformed into the Q × Q Euclidean distance matrix D, whose elements d_jk = √(Σ_m (x_jm − x_km)²) are the Euclidean distances between rows j and k of X (Leydesdorff & Vaughan, 2006).
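To make the construction of the co-occurrence and distance matrices concrete, here is a minimal sketch in Python (the authors' own scripts were written in R; the function and variable names below are ours). Each participant's sort is represented as a vector of numeric pile labels, one per statement, with np.nan marking an unsorted statement.

```python
import numpy as np

def cooccurrence(sorts):
    """Build the Q x Q co-occurrence (similarity) matrix X from a list of
    sorts; each sort is a length-Q sequence of numeric pile labels (one per
    statement), with np.nan marking a statement the participant did not sort."""
    Q = len(sorts[0])
    X = np.zeros((Q, Q))
    for labels in sorts:
        labels = np.asarray(labels, dtype=float)
        sorted_mask = ~np.isnan(labels)
        same_pile = labels[:, None] == labels[None, :]          # the s_ijk indicator
        both_sorted = sorted_mask[:, None] & sorted_mask[None, :]
        X += np.where(both_sorted, same_pile, False)            # count co-occurrences
    return X

def distance_matrix(X):
    """Euclidean distances between the rows of the co-occurrence matrix,
    the transformation advocated by Leydesdorff & Vaughan (2006)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))
```

For example, for two participants sorting four statements, `cooccurrence([[0, 0, 1, 1], [0, 1, 1, 0]])` counts, for every pair of statements, how often the pair shared a pile; the diagonal holds the number of participants who sorted each statement.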
The distance matrix is input for multidimensional scaling (MDS), a data reduction technique in which the dimensionality is reduced based on distances (dissimilarities) or closeness (similarities) of the Q statements (Lattin, Carroll & Green, 2003). In CM, MDS is used to reduce the dimensionality of the data from Q to, usually, 2. The choice to scale down to just 2 dimensions is motivated by Kruskal & Wish’s (1978) observation that “it is generally easier to work with two-dimensional configurations than with those involving more dimensions.” The coordinates of each statement in the resulting two-dimensional plane are used to identify meaningful clusters of statements by means of K-means cluster analysis using Ward’s method, as it “… generally gave more reasonable and interpretable solutions than other approaches such as single linkage or centroid methods” (Trochim, 1989, p. 8). The number of clusters in the initial solution is somewhere between 3 and 20 (Trochim, 1989), while Rosas & Kane’s (2012) pooled study analysis showed that final solutions present on average 9 clusters and range from 6 to 14 clusters. The solution, displayed in the so-called point cluster map, depicts the statements in the two-dimensional plane coloured by cluster membership. This attractive visualisation of clustered statements is subjected to interpretation of attributes in Phase 5.
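The dimension-reduction step can be illustrated with a short Python sketch. CM software typically applies nonmetric MDS; classical (Torgerson) scaling is used here as a simple stand-in, under the assumption that only the general mechanics of embedding a distance matrix in two dimensions need to be shown (the function name is ours).

```python
import numpy as np

def classical_mds(D, dims=2):
    """Classical (Torgerson) MDS: embed Q points in `dims` dimensions so that
    their pairwise Euclidean distances approximate the Q x Q distance matrix D.
    A stand-in for the nonmetric MDS used in CM software."""
    Q = D.shape[0]
    J = np.eye(Q) - np.ones((Q, Q)) / Q      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)     # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:dims] # keep the `dims` largest
    scale = np.sqrt(np.clip(eigvals[order], 0, None))
    return eigvecs[:, order] * scale         # Q x dims coordinate matrix
```

The returned coordinates are what the point cluster map plots; a clustering step (e.g. Ward's method on these coordinates) then colours the points by cluster membership.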
Some researchers have used hierarchical cluster analysis (HCA) instead of K-means clustering (e.g. Shannon et al., 2020). The order of first MDS and then cluster analysis (either K-means or HCA) has been debated by Péladeau et al. (2017), who claimed that the order is best reversed: first cluster analysis, then MDS. Also, while it is true that the human mind can comprehend two dimensions better than higher numbers of dimensions, the two-dimensional solution may not adequately reflect the distances between statements and may therefore distort interpretation. Given that producing a two-dimensional plot is the goal of these analytic steps, below we examine whether skipping MDS altogether and turning directly to the dendrogram of the HCA may produce an equivalent, if not better, overview of clusters of statements.
In the fifth and last stage, the point cluster maps of the statements are examined either by the participants or by the researchers, and the clusters are given a name and meaning, often based on so-called anchor statements. Sometimes, the K-means cluster analysis is repeated until an interpretable set of clusters is identified by the examiners. In the interpretation session, consensus across groups or the consistency of results may also be discussed (Kane & Trochim, 2007, Chap. 6).
The overall aim of our series of computer experiments was to discover whether the CM procedure described in the previous section is able to identify meaningful clusters in collections of sorted statements when the underlying cluster structure is known in a setting where the panel members come from different disciplines and therefore use different attributes for their classification.
The series of simulation studies was designed on the model of sorting one standard deck of Q = 54 cards with numbers 2 through 10, Jack, Queen, King and Ace of Spades, Clubs, Hearts and Diamonds, and 2 Wildcards. The cards allow for (at least) 3 attributes for classification:
Suits, resulting in 5 piles: Spades, Clubs, Hearts, Diamonds, and Wildcards
Ranks, resulting in 11 piles: Ace, 2, 3, ..., 10, and Picture cards (i.e. Jack, Queen, King, and Wildcard)
Odd-Even-Other, resulting in 3 piles: odd numbers (including Ace), even numbers, and Picture cards
We assumed that each participant classified cards according to 1 (and just 1) attribute. In order to introduce some random noise in the data, we further assumed that within a selected attribute for classification, respondents had a high probability (95%) of classifying a card correctly, and that any wrongly classified card had an equal probability of ending up in each of the other piles. Note that a high probability of correct classification prevents the need for replication. Finally, we assume that all participants sorted all statements (so that n_jk = n for all j, k). The simulations produced 54×54 co-occurrence matrices that were input for further analysis, i.e. the MDS as prescribed in CM with two dimensions, followed by a K-means clustering using Ward’s method (see Everitt, 1980, p. 65). We will use K = 10.
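Under the stated assumptions (one attribute per participant, 95% correct placement, misplacements uniform over the remaining piles), a single simulated sorter can be sketched in Python as follows; the original simulation scripts were written in R, so all names here are ours.

```python
import numpy as np

rng = np.random.default_rng(2024)

# True pile labels for the 54-card deck under the Suits attribute:
# 13 cards each of Spades, Clubs, Hearts and Diamonds, plus 2 Wildcards.
SUITS = np.concatenate([np.repeat(np.arange(4), 13), [4, 4]])

def simulate_sorter(true_piles, p_correct=0.95):
    """Simulate one participant: each card lands in its true pile with
    probability p_correct; otherwise it is moved to one of the other
    piles, chosen uniformly at random."""
    piles = np.unique(true_piles)
    labels = np.array(true_piles)            # copy the true classification
    for j in range(labels.size):
        if rng.random() > p_correct:         # misclassify this card
            labels[j] = rng.choice(piles[piles != labels[j]])
    return labels
```

Repeating this for n sorters, possibly mixing attributes as in experiments 3 to 5, and accumulating the 54 × 54 co-occurrence matrix over the resulting pile labels reproduces the input for the MDS and clustering steps.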
Five experiments were executed. The first two experiments supply a proof of principle in which the code is checked, and results from MDS2 and HCA are compared for n = 40 respondents. In these simulations, all respondents used the same attribute for classification (either suits, or ranks). The third and further experiments compare the results of MDS2 with HCA when half of the respondents used one, and half of the respondents used another attribute for classification (simulating a multi-disciplinary background of the participants).
Figures 1a and 1b provide the point cluster map and the dendrogram for n = 40 sorters all using Suits as the attribute for classification. Both approaches, CM and HCA, successfully revealed the 5 clusters. However, in the second experiment using Ranks as the attribute, the CM solution based on 2 MDS dimensions failed to identify the 11-cluster structure (see Fig. 2a; note that the configuration suggests a 6-cluster solution rather than the 10-cluster solution forced on the data), which can clearly be discerned in the dendrogram produced by hierarchical cluster analysis using Ward’s method directly on the distance matrix (Fig. 2b).
Figures 3a and 3b depict the results for a simulation where half of the sample (n = 20) used Suits and the other half (n = 20) used Ranks for sorting. Inspection of the point cluster map and the dendrogram supports the interpretation that cards are sorted by suit, and that number cards receive different treatment than picture cards. At no point would one conclude, based on the point cluster map, that Rank played a role in sorting.
In order to eliminate the possibility that this is just a small sample problem (small n relative to the number of statements), we increased the sample size by a factor of 10 to 400 participants and replicated the last experiment (figures not printed to save space). Again, both clustering techniques failed to identify Ranks as a used attribute. Further trials varying the proportion of participants using either attribute revealed that when a clear majority of 70% or more used Rank, the cluster solutions reveal Rank as the sorting criterion, but these solutions then failed to identify Suit as a used attribute.
The fifth and final experiment aimed to find out whether the number of piles relative to the number of cards may play a role in the outcome of CM. In this experiment we combined 20 respondents sorting on the basis of Suits (5 piles) with 20 respondents sorting on the basis of Odd-Even-Other (3 piles). The results are shown in Figs. 4a and 4b. Focusing on the dendrogram, we find clusters of, e.g., even and odd hearts, even and odd spades, picture diamonds, etc., and we would conclude that (un)evenness by suit is the hybrid attribute used for sorting. The dendrogram does not reflect two different groups sorting exclusively on Suits or on Odd-Even-Other.
The two different visualizations of the five classification problems suggest that dendrograms may be more useful for interpreting the clusters than point cluster maps (particularly when the number of clusters is large and the names of the statements are complex). Skipping MDS came at no cost: dendrograms are easier to interpret than point cluster maps, and one does not have to assume that the dimensionality can be reduced from Q to 2. However, neither method is capable of revealing mixtures of attributes. Kane and Trochim consider so-called bridging values important in determining the contents of clusters, which in the first experiment could potentially lead to choosing 4 clusters instead of 5, because the Wildcards are close to another cluster and can be conceived as forming a single cluster with it. Using bridging values will not help in unmixing mixed sets of used attributes. In these situations, the methods allow either for identification of the attribute with the lowest number of piles, or of some (non-existing) hybrid of attributes, but they are unable to separate the two different attributes used for sorting. Note that in the highly simplified situation of our experiments, there was no missing data: all participants sorted all statements. Reality is far more complex than that: people from the same discipline may actually use different attributes to sort into clusters, and individuals may change or mix attributes during the sorting task (e.g., from red-black to suits) or change the number of values (and thus the number of clusters) they are sorting on (e.g., first sorting numbers 1, 2, ..., 10, then sorting odd/even numbers and face cards).
We have created a highly simplified situation where the same statements were sorted on 2 different attributes, and both CM and HCA failed to reveal these attributes. The results do not support the claims about the suitability of CM for interdisciplinary research. In applications of CM, the attributes used for sorting statements are unknown and the analysis aims to identify them. Given the failure of the clustering algorithms to identify these attributes when they are diverse (as should be expected in interdisciplinary research), a qualitative analysis of the statements may be more fruitful. For example, generated statements may be subjected to content analysis, or insight into the possible attributes that were used for sorting the statements can be acquired through a thematic analysis of the obtained piles. Also, focus groups discussing the meaning of each statement, or the relevance of previously identified, possibly relevant attributes, may reveal dominant attributes within particular groups as well as differences in attribute selection between groups.
JK and HT wrote the main manuscript. JH and JK wrote the script for R. All authors reviewed the manuscript.
Data and analysis code is provided within supplementary information files.
No competing interests reported.