This section discusses the implications derived from our work. We reflect on the main findings of the research questions and address the threats to the validity of the results.
5.1 RQ1: Is there a valid and accepted definition of “data-driven storytelling”?
Even though the concepts of “data visualization” and “storytelling” are tightly associated, when it comes to creating effective visuals, they are not interchangeable, and visualizing information does not necessarily mean creating a narrative visualization. To better understand what data storytelling is, we articulate the differences between information visualization and storytelling below, and then derive a working definition of data storytelling.
Information Visualization
The term data visualization describes the visualization of large data sets. Data visualizations were primarily meant for use in scientific fields and aimed at efficient reading of data in analytical tasks [98]. Information visualization is a subfield of data visualization [1] that leverages the cognitive capacity of human visual perception, evolved for fast pattern detection and recognition, to communicate underlying relationships and trends in large datasets. The major goal of information visualization is to amplify cognition [99] and help people perform tasks effectively [3].
The term “chart” or “visualization” describes a graphical representation of data. Charts, particularly within the context of presentation or persuasion, are designed to aid in the memorability of the presented data [100] and reduce cognitive load [99], [101], [102].
Storytelling and Narrative
Storytelling is a central aspect of human communication and cognition: for a long time, people have used stories to convey information, values, and experiences. In the research literature, we found fragmented views and inconsistent definitions: some authors draw a distinction between “storytelling” and “narrative,” while others use the terms interchangeably.
Storytelling is defined as the social and cultural activity of sharing stories [103]. A fundamental aspect of storytelling is the set of emotions and cognitive responses the story evokes in its audience [104]. Gabriel [105] defined stories as “emotionally and symbolically charged narratives,” oral or written, that “usually have a plot, characters, aim to entertain, persuade or win over.”
According to the Oxford English Dictionary, narrative is described as “an account of a series of events given in order and with the establishing of connections between them.” It combines the narrative contents (story) and the narrative form (discourse) [104]. In simple terms, narration is the telling of a sequence of events to convey a story to an audience. A well-told story conveys great quantities of information in relatively few words in a format that is easily assimilated [106].
Data Storytelling
Kosara and Mackinlay [23] define a data story as an ordered sequence of steps consisting of visualizations that can include text and images but are essentially based on data.
Riche et al. [22] refer to data-driven stories as stories that start from a narrative that is either based on or contains data, often portrayed by data visualizations, to clarify, inform, and provide context to visually salient differences. In [20], the authors describe a data story as a sequence of “story pieces” (facts backed up by data), visualized to support one or more intended messages. The visualization can include annotations (labels, pointers, text) or narration to highlight and emphasize the message and to avoid ambiguity. These story pieces are presented in a meaningful order to support the author’s main goal (e.g., to educate, persuade, or convince).
Based on these results, we derive a definition of data storytelling by refining and incorporating previous ones: “The creation of narrative visualizations to convey an intended message, which can include images, text, and annotations to emphasize the message, avoid ambiguity, and facilitate decision making.”
As [107] points out, data storytelling sits at the intersection of data, traditional narrative, and visualization (Fig. 6). This is also consistent with the work of Edmond and Bednarz [120], who propose “NarVis” (narrative visualizations) situated between data visualization and narrative. According to [108], the essence of a narrative visualization is good storytelling. A story worth telling challenges the reader and is a means of discovery. It drives the audience to ask more questions and pushes them from simply believing to knowing with a degree of confidence [26].
5.2 RQ2: What are the data storytelling best practices reported in the literature and how are they implemented?
The aim of this question was to collect and summarize existing guidelines for creating narrative visualizations and, where possible, to explain how they are implemented. In general, we observed that guidelines are not lacking; rather, they are scattered across the literature, and some apply only to certain types of charts.
The most frequently mentioned best practice was BP17, “choose the visualization technique that better supports the expected tasks.” It is also one of the practices with the largest number of implementations (over thirty), as it directly influences the amount of time a user (or decision maker) needs to solve a problem and, therefore, the problem’s perceived complexity [122]. As discussed in several primary studies, different charts (or design choices within a single chart) perform better than others depending on the task, and designers must consider how they want the display to support a specific task, at a potential cost to others [123]. For instance, spotting outliers in a scatterplot would be difficult at low marker opacity, but estimating data density could benefit from it [59].
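This trade-off can be made concrete with a small sketch (our own illustration, not taken from any primary study; it assumes matplotlib is available and uses synthetic data). The same scatterplot is rendered twice: with opaque markers, isolated outliers stand out; with highly transparent markers, overlapping points darken, so the darkness of a region encodes local density:

```python
import random

import matplotlib
matplotlib.use("Agg")  # headless backend; we only build the figure
import matplotlib.pyplot as plt

random.seed(42)
# A dense synthetic cluster plus a few hand-placed outliers
x = [random.gauss(0, 1) for _ in range(2000)] + [6.0, 6.5, 7.0]
y = [random.gauss(0, 1) for _ in range(2000)] + [6.0, 5.5, 7.0]

fig, (ax_outliers, ax_density) = plt.subplots(1, 2, figsize=(8, 4))

# Full opacity: every point is equally visible, so the outliers pop out
sc1 = ax_outliers.scatter(x, y, alpha=1.0)
ax_outliers.set_title("Opaque markers: spot outliers")

# Low opacity: overplotting darkens dense regions, supporting density estimation
sc2 = ax_density.scatter(x, y, alpha=0.05)
ax_density.set_title("Transparent markers: estimate density")
```

The only design parameter that changes between the two panels is `alpha`, yet each panel supports a different task well and the other poorly, which is precisely the point of BP17.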
We believe this guideline is intrinsically related to BP5 (“select the appropriate visualization considering the types of data to represent and the advantages and disadvantages of each technique”). This is because one cannot choose the appropriate visualization without also considering the target tasks; both are critical to helping the user understand the underlying data and improve decision making. BP5, however, also highlights the importance of tailoring visualizations to their audiences, considering aspects like chart familiarity and learning curves.
The second most referenced practice was BP8, “map information and data dimensions to the most salient visual features,” followed by BP15, “use text, labels and annotations for effective information consumption and decision making.” BP8 concerns whether the visualization uses comprehensible data encodings, as suggested by [31]. These features include color, size, orientation, and shape, which allow the user to perform the required tasks effectively. BP15 focuses on enhancing the interpretability of the information depicted in the charts while also developing the narrative aspect. Titles and text are key to increasing memorability. As pointed out in [41], a good title can make the difference between a visualization that is recalled correctly and one that is not. Labels, in turn, help orient the user, and annotations can be used to highlight interesting patterns.
We found that many practices resemble user interface design guidelines, such as BP3, BP22, BP33, BP36, or BP37. Other practices mainly focus on storytelling issues, such as BP25 and BP30. Moreover, we did not find any practical demonstrations for BP4, BP26, and BP38, as they are straightforward guidelines and generally do not require illustration.
Overall, these findings indicate that each best practice might be associated with one or more evaluation criteria, as each one serves a purpose (e.g., improving usability, increasing memorability, or enhancing comprehension, among others). Nonetheless, further analysis is necessary to validate these associations.
5.3 RQ3: What are the criteria to evaluate narrative visualizations?
The goal of this question was to investigate the factors involved in quality visualizations, particularly for narrative and storytelling purposes. This has been a topic of growing interest in the research community over the years. In general, a visualization is considered effective if it helps people extract accurate information [111] without further complexity [44].
Although several studies propose different sets of criteria, we found no unified standards for what constitutes an “effective” visualization. Instead, each author focuses on evaluating a given aspect of a visualization and the traits it encompasses. We collected the most frequently mentioned, well-known criteria and grouped the less common features as sub-criteria.
Among the primary criteria, we found Memorability, Comprehension, and Engagement. Several studies emphasize the importance of memorability in visualizations; however, not every author agrees [72], arguing that an audience remembering a visualization does not necessarily mean it is effective. Moreover, measuring it poses certain challenges [48]. We believe memorability is fundamental to retaining information prior to making decisions, particularly when the user has limited time to interact with the visualization (e.g., company meetings, crisis settings [124]).
Comprehension was pointed out by three studies as the primary goal of any visualization, and we found the most related items (16) for this criterion. Even though it is highly related to literacy, we did not include literacy as a sub-criterion, since it is inherent to the user [125] rather than a trait of the visualization. Designers can tailor visualizations to support the audience’s various levels of literacy, thus making information as comprehensible as possible.
Engagement was pointed out by three studies as a complex construct involving several factors, such as aesthetics, user control, and exploration. Although it lacks a clear definition, its main concern is the user’s immersion in a visualization [126]. A visualization that is viewed for longer and receives more interactions is considered more successful than others. As suggested by Mahyar et al., engagement is even more important when the target audience is not composed of domain experts [127]. While there have been several efforts in this direction, there is no unified approach to measure engagement yet.
Among the most frequently mentioned sub-criteria is the “aesthetics” or style of visualizations. This makes sense, given that it is tied to every major criterion: the more aesthetically pleasing a user finds a visualization, the more willing they are to use it [128] (perceived usability), spend time on it (engagement), and remember the information (memorability).
Our findings differ slightly from what [73] proposes as criteria to evaluate data-driven stories, as we did not find “dissemination” or “impact” per se in the primary studies. It can be argued, however, that these terms are closely related to engagement, and more studies are necessary to reach a deeper understanding of them. Regarding impact, some of the studies mentioned in previous sections have tested the effect of incorporating storytelling into regular visualizations by measuring the audience’s reaction [129] or decision-making capabilities [14], [15].
Overall, as mentioned in [73], all these criteria are subjective constructs, and thus, they depend on the context of application and cannot be measured directly. We argue that the goal of a visualization should be considered when evaluating the criteria.
5.4 RQ4: What are the current strategies to evaluate the quality of narrative visualizations?
The motivation behind this RQ was to find evaluation methodologies consistent with the best practices and criteria found in the previous questions, beyond traditional laboratory studies.
A variety of approaches have been proposed to this end. Some derive from the Human-Computer Interaction (HCI) field, such as heuristic evaluation (S36, S70, S71, S23). Other models address a specific criterion, such as comprehension (S31) or engagement (S61, S73), while others involve the use of algorithms (S58). There was only one method whose goal was to compare two different techniques and select the most appropriate one (S67).
Among the heuristic evaluation approaches, a few authors suggest supplementing the standard technique with new details, as is the case of the value-driven heuristics (S73) to assess the potential utility of a visualization, or S23, which focuses on the “affective” impact of visualizations. Heuristic evaluation, however, has certain limitations: it depends on the evaluators’ background and domain knowledge, and heuristics cannot always be applied due to their generality.
We found that many approaches do not explicitly mention the targeted visualization techniques, nor the goal of the visualizations they assess (decision-making support or persuasion, for instance). Although we found several best practices in RQ2, we observed that the evaluation strategies do not consider all of them; rather, they focus only on a certain guideline or set of guidelines. For instance, the complexity score method in S28 evaluates the aspect ratio of a chart so that the user can perform tasks more efficiently, while S31 takes into account perceptual, cognitive, and presentation aspects to assess comprehension.
We believe these methods might be classified into those that assess the visualization itself (S14, S23, S28, S31, S67, S36, S70, S71, S58, S86) and those that evaluate aspects concerning the user, such as literacy (S18, S59) or engagement (S61, S73).
In general, and in line with past research findings [130], the major obstacle for developers and designers of visualizations is the lack of out-of-the-box, ready-to-use evaluation tools. These methods, however, can serve as a starting point for other evaluation models.
5.5 Implications for research
The results of this SMS yield some opportunities for future research. First, more empirical studies are needed to test the efficacy of certain best practices. For instance, researchers can take a subset of the best practices, and observe the effects of including or excluding them in a decision-making context. We acknowledge that no single visualization can incorporate all thirty-eight best practices, thus a deeper understanding of the design space tradeoffs is needed to identify which of these best practices are necessary to reach a given goal.
Moreover, many of the criteria found in this study are subjective constructs and can be further examined and characterized in terms of their specific features: their formal definition in the context of visualization, or how to measure them appropriately, among other things. The relationship between best practices and evaluation criteria can also be examined further.
Evaluation is perhaps the most challenging aspect, since it involves both the visualization itself and the user’s capabilities to interact with it and extract useful information. One limitation of past research is that studies did not always state the goal of the visualizations they assessed (a critical aspect for interpreting results accordingly), or focused only on a certain type of chart. As we mentioned in the previous section, the most prevalent evaluation methodologies are laboratory experiments and user studies that assess how well a visualization communicates facts. We hope the results of this study will help researchers go beyond this paradigm and develop more contextualized, specific strategies.
5.6 Implications for practice
This SMS presented 38 visualization design best practices along with several recommendations on how to implement them. Designers and developers can follow these practices during the planning and design of visualizations or use them to compare to the practices they are currently adopting and identify improvement opportunities.
Additionally, by understanding the criteria for effective visualizations, engineers can determine their goals more clearly (e.g., making visualizations more memorable, more comprehensible, or more engaging) and make informed design choices in that direction.
5.7 Threats to validity
This section discusses the limitations that may impact this study regarding construct, internal, external, and conclusion validity.
Construct Validity: Construct validity is determined by our ability to capture what we intended. During the search, primary studies could have been missed. We mitigated this threat by searching different libraries that cover the majority of high-quality publications in SE and complementing the search with forward and backward snowballing [37]. In addition, we re-executed our search query to capture new papers published during the course of this research.
Internal Validity: These threats reflect possible wrong conclusions when causal relations are examined [131]. Researcher bias constitutes a threat to internal validity. To reduce this threat, we performed the selection process iteratively. For the data extraction phase, we conducted a pilot extraction to validate the data extraction form. One researcher extracted the data and another reviewed the extraction. Any conflicts during this phase were discussed and resolved by the authors. To measure the level of agreement between researchers, we used Cohen’s kappa statistic [87].
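For reference, Cohen’s kappa corrects raw agreement between two raters for the agreement expected by chance, κ = (p_o − p_e)/(1 − p_e). The following pure-Python sketch shows the computation (illustrative only; the include/exclude labels below are made up and are not our actual extraction data):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items given identical labels
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement: chance that both raters pick the same label,
    # derived from each rater's marginal label distribution
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical include/exclude decisions from two reviewers
a = ["inc", "inc", "exc", "inc", "exc", "exc", "inc", "inc"]
b = ["inc", "exc", "exc", "inc", "exc", "inc", "inc", "inc"]
kappa = cohens_kappa(a, b)
```

Here the raw agreement is 6/8 = 0.75, but because both reviewers favor “inc,” chance agreement is already 0.53, so κ is noticeably lower than the raw agreement, which is exactly the correction the statistic provides.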
External Validity: External validity refers to the extent to which it is possible to generalize the findings. To ensure the widest coverage possible, we included papers published from 1984 to 2021. The excluded papers may affect the generalizability of our results. However, we argue that they do not have a significant impact on our review, as the included papers share similar ideas and recommendations.
Conclusion Validity: Conclusion validity measures the reproducibility of the study. This threat was mitigated by following the protocol proposed by [74], widely used in SE research, to determine research questions, data sources and search strategy, inclusion and exclusion criteria, quality assessment, data extraction, and study selection.