Methodology for heuristic evaluation of the accessibility of statistical charts for people with low vision and color vision deficiency

Statistical charts have an important role in conveying, clarifying and simplifying information and have a significant presence in fields such as education, scientific research or journalism. Despite numerous advances in the field of digital accessibility, charts are still a challenge for people with low vision and color vision deficiency (CVD) and create barriers that hinder their access to information. The research presented in this paper aims to create a heuristic set of indicators to evaluate the accessibility of statistical charts, focusing on the needs of people with low vision and CVD. The set of heuristics presented has been developed based on the methodology by Quiñones et al. (Comput Stand Interfaces 59:109–129, 2018, https://doi.org/10.1016/j.csi.2018.03.002), which consists of 8 stages: (1) a state-of-the-art literature review; (2 and 3) analysis and description of the most relevant information obtained from this review; (4, 5, and 6) selection and specification of a first set of heuristics, relating them to existing heuristics; (7) validation; and (8) refinement of the set to obtain a final list of heuristics. A first set of heuristics (17 indicators) was developed and applied in two heuristic evaluations, and was then extended to 18 indicators. The final set covers the needs of user profiles with low vision as well as those of users with CVD and poor contrast sensitivity. This research is a first step to widen accessibility requirements to statistical charts and to take into consideration users with low vision and CVD, often forgotten in accessibility research.


Introduction
Statistical charts improve the understanding of large volumes of data very efficiently and reduce the cognitive load associated with reading and digesting textual and tabular information [1]. Therefore, charts have an important role in conveying, clarifying and simplifying information, thus making information more accessible to everybody [2]. Charts have a significant presence in fields such as education, scientific research, journalism, marketing or business intelligence, which justifies the need to ensure access to this type of information for people with disabilities. Moreover, the open data movement and the open distribution of large datasets to citizens, particularly by governments in their drive for transparency, have had a big impact on so-called "data journalism", increasing the number of charts and graphical presentations included in media and prompting a surge of interest in this type of content among journalists, academics, computer scientists and designers [3]. On the business side, this democratization of data has led to the demand for new professional profiles trained in the analysis, management and visualization of data, to the point that in countries such as the USA, it is expected that by 2020 jobs related to data science will increase from 364,000 to 2,720,000 [4].
Although the last decade has seen many advances in the accessibility field, accessible visualizations for visually impaired users are still scarce. Hence, statistical charts currently present important barriers that hinder accessibility for visually impaired users.
Low vision is defined as the condition in which a person's vision cannot be completely corrected with corrective lenses. Low vision means that, after the best optical correction, a person has a visual acuity of less than 0.3, or a visual field of less than 20°. Low vision difficulties may be classified under five categories: visual acuity, relating to clarity of vision; light sensitivity, which can impede or hinder reading bright screens; contrast sensitivity, which affects distinguishing two colors with a small luminance difference; field of vision, which may mean losing central vision, losing peripheral vision or having a patchy visual field caused by occlusions or black spots; and color vision deficiency (CVD), popularly known as color blindness, caused by the loss or degeneration of retinal cones, the receptors responsible for detecting the different color wavelengths, affecting red (the most common, protanopia), green (deuteranopia), blue (the most uncommon, tritanopia) or all of them [5]. Under this big umbrella we distinguish three severity levels: (B1) from no light perception in either eye up to scarce light perception, and an inability to recognize the shape of a hand at any distance or in any direction; (B2) from the ability to recognize the shape of a hand up to a visual acuity of 20/600 and/or a visual field of less than 5°; (B3) from a visual acuity above 20/600 up to a visual acuity of 20/200 and/or a visual field of less than 20° and more than 5°. Finally, this group can also include people with different degrees of color blindness or with difficulties in detecting contrast.
This diversity of profiles implies that each user group will use different assistive technology, ranging from screen readers and screen magnifiers to simple customizations. Each group will also use different strategies (keyboard, mouse, zooming) to navigate through information in order to make the most of their residual vision, or will use alternative means such as voice. Screen reader users and magnifier users cannot be strictly differentiated, as many people with low vision combine different techniques and tools depending on the context, needs and preferences.
Moreover, this intrinsic complexity of low vision profiles is amplified by situational visual disabilities, where context conditions may affect every person, with or without low vision. A typical example of a disabling context is the level of ambient light [6], which can drastically reduce readability and, therefore, create situations very similar to an intrinsic loss of vision. Another is excessively bright sunlight, which may affect contrast perception on a screen and may reduce color differentiation [7], with perceptions very similar to those of CVD users. These disabling contexts commonly appear during interaction with mobile devices in real-life scenarios; it is therefore reasonable to liken mobile users to people with disabilities and, in particular, to low vision users, as they share many similar barriers to accessing information [8]. Therefore, solutions provided to low vision users can benefit a great number of users, taking into account the adoption of smartphones in our society and the dominance of mobile devices as the main channel to access the Internet [9].
A previous literature review by the same authors [10] unveiled an important lack of publications and guidelines focused on the accessibility of statistical charts for people with low vision and CVD. This gap adds to the existing marginalization, in the field of accessibility research, of a user group representing 97% of people with visual disabilities [11]. In order to address it, and in the context of a bigger project to improve the accessibility of statistical charts for people with low vision and CVD, in this article the authors introduce a tool to evaluate the accessibility of statistical charts within a web-based environment, with particular focus on the needs and barriers related to low vision and CVD.

Related research
The Web Content Accessibility Guidelines (WCAG) [12] are the reference document for digital accessibility. They have been acknowledged as an ISO standard [13] and have been adopted by many countries as the minimum legal requirement with which public (and in some cases even private) websites must comply. In the case of Europe, WCAG 2.1 has been integrated into the European standard EN 301 549, Accessibility requirements suitable for public procurement of ICT products and services in Europe, v2.1.2 [14], a reference standard determining the accessibility of websites and mobile applications of public sector organizations. WCAG are organized under four theoretical principles covering every aspect of accessibility: perceivable, operable, understandable and robust. Every principle is detailed in several specific guidelines, which in turn are translated into directly evaluable criteria under three levels of conformance.
WCAG, by definition, cover all types of digital web resources and types of information. While they include graphical content (with an improvement on contrast issues in their latest version, WCAG 2.1), their general character means that they cannot thoroughly explore every aspect of statistical charts. Based on them, Boudreau [15] has recently created a list of general heuristics and Koivunen and McCathie-Nevile [16] created heuristics related to graphical content. For his part, Brajnik [17] collects a list of the main accessibility barriers for people with low vision and relates them to the success criteria of WCAG 1.0 and 2.0, as well as to the Italian legislation.
On the other hand, independently of WCAG but also concerning accessibility, and addressed to content authors, there have been several initiatives to publish recommendations, guides or guidelines related to the authoring of accessible statistical charts. One relevant project is the National Center for Accessible Media (NCAM) guidelines [18], further developed in the Image Description Guidelines by the DIAGRAM Center [19], which recommend some best practices for bar charts, line charts, pie charts and scatter plots, among others. These guidelines focus accessibility efforts on textual alternatives: accessible and equivalent tables or lists for complex charts, summaries of the content detailing the type of chart and main patterns for simpler charts, as well as promoting the use of clear titles, axis labels and legends.
With a more business-oriented vision and a more general scope, but also for authors, Evergreen and Emery [20] have created a data visualization checklist, relying on design principles collected by the same authors [21], which covers aspects such as text, arrangement, color and lines of charts. This checklist has been rigorously tested by Sanjines [22].
Regarding the use of textual alternatives, though not oriented specifically to charts but to a broader set of image types, the work of Splendiani [23] focuses on how to textually describe non-text content for scientific articles. Previously, the analysis of computer science journals conducted by Splendiani and Ribera [24] had already shown deficiencies in the use of text alternatives and of safe color combinations on the marks of charts, as well as insufficient font sizes and images lacking a minimum resolution and dimensions. Simon et al. [25] show that the most common problems with charts and figures in the proceedings of the Innovation and Technology in Computer Science Education (ITiCSE) conference are captions that do not adequately describe the figure and font sizes too small to be readable.

Method
The research presented in this paper is based on the Heuristic Evaluation (HE) method, one of the most efficient usability evaluation techniques that do not involve users. In short, HE is a usability engineering method for finding the usability problems in a user interface design so that they can be attended to as part of an iterative design process. It involves having a small set of evaluators examine the interface and judge its compliance with recognized usability principles (the "heuristics") [26]. Two important aspects must be taken into account when using HE: first, the building of the list of principles and, second, the definition of the severity ratings associated with each principle.
The HE general method has been adapted to the creation of a list of heuristic indicators to evaluate the accessibility of statistical charts considering the needs of low vision and CVD users, with the aim to be very clear and easy to apply.
Although there is no clear agreement in the literature on the most suitable process or methodology to develop heuristics [27], there are many proposals of sets of heuristic indicators to evaluate usability, user experience or accessibility in several specific fields, similar to the objectives of this article. These studies commonly do not explain the procedures used to formulate, specify, validate or refine the list of heuristic indicators, and in general they are an extension or an adaptation of renowned and widely adopted lists such as the ones created by Nielsen [28], Weiss [29], Perlman [30] or, regarding accessibility, directly of WCAG. The initial lists do not delve into particular interface features or into very specific products or applications, dealing instead with very general features, and they need to be extended to cover other specific requirements. In the literature there are also other publications dealing with the methodology of developing a list of heuristic indicators [31][32][33][34][35][36][37][38].
This research adopts the proposal by Quiñones et al. [38] of a formal and systematic methodology as the framework of reference and complements it with the metrics proposed by Jiménez et al. [37] to validate the efficiency of the proposed indicators compared to an existing control list of heuristics. This methodology consists of eight stages: (1) a state-of-the-art literature review; (2 and 3) analysis and description of the most relevant information obtained from this review; (4, 5 and 6) selection and specification of a first set of heuristics, relating them to existing heuristics; (7) validation; and (8) refinement of the set to obtain a final list of heuristics.
The list of heuristics obtained through this procedure could potentially be applicable to statistical charts in different formats or media such as printed documents, digital office documents, apps, etc. In this research, in order to avoid a tool that is not specific enough, the authors decided to focus the evaluation on charts published on the web, postponing the study of other media or formats to future work.
As mentioned above, the list of heuristics proposed in this work is closely associated with WCAG, which are its starting point and inspiration. Accordingly, for a chart to be accessible it is a prerequisite that the website containing it is accessible as well, i.e., the website has to comply with the success criteria established by WCAG. On the other hand, WCAG criteria cover some requirements affecting assistive technology used by low vision and CVD users, for example requiring the language of the page to be explicitly coded for screen readers; these requirements are not exclusive to statistical charts, but they are needed by those users accessing the chart content through a screen reader. For practical reasons the authors have decided not to include them in the list of heuristics, which has to be understood as complementary to WCAG.

Exploratory stage
As a first step, in order to obtain a comprehensive list of indicators, the authors carried out a thorough review of WCAG 2.1 and related documents and tried to gather all criteria related to the subject of this work. This review focused on success criteria of levels A and AA, required by law in many countries, and ignored any criteria not related to charts or to interaction with charts. The authors compiled a table including each relevant criterion together with a list of the sufficient and advisory techniques associated with it, since they delimit and better define each success criterion, and a description of the implications of failure. The impact description was based on the information gathered from the document Understanding WCAG 2.1 [39] and from other cited works, and mainly on the authors' previous experience working with low vision users in other projects (see Table 1).
As a second step the authors carried out a scoping review of the literature on the accessibility of statistical charts for low vision and CVD users [10], which confirmed that previous research covered the accessibility of charts for blind users quite extensively but paid very little attention to the needs of users with low vision or CVD.
Solutions provided in the literature to promote the accessibility of charts include the creation of textual alternatives, haptic alternatives and sonification as strategies to convey chart information, and even the combination of two or more of these strategies to offer multimodal access to the content. Regarding textual alternatives, a wealth of research deals with methods to automate the creation of summaries or long descriptions of charts. Regarding sonification, research has studied how sounds and vibration can convey trends and quantitative or qualitative information, particularly to blind users. Finally, tactile alternatives have long been used; they rely on Braille and different kinds of embossed paper to generate accessible versions of charts, mainly for blind users.
Although the proposed alternatives for blind people could also benefit people with low vision and CVD, this user group retains some visual faculties and wants to use them in daily activities [41], a fact that the above-mentioned solutions do not exploit. Taking into account this lack of studies on low vision needs, the scope of the initial literature review was redefined to include accessibility solutions for low vision not necessarily related to statistical charts. A summary of the relevant information obtained during the exploratory stage is given in the following sections. It is organized into three categories: information, presentation and behavior.

Information
This category comprises all the elements that help explain the chart content. Statistical charts include several characteristic components that give them precision, order, clarity and communicative ability [42]; these elements include the title, the axes, legends, symbols and labels. For the sake of comprehensiveness, captions can also be included in this category, as they provide rich explanatory content about the chart [43,44], often containing the most important research results [45] or relevant data necessary for understanding that content. All these elements play an important role in helping users understand the message and data communicated by the chart.
Quantitative axes show the range of numerical values of the displayed variable. Categorical axes show different values of the dataset, often resulting from some aggregation (countries, product lines, user groups, etc.). Feria [42] recommends never hiding the axes or labels for aesthetic reasons. Often the axes extend into a chart grid that helps the user to visually identify the value of each mark. Feria recommends regulating the density of the grid, looking for an equilibrium between clutter and informativeness. Labels of the axes, if correctly and clearly written, convey the meaning of the chart by themselves beyond title or caption. For excellent comprehension it is recommended to include their units and precision level (e.g., thousands of gallons, millions of years) in the title of the axes. Legends are key to unequivocally interpreting the encoding, making transparent the relation between numerical values and a color scheme, for example. Legends can be offered as an external, general key to all possible values, or inside the chart, close to each mark. Evergreen and Metzner [21] recommend labeling data directly, close to marks (on top of or next to bars and next to lines), as this reduces cognitive load and fosters more efficient information processing. For Knaflic [46] data labels themselves can help to draw attention to certain data points, so they are useful if the data values are important. In this case it is possible to eliminate axes and instead label the data points directly to avoid the inclusion of redundant information. However, if users have to focus on big-picture trends, Knaflic recommends preserving the axes; to decide between both options authors may consider the required level of specificity. Additionally, comments, annotations or explanations enrich the chart with textual information providing context.

Table 1 (excerpt)
Success criterion: 1.1.1 Non-text Content (A)
Techniques (b): G95: Providing short text alternatives that provide a brief description of the non-text content; G73: Providing a long description in another location with a link to it that is immediately adjacent to the non-text content; G74: Providing a long description in text near the non-text content, with a reference to the location of the long description in the short description; G92: Providing long description for non-text content that serves the same purpose and presents the same information
Implications of failure for users with low vision: When a textual alternative is not provided, users with low vision who rely on a screen reader as a complement to other assistive technologies will not be able to access the information contained in the chart [10,16,17,23]
Implications of failure for users with CVD: When the chart uses not safe colors (c) and no textual alternative is provided for the color scheme, users with CVD will have difficulty perceiving the properties of data encoded through […] [7]

Success criterion: 1.3 […]
Implications of failure for users with CVD: Some users with CVD use a personal CSS on the browser to personalize colors. This will have no effect on image text [2]

Success criterion: 1.4.11 Non-text Contrast (AA)
Techniques: Graphics with sufficient contrast; G209: Provide sufficient contrast at the boundaries between adjoining colors; G18: Ensuring that a contrast ratio of at least 4.5:1 exists between text (and images of text) and background behind the text; G145: Ensuring that a contrast ratio of at least 3:1 exists between text (and images of text) and background behind the text; G174: Providing a control with a sufficient contrast ratio that allows users to switch to a presentation that uses sufficient contrast
Implications of failure for users with low vision: When foreground and background contrast in both text and graphical elements is not sufficient, many users may not be able to distinguish figure elements or to read content [5,16]
Implications of failure for users with CVD: When color is used to encode variables and the different values do not offer sufficient contrast, users with CVD may not be able to distinguish them [5,16]

(b) At the end of every technique or recommendation the W3C identifier is included
(c) Not safe colors: this term refers to color schemes that are not clearly distinguishable for users with deuteranopia, protanopia or tritanopia
Finally, image captions contribute a brief comment or explanation to the accompanying chart. Although they often simply complement the title, their objective is not to repeat it but to contribute to a better understanding of the chart [47], including its purpose [48] when they include additional information. Several authors emphasize the importance of captions for understanding a graphic [44,49,50], as they offer a synthesis of the most important aspects displayed in the chart and its principal conclusions [45]. Splendiani [23] collected some recommended information to include in captions, derived from style guides and scientific publishers, which for statistical charts consists of: identifying and explaining labels, abbreviations, data sources, usage rights and units, and describing details of the statistical analysis (standard deviation, p value, etc.).
The need to include alternative text and long descriptions has already been described in the section analyzing WCAG. In the case of charts, the alternative text cannot be used to describe the content (which requires longer explanations); it just serves to briefly inform about the contents of the chart and to help users decide whether they want more information about it [51].
In the literature, the most common approach to making a chart accessible is to provide a data table as an alternative, especially when dealing with blind users [19]. As a drawback, tables do not show trends or comparisons between variables as efficiently, and therefore impose a greater short-term memory workload and cognitive load when trying to reach conclusions or insights from the data. In fact, one of the main benefits of displaying information through a chart is to make invisible information visible [52]. As an advantage, tables are sometimes a required complement, for example to bitmap images where values are not readable by screen readers, in order to be able to look up a specific value; the W3C [51] includes data tables in its proposal for long descriptions of statistical charts.
Long descriptions, on the other hand, enhance the understanding of the chart for users with low vision and other user profiles that may not be able to understand the graphical content. Long descriptions could be offered on an external web page, within the alt attribute or just after the chart as part of the web page content [53], this last option being the one preferred by users [54]. Ault et al. [55] argue that a well written long description could serve, by itself, as an actual alternative rich enough to ensure a good level of accessibility for statistical charts. Corio and Lapalme [56,57] consider them a rich complement to understand the message communicated by the chart. In any case, as repeatedly argued in this article, visualization has some advantages that textual alternatives can never attain.

Presentation
This level includes all aspects related to the visual display of the chart: layout, text composition, typography, color use, among others.
Legibility is an important feature of content. According to Legge [58], good legibility means that users can perceive text and distinguish one character from another without ambiguity. The more distinguishable, the more legible. Previous research targeting low vision users and elderly people provides hints on how to display text to accommodate the needs of these users. Some of the mentioned features directly affecting legibility are font family, font size, contrast [59], text alignment and spacing. On the web there are many mechanisms to personalize or customize these features.
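Contrast, one of the legibility features listed above, can be checked numerically. The following is a minimal Python sketch, assuming colors are given as "#rrggbb" sRGB strings and using the relative luminance and contrast ratio definitions of WCAG 2.x; the thresholds are the 4.5:1 (text) and 3:1 (graphical elements) values cited in Table 1. The function names are illustrative and are not part of the authors' evaluation tool.

```python
def _linearize(channel_8bit: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color written as '#rrggbb'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(color_a: str, color_b: str) -> float:
    """WCAG contrast ratio, from 1 (identical colors) to 21 (black on white)."""
    la, lb = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


def has_sufficient_contrast(foreground: str, background: str, is_text: bool) -> bool:
    """Apply 4.5:1 to text and 3:1 to graphical elements (illustrative helper)."""
    threshold = 4.5 if is_text else 3.0
    return contrast_ratio(foreground, background) >= threshold


if __name__ == "__main__":
    # A mid-gray axis label on a white chart background.
    print(round(contrast_ratio("#767676", "#ffffff"), 2))
    print(has_sufficient_contrast("#767676", "#ffffff", is_text=True))
```

The same check can be applied to adjoining marks (for example two neighboring bars) by passing their fill colors with `is_text=False`.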
In the literature there is no consensus on which font family is better for low vision users, although there is general agreement to recommend sans serif typefaces such as Arial, Helvetica, Courier or Verdana, rather than serif font families such as Times New Roman [60]. All recommendations agree in advising against decorative or fantasy font families. Organizations related to vision have even created specific fonts for low vision users: the American Printing House for the Blind (APH) developed APHont, and the Scientific Research Unit of the Royal National Institute of Blind People developed the Tiresias font. To the authors' knowledge there is only one study comparing the performance of Tiresias to more common font families, and it does not demonstrate better results [61]. The Laboratory of Cognitive Technologie, in Marseille, France, developed Eido, focused on making letters simple and distinct from one another while still being recognizable by common readers. Bernard et al. [62] present an experiment where Eido is useful for people with central visual field loss (such as that caused by macular degeneration) but with no effect on reading speed compared to the Courier font family.
Also related to the font family, and taking into account that statistical charts often display a large proportion of numbers in labels, legends or even titles, it is very important that digits are clearly distinguishable from one another and from letters. A typical example of confusable characters is the digit 1, the lowercase letter l and the lowercase i. This requirement may be more important than other considerations.
Left-aligned (flush left) text promotes a regular reading pace and better legibility [63], while justified text may cause greater difficulties, due to the white space "rivers" generated by irregular spacing [5]. This does not apply to number alignment in tables, where it is important for the same units to be aligned. Kerning, word spacing and line spacing [64,65] are also relevant, in particular to users with central field loss [66]. Regular letters are the most legible, and therefore it is not advisable to include a large proportion of italics or capital letters [5]. Low vision users benefit from big font sizes, above 14 points and preferably between 16 and 18 points [60], with the possibility of increasing them through the browser options or through their assistive technology. Offering personalization is a good practice because it is the easiest way to satisfy the specific needs of every user, which may not coincide with the established good practices [67]. For example, people with peripheral visual field loss but good visual acuity prefer using a smaller font size, in order to read more text within their constrained field of view [5].
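As a small worked example of the font size thresholds above, the following Python sketch converts sizes to CSS pixels and classifies them. It assumes that the sizes are typographic points (1/72 inch) and that the CSS reference pixel is 1/96 inch; treating a size of exactly 14 as acceptable is also an assumption, and the function names are hypothetical.

```python
POINTS_PER_INCH = 72   # typographic point: 1/72 inch
CSS_PX_PER_INCH = 96   # CSS reference pixel: 1/96 inch


def points_to_css_px(points: float) -> float:
    """Convert a font size in points to CSS pixels."""
    return points * CSS_PX_PER_INCH / POINTS_PER_INCH


def classify_font_size(points: float) -> str:
    """Classify a chart font size against the thresholds quoted in the text."""
    if points < 14:
        return "too small"
    if 16 <= points <= 18:
        return "preferred"
    return "acceptable"


if __name__ == "__main__":
    for size in (10, 14, 16, 18):
        px = points_to_css_px(size)
        print(f"{size} pt = {px:.1f} px: {classify_font_size(size)}")
```

Because users may prefer other sizes, such a check is best read as a default for authors, not as a replacement for letting users resize text.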
Column width may affect legibility for users with low vision. In particular, overly long lines require additional effort from users with reduced visual fields, as they have to make more horizontal and vertical movements [5]. Likewise, when a word does not fit at the end of a line, it is better not to hyphenate it, as hyphenation makes the text harder to read and understand [5].
The layout of elements must follow Gestalt principles. Related elements must be grouped; for example, the title should be close to the chart it describes [5], and non-related elements must be separated by spaces and margins.
When the different textual elements (title, legends, captions, axes, labels, etc.) on screen are differentiated only by size and the user applies zooming, the originally bigger ones may become too big to fit on screen. It is therefore recommended to combine several attributes, such as color, underlining or font family, to differentiate elements [5].
When the chart is provided as a bitmap, authors must pay special attention to the file format. A file format is a standardized way to encode digital information, which defines the structure and the type of data stored in a file. Among the most popular file formats for graphics on the web it is worth mentioning JFIF (JPEG File Interchange Format), a lossy format standardized by the Joint Photographic Experts Group, and PNG (Portable Network Graphics), a lossless, open-standard format created by the W3C to replace GIF, which was subject to some proprietary restrictions. JFIF is well known for its ability to display photographic images with a high level of compression, obtaining high quality pictures with very small file sizes; for its part, PNG has a better compression performance when used on images with big areas of uniform color (charts, icons, flags, etc.), with even higher compression rates than JPEG. PNG is also better when dealing with images combining both text and flat colors. JPEG is a lossy format, which means that, in order to reach a high compression, it discards some details of the original image that cannot be recovered; this may result in lower quality or may cause some "image artifacts" on big uniform areas. Consequently, if the original chart mainly uses flat colors and there is not a high demand for compression, the recommended format is always PNG. Both formats accept different levels of compression to adjust quality, with less compression favoring fidelity and higher compression reducing file size. In summary, it is preferable to use lossless formats or, if needed, lossy formats with a low level of compression that does not affect the output quality.
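The behavior of lossless compression on flat-color images can be illustrated with a short Python sketch. PNG relies on the DEFLATE algorithm, which Python's standard zlib module also implements; the sketch does not write real PNG files and only compares how well DEFLATE compresses a synthetic flat-color "chart" and a noisy, photograph-like image. The image sizes and helper names are hypothetical.

```python
import random
import zlib

WIDTH, HEIGHT = 200, 100  # hypothetical image size, one byte per pixel


def flat_color_image() -> bytes:
    """Two large uniform areas, like a bar drawn on a plain background."""
    row = bytes([255] * (WIDTH // 2) + [30] * (WIDTH // 2))
    return row * HEIGHT


def noisy_image(seed: int = 0) -> bytes:
    """Pseudo-random pixel values standing in for photographic detail."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(WIDTH * HEIGHT))


def compression_ratio(raw: bytes) -> float:
    """Original size divided by DEFLATE-compressed size."""
    return len(raw) / len(zlib.compress(raw, 9))


if __name__ == "__main__":
    flat, noisy = flat_color_image(), noisy_image()
    # Lossless: decompression restores every byte exactly.
    assert zlib.decompress(zlib.compress(flat, 9)) == flat
    print(f"flat-color ratio: {compression_ratio(flat):.1f}")
    print(f"noisy ratio:      {compression_ratio(noisy):.2f}")
```

The flat-color data shrinks dramatically while the noisy data barely compresses, which is consistent with the recommendation of PNG for charts with uniform color areas.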
In bitmap images, quality is the result of file format, type and level of compression, as we have seen, but also of pixel size, resolution and bit depth [23]. In the context of this article it is important for source images to be big enough to be zoomed ×2 without losing clarity, that is, without pixelation or blurring. Resolution is the number of pixels per unit of length, expressed in pixels per inch (ppi) or dots per inch (dpi). Different screens may show the same graphic at different sizes because screens have their own resolution as well. Independently of the screen, an image with a higher resolution can display a greater quantity of distinguishable details. Some recommended thresholds are 150 dpi for screen and 300 dpi for printing, but these do not consider the zoom requirements. These resolution demands also depend on the complexity of the chart and the density of elements displayed in it; very simple charts with one or two big elements can do with lower resolutions. Bit depth is defined as the number of bits per pixel used to encode color. Color may be encoded from a lookup table with 8 bits (only 256 different colors) or as true color, encoding each color channel (RGB) with 8 bits (256 values) plus, optionally, 8 more bits to encode transparency (alpha channel), resulting in a bit depth of 24 or 32 respectively.
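The arithmetic behind these quantities can be sketched in a few lines of Python. The example assumes a hypothetical chart displayed at 600 × 400 pixels; the function names are illustrative.

```python
import math


def colors_for_bit_depth(bits_per_pixel: int) -> int:
    """Number of distinct values that can be encoded with the given bits."""
    return 2 ** bits_per_pixel


def source_pixels_for_zoom(display_px: int, zoom: float = 2.0) -> int:
    """Pixels the source image needs so that zooming does not upsample it."""
    return math.ceil(display_px * zoom)


def pixels_for_physical_size(inches: float, ppi: int) -> int:
    """Pixels needed to cover a physical length at a given resolution."""
    return round(inches * ppi)


def uncompressed_size_bytes(width_px: int, height_px: int, bit_depth: int) -> int:
    """Raw image size before any compression is applied."""
    return width_px * height_px * bit_depth // 8


if __name__ == "__main__":
    # Hypothetical chart shown at 600 x 400 px that must survive a x2 zoom.
    width = source_pixels_for_zoom(600)
    height = source_pixels_for_zoom(400)
    print(width, height)
    print(colors_for_bit_depth(8), colors_for_bit_depth(24))
    print(uncompressed_size_bytes(width, height, 32))
```

A palette of 8 bits yields 256 colors, while 24-bit true color yields 256 values for each of the three channels; the optional alpha channel adds size but no extra colors.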
For vector images, the most widely used format is SVG (Scalable Vector Graphics), a W3C recommendation based on XML. Vector images, as opposed to bitmaps, can be magnified as much as desired without losing quality, because they are not rendered from stored pixels but from drawing instructions that are independent of size. The SVG format offers far more accessibility options than bitmaps in many contexts [2]. Its benefits include its standardization and integration into the web page Document Object Model (DOM), which allows SVG elements to be manipulated and customized like any other HTML element and makes them compatible with assistive technology, including screen readers [68]; the separation between structure, content and presentation [69]; and the possibility of using ARIA roles or attributes on every SVG element, which may transmit very detailed information about the attributes and values included in the chart. In particular, W3C has developed some ARIA roles and attributes specific to statistical charts [70,71]. On the visualization landscape, SVG is used by D3 4 and its derivative libraries Vega 5 and Vega-Lite. 6
Color is an essential attribute of charts and plays a crucial role in the accessibility of statistical charts. Beyond the safe color and contrast issues required by WCAG, the scientific literature has studied the effectiveness of color for communicating statistical attributes. According to Ware [72], color is effective for differentiating qualitative variables, whether ordinal or nominal, but much less effective for communicating quantitative values. According to Mackinlay [73], when dealing with ordinal categories humans distinguish colors preferably by saturation (intensity of color) and then by hue (red, green, etc.). In contrast, when dealing with nominal categories, hue is predominant.
Olson and Brewer [74,75] studied the use of colors in maps considering the needs of people with CVD. They researched several color schemes for sequential, diverging or categorical values and tested them under CVD conditions to offer a restricted set of color safe combinations. The term "color safe" refers to those colors that, combined, are distinguishable by people with CVD. To disseminate their work Brewer created a free online tool, Color Brewer, 7 that lets users choose a suitable color scheme according to their own preferences (number of colors, type of values, color safe, etc.). Brewer schemes have been adopted by D3, Tableau and many other visualization tools.
Other authors, from the fields of optometry, HCI or information visualization, studied the utility of colors for specific tasks such as time series evaluation [76,77], or big data statistical judgement [78].
It is worth mentioning that safe color schemes do not take into account the contrast requirements set in WCAG, which are stricter and focus on color luminance (difference to the ambient white light). Many users, with age, lose retinal rods and are unable to clearly perceive colors with low contrast. These requirements are set by success criteria 1.4.3 and 1.4.11 of WCAG 2.1, which address the contrast between text and its background and the contrast of adjacent graphical elements, respectively.
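As an illustration of how these two success criteria can be checked, the following minimal sketch computes the contrast ratio from the relative luminance definition given in WCAG 2.1. The function names are hypothetical, colors are assumed to be 8-bit sRGB triples, and the 4.5:1 and 3:1 thresholds are those of success criteria 1.4.3 (normal-size text) and 1.4.11.

```python
def _linearize(channel_8bit):
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.1 definition)."""
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) triple with values in 0..255."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(rgb_a, rgb_b):
    """Contrast ratio between two colors, from 1:1 up to 21:1."""
    la, lb = relative_luminance(rgb_a), relative_luminance(rgb_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


def meets_text_contrast(foreground, background):
    """Success criterion 1.4.3 for normal-size text: at least 4.5:1."""
    return contrast_ratio(foreground, background) >= 4.5


def meets_non_text_contrast(color_a, color_b):
    """Success criterion 1.4.11 for adjacent graphical elements: at least 3:1."""
    return contrast_ratio(color_a, color_b) >= 3.0


WHITE = (255, 255, 255)
BLACK = (0, 0, 0)
black_on_white = contrast_ratio(BLACK, WHITE)  # the maximum ratio, 21:1
```

A check of this kind complements, but does not replace, the CVD simulation of the palette, since a color safe scheme may still fail the contrast thresholds.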

Behavior
Specific interface features related to chart interaction are described in this section.

Safe magnification
Although a chart provides typography with the correct font size, according to W3C guidelines and to abovementioned best practices, different vision acuity and field of vision coverage among low vision users may imply the need to further customize the zoom level or text size through web browser standard tools or through their own assistive tools. It is therefore paramount when using those additional resources for the content to still be viewable, and that elements do not overlap or shrink which would impede a correct view or reading [5].
Tooltips are short messages providing additional information on an element or a widget, triggered when this element receives the cursor or mouse focus. Joyce [79] states some requirements for them to be accessible: (a) as tooltips are hidden by default, it is not advisable to include vital information in them; (b) restrict their use to situations where the given information is useful and concise; (c) use them consistently across all the charts; (d) make them compatible with both mouse and keyboard; (e) use arrows, as in comic bubbles, to help users recognize which element the tooltip refers to; (f) ensure sufficient contrast; (g) avoid hiding or obstructing other related elements with the tooltip, a recommendation also contemplated by Van Achterberg [80].

Printing
Reading on screen may introduce additional difficulties for some low vision users. It is common for these users to read from a very short distance from the screen, which means adopting a very uncomfortable posture that causes fatigue [5]. Taking this into account, it is recommended to offer the possibility to print the content, ideally personalized to the users to cover their particular needs [5]. Often the printed version is just black and white, and the chart readability is compromised, as sometimes color is the only means to transmit a particular data feature; this adds to the need to use colors wisely and to offer sufficient contrast.

Customization
Overall, and taking into account the high variability of profiles, needs and requirements of low vision users, customization of the chart perceptual components (colors, typography, size, alignment, etc.) should be a desired offering on an accessible chart. Customization could be offered by the chart creator, through the selection of different style sheets controlling font family, color schemes and font size, among others, through assistive technology, or through specific API or software libraries, as for example, Infusion by the Inclusive Design Research Center in Canada, 8 a plugin that any webmaster can add to their website which allows the readers to customize text size, font family, spacing or to apply high contrast colors. When customization is made through assistive technology, it is paramount that the code and scripts are standardized and tested for compatibility issues; when compliant, users would be able to access and manipulate content and styles from their own tool.

Real-time updates
Some charts depict very time-dependent information, such as election results, sports results or stock prices, and this information changes automatically over time. These changes must be communicated to the user, but authors should try not to disturb or interrupt users too much, or may decide to delay communication until the changes affect the task in hand. To do so, the chart author must decide whether or not to interrupt the user, the level of data aggregation used when communicating changes, etc.

Data export
Parallel to printing, exporting the chart to different formats is important to tackle different needs and profiles of the target audience. Some of the suitable formats are the raw data as an Excel or Comma Separated Value file, or the chart in PNG, JPG or SVG. These options will allow the user to read the chart with their preferred tool, or even generate a haptic version with an embossed printer from the SVG file.

Voice interaction
The possibility to interact with the chart by voice commands may benefit blind users, users with low vision and other profiles such as users with motor impairments. This implies that the chart is coded in actual text and all elements are correctly identified and described.

Sonification
Finally, much of the previous research deals with "sonification" techniques, defined as an information representation based on sound, not including voice [81]. Some works explored mapping charts to musical sounds [82] or vibrations [83], using sounds to communicate trends [84], or using volume, pitch and position to represent quantitative and qualitative values [85]. Those options showed some good results for line charts or area charts, but they are not as useful for scatter plots [86]. Sonification is particularly suitable for blind users, while it does not fit the needs of users with low vision very well, except when they have very limited sight.

Experimental stage
The objective of this stage was to analyze data obtained from several previous experiments to retrieve additional information not identified during the first stage (exploratory stage). However, the authors did not find any previous research with a focus on statistical charts accessibility for users with low vision or users with CVD, therefore, knowing that this was an optional stage, this stage was skipped and the efforts were focused on later stages.

Descriptive stage
In this stage the focus was on selecting and prioritizing the most important questions identified within the information collected during stage 1 (exploratory stage) and 2 (experimental stage). In this sense, all aspects relevant for the creation of the heuristic indicators were collected and selected. Table 2 shows the information collected.

Correlational stage
In the correlational stage, Quiñones et al.'s methodology [38] was used to map features and functionalities of the heuristic evaluation domain with attributes coming from the usability and user experience fields, as well as with additional pre-existing heuristic indicators, in an attempt to reconcile domain features and functionalities with user experience and attributes related to them.
Several authors have already proposed to match accessibility and usability guidelines [87][88][89] mainly trying to relate WCAG and Nielsen's heuristics [28] concluding that there is a clear correlation between both. In this research, taking into account that the target domain is very restricted, the authors decided to try to match the indicators with Nielsen's heuristics, with the heuristics and principles proposed by Koivunen and McCathieNevile [16], Evergreen & Emery [20] and Boudreau [15], Brajnik [17] and also with WCAG 2.1 success criteria, the third being more versed on graphical content while the last being the reference document for web accessibility. There is also a match between the different identified aspects with the different levels of vision and with pathologies related to low vision, in order to ensure a coverage of different needs through the indicators. Finally, the resulting heuristics are grouped into 5 categories: good practices, textual alternatives, color and contrast, legibility, and additional features and functionalities.
The resulting list (Table 3) includes many indicators specific to statistical charts and their elements (axes, legends, data source, etc.) which do not find a counterpart in any other guideline; this result was expected due to the specificity of the domain of our list, and the broader scope of Nielsen's heuristics, WCAG or even Koivunen and McCathieNevile [16], oriented to any type of chart.

Selection stage
In this stage, the objective was to review the list of indicators created up to this point and decide whether to keep, adapt or delete them. This procedure is summarized in Table 4: the "Action" column indicates how the indicator will proceed to the final selection, that is, whether it will be adapted to the specific domain, combined with similar or overlapping indicators, kept as an indicator per se, or deleted because it is already covered by another indicator or is not relevant enough. The "Applicability" column indicates how important that indicator is within the scope of this research, deriving importance from its capacity to solve accessibility problems related to statistical charts and to cover the needs of the different user profiles included in the group of users with low vision. The three levels of importance, in increasing order, are: useful, important and critical.
In order to reduce the resulting number of indicators and keep them below the 20-threshold recommended by Quiñones et al. [33], several indicators have been merged. First, all principles related to typography and text composition have been merged into a single indicator covering all of them under the umbrella principle of legibility (H14-H20). Second, H3, relating to axes scales, and H2, axes recommendations, have been fused into H2. The requirement to offer additional data about statistical analysis is combined with the caption indicator (H6), as the caption is the most suitable place to include this information. Alternatives as tables (H10) are merged with long descriptions (H12), following the W3C recommendation to include a data table within the long description. The indicator requiring an abstract of the chart (H13) has been deleted, as the long description may fit this need as well (H12). The indicator "text images" (H23) has been deleted and it will be taken into consideration for a higher or lower degree of compliance in all the heuristics related to textual elements. Contrast between foreground and background for text and graphical elements (H26 and H27) have been merged into one. The indicator H31 has been deleted because, after reconsideration, the authors saw that it is an interesting feature for blind people or people with very limited residual vision, but it is complicated to implement and does not have as much value for people with low vision. Finally, the H32-H34 indicators, related to keyboard or tactile screen interaction, have been combined into a single indicator analyzing the possibility to interact with the chart independently of the input method.

Specification stage
In the specification stage, the indicators obtained in the previous stages were formally defined. As a result of this definition, a total of 17 heuristic principles were obtained. Tables 5, 6 [38]. Every evaluator will proceed to run the individual heuristic analysis reviewing the applicability of each heuristic. Each indicator will be scored on a 7-point Likert scale (from 0 to 6) [90], where respondents will be asked how much they agree or disagree with a set of statements, following the compliance levels shown in Table 22.
Additionally, "Not Applicable" has been added to be used when the heuristic indicator is not suitable for the evaluated chart, as for example when a chart is black and white, and the heuristic is related to color hue.
In accordance with Pearce [91], there are substantial differences between the use of a more or less granular Likert scale in terms of number of statements. While a less granular scale allows for faster responses and clearer categories, it can also result in more bias or in frustration for the evaluators because their option is not represented on the scale. On the other hand, a highly granular Likert scale is more likely to have inclusive and exhaustive categories, and allows the collection of more precise data and more meaningful statistical results, with higher reliability and validity, and reduces the neutral and "uncertain" responses. However, with more granular scales, the linguistic labels associated with each category become more complex, and the differentiation between them is not as clear. The psychometric literature suggests that having more scale points is better but that there is a diminishing return after around 11 points [92], and Sauro and Lewis [93] suggested that having seven points tends to be a good balance between having enough points for discrimination without having to maintain too many response options. After scoring, heuristics are weighted depending on their impact or severity into 3 levels: low impact (1), average impact (2), high impact (3). To decide which weight each indicator would have, the authors relied on the "Applicability" column of Table 4, from the selection stage. For a detailed description of each impact level the reader may refer to Table 23. It is worth mentioning that these three levels are not related to the three levels of conformity within WCAG.
Table 4 (excerpt). Columns: ID | Indicator | Action | Sources | Applicability
H14 | […] | Combine with H15-H20 | [12,15] | (2) Important
H15 | Text is aligned to the left | Combine with H14, H16-H20 | [12,15,20] | (2) Important
H16 | Leading will be at least 1.12 × font size and word spacing will be at least 1.16 × font size | Combine with H14-H15, H17-H20 | [12,15] | (2) Important
H17 | Do not overuse italics or small caps | Combine with H14-H16, H18-H20 | [12] | (2) Important
H18 | Ensure optimal line length for readability | Combine with H14-H17, H19-H20 | [12,17] | (2) Important
H19 | Avoid syllabification | Combine with H14-H18, H20 | - | (2) Important
H20 | White spaces and margins are sufficient to differentiate the different elements of the chart and for easy reading | Combine with H14-H19 | [12,15] | (2) Important
H21 | After zooming the chart up to 200%, using common web browser tools, the chart should be legible without horizontal scroll | Adapt | [12,17,28] | (2) Important
H22 | Customization options for font size, color, contrast and other perceptual properties of the chart should be offered, or at least permitted | Adapt | [16,28] | (2) Important
H23 | Images of text | Eliminate | - | -
H24 | Bitmap images should offer a sufficient quality level | Create | [12] | (3) Critical
H25 | There are at maximum 5 safe colors to encode categorical or qualitative variables. Alternatively, the chart uses patterns or textures as encoding | Adapt | [12,15-17,20] | (3) Critical
H26 | Contrast between text and background has a ratio of at least 4.5:1 | Combine with H27 | [12,15-17,20] | (3) Critical
H27 | […] | Combine with H26 | [12,15,16,20] | (3) Critical
H28 | When an element receives focus (either keyboard or mouse induced) its appearance changes and it is highlighted (changing its color, border, or other elements color or border) | Adapt | [12,15] | (1) Useful
H29 | Users are able to print the chart from the standard options or with a specific plugin, and the printed version is optimized for this media | Create | [16,20] | (1)
Table note: features listed as having an "any user" profile are, in fact, good practices that benefit any reader of a statistical chart, with or without a disability.
Heuristic specification table (chart title), excerpt. This heuristic seeks to ensure that every chart includes a brief and descriptive title. The title will help users identify a chart among others appearing on the same page and thus will help them navigate through the charts. Benefits: user profile who benefits: any user. Having a suitable title means offering a first approximation to the content of the chart and helps users to efficiently identify and recognize the information they are looking for when there is more than one chart on the same page.
The score on the Likert scale is multiplied by the weight, resulting in a weighted value for every indicator. This obtained value is multiplied by 10. At the same time the maximum weighted value of the overall chart is calculated, taking into account that the maximum score for the "Not Applicable" indicator is 0, and 6 for all the others. Then the maximum is used to divide the previous number.
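The scoring procedure described above can be sketched as follows. This is a minimal illustration and not the authors' template: the function name is hypothetical, and it assumes that the weighted values are summed over all applicable indicators before being multiplied by 10 and divided by the maximum attainable weighted value, which yields a final score between 0 and 10. "Not Applicable" indicators are represented by `None` and contribute 0 to both sums.

```python
from typing import Optional, Sequence, Tuple

MAX_LIKERT = 6  # top of the 7-point scale (0 to 6)


def chart_score(ratings: Sequence[Tuple[Optional[int], int]]) -> float:
    """Weighted accessibility score of one chart on an assumed 0-10 scale.

    Each rating is (likert_score, weight): likert_score is 0..6, or None
    when the indicator is "Not Applicable"; weight is 1 (low impact),
    2 (average impact) or 3 (high impact).
    """
    achieved = 0
    maximum = 0
    for score, weight in ratings:
        if weight not in (1, 2, 3):
            raise ValueError(f"unknown weight: {weight}")
        if score is None:
            continue  # Not Applicable: maximum score is 0, so it adds nothing
        if not 0 <= score <= MAX_LIKERT:
            raise ValueError(f"score out of range: {score}")
        achieved += score * weight
        maximum += MAX_LIKERT * weight
    if maximum == 0:
        raise ValueError("no applicable indicators")
    return 10 * achieved / maximum


# Three indicators: a critical one fully met, an important one half met,
# and a useful one that does not apply to this chart.
example = chart_score([(6, 3), (3, 2), (None, 1)])
```

In this example the weighted sum is 6 × 3 + 3 × 2 = 24 and the maximum is 6 × 3 + 6 × 2 = 30, so the chart obtains 8 out of 10 under the stated assumptions.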

Validation stage
The objective of the validation stage was to check the adequacy of the set of heuristic indicators taking into account their efficiency, via different experiments. Quiñones et al. [38] offer several options to validate the set of heuristics: validation through heuristic evaluation, validation through expert judgment and validation through user testing. In this research, the validation consisted of two experiments where four evaluators (the authors of the paper plus one experienced evaluator) assessed the charts using heuristic evaluation methodology. After compiling a list and brief description of each heuristic, three evaluators carried out the first round of evaluations of a sample of 9 HTML charts 9 appearing in the websites of two Catalan universities and in the Catalan University Quality Assurance Agency website [94] based on this list. This first experiment, which was more informal, acted like a training session on the methodology and helped the evaluators to become acquainted with the set of heuristics. After this experiment, the evaluators realized that assigning a score just by its value was too subjective and decided to enhance the description of every heuristic with a guide on how to assign scores. In this sense, the specification of each heuristic was complemented with a brief indication on how to meet it and with a checklist of aspects that should be observed during the evaluation. These new elements are detailed in the refinement stage (Table 24).
Excerpts from the heuristic specification tables (Tables 5 to 21):
Chart caption. Benefits: user profile who benefits: any user. The chart caption, far from duplicating information, helps in understanding the message communicated by the chart, offers a synthesis of the most relevant data and may also offer some conclusion or additional information.
Printing. This heuristic seeks to ensure a printed version of the chart, optimized for this medium, for those users who may prefer it. Benefits: user profile who benefits: any user. Having a version optimized for printing allows users with low vision to print it and avoid having to consult the screen in forced positions near the screen.
Long description. This heuristic seeks to ensure that every complex chart offers a textual long description giving the same information as the chart. Benefits: user profile who benefits: screen reader users.
Bitmap quality. This heuristic seeks to ensure that all charts, and in particular bitmaps, offer sufficient quality for a clear visualization and also support a zoom of at least 200% without blurring or pixelation. Benefits: user profile who benefits: any low vision user.
Focus indication. This heuristic seeks to ensure that when an element of the chart receives the mouse or the keyboard focus, some visible indication appears. Benefits: user profile who benefits: any low vision user.
The second experiment, more formal, consisted of evaluating a sample of 35 statistical charts 10 published in the digital version of the 5 newspapers with the biggest audience […] Vanguardia), and in 2 international newspapers that are widely recognized as being good at information visualization (The New York Times and The Guardian) [95]. In this case, the analyzed sample contained both vector and bitmap charts. The selected newspapers were reviewed thoroughly during October 2019, and a sample of 5 charts was chosen from each one. Bar and line charts were the most common types of chart included; some pie charts were also included since, although their use is controversial, they are still a common chart, and the sample also contained a few variations of bar and line charts, which are also in widespread use. The exact […] (14), line charts (13), pie charts (4), stacked bar charts (2), combined bar and line charts (1), and dot diagrams (1). The results of the evaluation of each statistical chart were recorded using a specific custom-made template, which automatically calculated the final score. Each evaluator was given an Excel template for the evaluation of each chart. The template included a screenshot of the chart and its URL on sheet 1; a questionnaire for scores on sheet 2, which also included a field for the evaluator to briefly describe the problems associated with each indicator; and the final score, automatically calculated, on sheet 3. Each evaluator carried out their evaluation independently and a final meeting was held to review everything, particularly the discrepancies. Charts were evaluated only once, and the standard deviation between scores for the same charts was calculated; this was helpful for detecting special cases to discuss. When a discrepancy occurred (a deviation higher than 2), the evaluators went deeper into the specific criteria used for scoring. The researchers did not detect any learning effect because they did not impose any order on the evaluation nor did they record the order each evaluator followed. In future experiments, the authors may consider measuring the learning effect together with the time taken for each evaluation in order to assess the practicality of the evaluation method.
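The discrepancy check between evaluators can be sketched as follows. This is an illustrative sketch only: the function name and the chart identifiers are hypothetical, and it assumes the sample standard deviation, since the text does not state whether the sample or the population formula was used.

```python
from statistics import stdev


def flag_discrepancies(scores_by_chart, threshold=2.0):
    """Return the charts whose evaluator scores deviate by more than
    `threshold`, mapped to their standard deviation.

    `scores_by_chart` maps a chart identifier to the list of scores
    given by the different evaluators to that chart.
    Assumption: sample standard deviation (n - 1 in the denominator).
    """
    flagged = {}
    for chart_id, scores in scores_by_chart.items():
        if len(scores) < 2:
            continue  # a deviation needs at least two evaluators
        spread = stdev(scores)
        if spread > threshold:
            flagged[chart_id] = spread
    return flagged


# Hypothetical scores from three evaluators for two charts.
to_discuss = flag_discrepancies({
    "chart-01": [6, 6, 5],  # evaluators broadly agree
    "chart-02": [0, 6, 3],  # evaluators disagree strongly
})
```

Only the charts returned by such a check would need to be revisited in the final meeting, which keeps the discussion focused on the cases where the scoring criteria were interpreted differently.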
The main accessibility problems detected by the evaluators were: the almost total lack of textual alternatives in the charts in bitmap format, a problem also found in SVG charts, in which standards such as WAI-ARIA were not used to label the marks (24 out of 35 bitmap and SVG charts); the use of insufficient or incorrect text alternatives when they were included (6 out of 11 charts with alt text); the poor use of visible indicators to highlight the elements that were the focus of the charts (9 out of 11 SVG charts); not supporting a keyboard interface (11 out of 11 SVG charts); an insufficient non-text contrast ratio in many of the analyzed charts (26 out of 35 charts); the use of a font size that was too small (35 out of 35 charts); and the non-systematization of the use of safe color palettes for people with CVD, which was a widespread practice, except in The New York Times and The Guardian, which mainly complied with this requirement.
Based on these results, we can conclude that some of the problems detected by the evaluators in the experiments conducted during the validation stage were similar to the problems found by other authors (use of insufficient or incorrect text alternatives when they are included, the use of a font size that is too small and the non-systematization of the use of safe color palettes for people with CVD) [24,25]. Beyond these known problems, the set of heuristics proposed was not only practical and applicable to the domain, but also allowed the identification of additional problems not previously highlighted in the literature.
In this stage, the methodology was complemented and enriched by the calculation of several quality metrics proposed by Jiménez et al. [37]. Compared to the metrics proposed by Quiñones et al. [38], those of Jiménez et al. are more easily quantified and ready to use through their corresponding formulas, giving the validation a more solid basis.
Excerpts from the heuristic specification tables (Tables 5 to 21):
Device-independent interaction. Benefits: user profile who benefits: low vision and screen reader users, and also users with motor disabilities. Users can navigate through the marks, data labels, legends and axes, although they cannot use devices such as mice that require eye-hand coordination, have trouble finding or tracking a pointer indicator on screen, or have tremors when using a mouse.
Customization. Benefits: user profile who benefits: low vision and CVD. Customization is useful to ensure a good level of accessibility for a user profile as diverse as the low vision users.
Table 23 (impact levels and weights):
Low impact (× 1). If the chart fails the heuristic, one or more user profiles will not have a satisfactory user experience with the chart, mildly compromising its accessibility. If the chart succeeds at the heuristic, the chart accessibility slightly improves.
Average impact (× 2). If the chart fails the heuristic, one or more user profiles will have serious difficulties perceiving the chart information, severely compromising its accessibility. If the chart succeeds at the heuristic, the chart accessibility considerably improves.
High impact (× 3). If the chart fails the heuristic, one or more user profiles will not be able to perceive the chart information, totally compromising its accessibility. This heuristic is key to providing access to the chart for one or more profiles.
Table 24 (excerpt):
[…] (data source)
How to meet it: Preferably near the caption, the source (institution and dataset), the date and also a link to it are given.
Checklist: Somewhere in the chart, preferably near the caption, there is information about the data source. The data source identifies the institution and links to the dataset where the data come from. The data source is actual text and not an image of text.
H7
How to meet it: The printed page containing the chart should be displayed correctly; this is ensured through CSS styles. A version specifically optimized for printing is offered outside the browser's native print options.
Checklist: Users can print the chart in a version specifically optimized for this medium. The printed version has good legibility. The printed version has good legibility, even when printed in black and white.

H8
How to meet it: The most common way to reach the goal of this heuristic is using the "alt" attribute inside the <img> element. When the chart is coded in SVG, alternative texts can be written directly within the chart. Sometimes the attribute "aria-labelledby" can be used for this purpose.
Checklist: The chart has an alternative text. The alternative text is brief and descriptive. The alternative text is not redundant with the title.

H9
How to meet it: Using the "longdesc" attribute, with a link to the long description which should be available, preferably, on the same page as the chart. Offering a link near the chart with a link to the long description. Giving the location of the long description within the alt attribute. Using the figcaption element, to include both the chart and the description. Giving the description just after the chart; this is the preferred option. A correct long description gives an abstract of the message conveyed by the chart, a data table with the values present in the chart and information about the display (used marks, axes, encodings…).
Checklist: The long description provides detailed information about what is presented visually, including scales, values, relationships and trends. The data values are provided through a data table. The chart is structurally associated with the long description.

H10
How to meet it: To reach the goal of this heuristic the color scheme should be safe for the different types of chromatic vision deficiencies, including achromatopsia (total absence of color vision).
Checklist: There is a maximum of 5 safe colors to differentiate qualitative, ordinal or quantitative variables. Alternatively, values are differentiated by patterns or textures. The chart is seen correctly for protanopia, deuteranopia, tritanopia and achromatopsia profiles. This can be checked with a simulation tool such as NoCoffee c

H11
How to meet it: Ensuring that a contrast ratio of at least 4.5:1 exists between text (and images of text) and background behind the text. Also ensuring that a contrast ratio of at least 3:1 exists between text (and images of text) and background behind the text. This can be checked with a tool such as Colour Contrast Analyser (CCA) d
Checklist: The visual presentation of text and images of text has a contrast ratio of at least 4.5:1 a. The visual presentation of parts of graphics required to understand the context has a contrast ratio of at least 3:1 b

[…] (focus indication)
How to meet it: The most common approach to reach this goal is using CSS to change the presentation of the elements when receiving focus. This criterion only applies to vector charts or charts implemented with JavaScript libraries, not to bitmaps, because in these no element is able to receive the focus.
Checklist: On a vector chart, when an element receives the focus of the keyboard or the mouse, the element is highlighted in some way.

H16
All events or interactions available on the chart should be device independent. This criterion only applies to vector charts or charts implemented with JavaScript libraries, not to bitmaps, because in these no element is able to receive the focus.

Jiménez et al. [37] suggested the use of several indicators to compare domain heuristics (d) with control heuristics (c). Domain heuristics refers to the final heuristics created during the process (Tables 5 to 21) and control heuristics refers to the initial heuristics identified during the exploratory stage. In this research, 14 relevant WCAG 2.1 success criteria of levels A and AA were selected as the set of control heuristics (Table 1).
Ratio of unique problems. The relation of unique problems (that is, the number of different problems found) identified by the new set of heuristics in comparison with the control heuristics. If the ratio is bigger than 1 the new set identifies more unique problems:
Ratio of problem dispersion. The distribution of problems identified by each heuristic in the new set of heuristics in comparison with the control heuristics. If the ratio is bigger than 1 the new set has a more balanced distribution:
Ratio of severity. The severity of problems identified with the new set of heuristics in comparison with the control heuristics. If the ratio is bigger than 1, the new set identifies more severe problems. For this ratio the researchers used the severity assigned to the new set of heuristics: (1) Useful, (2) Important and (3) Critical, and the severity assigned by WCAG: 3 (A), 2 (AA) and 1 (AAA):
Ratio of specificity. The specificity of problems identified with the new set of heuristics in comparison with the control heuristics. If the ratio is bigger than 1 the new set identifies more specific problems. To apply this ratio, two levels of specificity were established: the first level is assigned to the problems that apply to any type of website or application, and the second to those problems that only apply to statistical charts:
The results for the indicators in this sample after the second evaluation was conducted [95] […] 27 is the mean specificity of the problems detected by the domain heuristics and 1 is the mean specificity of the problems detected by the control heuristics. The ratio is bigger than 1 as expected, because in WCAG there are many heuristics applying to general problems.
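The formulas of Jiménez et al. [37] are not reproduced in this section, so the following is only a hedged sketch of how the severity and specificity ratios could be computed. It assumes that each ratio is the mean value over the problems found with the domain heuristics divided by the mean value over the problems found with the control heuristics, an assumption suggested by the reference to mean specificity above; the function names and the sample data are hypothetical.

```python
from statistics import mean

# Severity levels as described in the text.
DOMAIN_SEVERITY = {"useful": 1, "important": 2, "critical": 3}
WCAG_SEVERITY = {"A": 3, "AA": 2, "AAA": 1}


def ratio_of_means(domain_values, control_values):
    """Assumed form of the ratios: mean over domain problems divided by
    mean over control problems. A result above 1 favors the domain set."""
    return mean(domain_values) / mean(control_values)


def severity_ratio(domain_levels, control_levels):
    """Severity ratio from applicability labels (domain heuristics) and
    WCAG conformance levels (control heuristics)."""
    domain = [DOMAIN_SEVERITY[level] for level in domain_levels]
    control = [WCAG_SEVERITY[level] for level in control_levels]
    return ratio_of_means(domain, control)


def specificity_ratio(domain_specificity, control_specificity):
    """Specificity ratio: 1 = problem applies to any website or
    application, 2 = problem only applies to statistical charts."""
    return ratio_of_means(domain_specificity, control_specificity)


# Hypothetical problem lists, for illustration only.
sev = severity_ratio(["critical", "important"], ["A", "AA", "AA"])
spec = specificity_ratio([2, 2, 1, 1], [1, 1, 1])
```

With this hypothetical data both ratios are above 1; the values reported for the actual sample depend on the exact formulas in [37] and are not recomputed here.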
The previous metrics were calculated by the researcher in charge of the statistical analysis (RA) during the validation process. With these indicator values, we can conclude that the proposed heuristics identify more unique problems, and that these problems are better distributed, more severe and more specific than those found with the control set. The new set of heuristics is therefore more suitable for evaluating the accessibility of statistical charts.
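As an illustration of how such ratios can be computed, the following Python sketch may help. It is not the formulation of Jiménez et al. [37]: it assumes that the ratio of unique problems is the quotient of the counts of distinct problems, and that the ratios of severity and specificity are quotients of mean values (the latter is consistent with the description of mean specificity given above). The ratio of problem dispersion is omitted because its calculation is not described here. All function names and sample values are hypothetical.

```python
from statistics import mean

# Severity assigned to WCAG conformance levels, as stated in the text:
# 3 (A), 2 (AA) and 1 (AAA).
WCAG_SEVERITY = {"A": 3, "AA": 2, "AAA": 1}


def ratio(domain_value, control_value):
    """Quotient domain/control; a value above 1 favours the domain set."""
    if control_value == 0:
        raise ValueError("control value must be non-zero")
    return domain_value / control_value


def unique_problem_ratio(domain_problems, control_problems):
    """Assumed definition: distinct problems found by each set, divided."""
    return ratio(len(set(domain_problems)), len(set(control_problems)))


def mean_ratio(domain_scores, control_scores):
    """Assumed definition: mean score of each set, divided.

    Usable for severity (scores 1 to 3) and specificity (levels 1 or 2).
    """
    return ratio(mean(domain_scores), mean(control_scores))


# Hypothetical sample data, not taken from the study.
domain_found = ["p1", "p2", "p3", "p2"]   # problem identifiers, with a repeat
control_found = ["p1", "p4"]
domain_severity = [3, 2, 3]               # Useful=1, Important=2, Critical=3
control_severity = [WCAG_SEVERITY[level] for level in ["A", "AA", "AA"]]
domain_specificity = [2, 2, 1, 1]         # 2 = applies only to charts
control_specificity = [1, 1]              # 1 = applies to any website

print(unique_problem_ratio(domain_found, control_found))
print(mean_ratio(domain_severity, control_severity))
print(mean_ratio(domain_specificity, control_specificity))
```

With this hypothetical data every ratio is above 1, which under the reading given above would favour the domain heuristics.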

Refinement stage
The objective of the final stage (refinement) was to refine and verify the set of indicators based on the conclusions or comments resulting from the previous stage (validation). After the two experiments [94,95], a meeting was held to reflect on the methodology. As a result of this meeting, several descriptions were improved (Table 24); the researchers decided to use a shorter Likert scale (Table 25), as it was difficult to assign fine-grained scores; and one indicator was added, i.e. H15 (Table 27), related to a new accessibility problem detected during the evaluations (see Sect. 3.8.3). A detailed explanation of the findings and decisions taken during this meeting follows.

Clarity of indicators
Some indicators were not entirely clear to all evaluators, who had to ask for clarification during the evaluation process. For this reason, before the second experiment, additional information about each heuristic was added. For each heuristic principle, an additional checklist was developed with a list of items that must be checked (Table 24).

Ease of performing the heuristic evaluation
The evaluators stated that they were comfortable with the indicators and the proposed evaluation methodology. However, some evaluators argued that the Likert scale needed a new level of compliance to indicate that failing an indicator is not a problem, for example, when abbreviations are not expanded but are commonly used and well known by the audience.
Furthermore, all the evaluators agreed on the difficulty of applying a 7-point Likert scale. Therefore, a 5-point scale was proposed (Table 25). Some evaluators also agreed on the usefulness of having a scale with reference values to assign the score of each indicator, and the authors are now working on a guidance document that includes this. Reference values are easily applicable to indicators that evaluate objective problems, such as low image resolution or bit depth, but are somewhat more complex to define when assessing more subjective aspects, such as the adequacy of a title or a text alternative.

Completeness of heuristics set
The evaluators agreed that the heuristics contemplated were sufficient to carry out the evaluation and to detect all the accessibility problems present in the analyzed graphics, with the exception of an identified accessibility problem that could not be associated with any of them. During the evaluation, there were several charts that showed watermarks or advertising banners on the image preventing the total or partial vision of the object. Both watermarks and banners are two resources present on numerous websites, so it is likely that charts from other thematic areas also present these problems. Accordingly, it was proposed to add a new indicator associated with this problem. Tables 26 and 27 show the process of specification of the new heuristic.
H18 Without disturbing elements
Problem: After evaluating the charts in the press, evaluators discovered a watermark for copyright purposes, and many ads hindering important information from the charts, particularly over bitmap charts. This is a bad practice. In the case of copyright, metadata should be used instead. In the case of ads, they should not compromise the perception of any important element in the chart.
Action: Create a new heuristic. Review if a watermark, banner ad or any other external element hinders the visibility of the chart.

Lessons learned
In general, the feedback provided by the evaluators was positive. As previously discussed, the heuristic set seems sufficient to shed light on the accessibility problems of statistical charts.
One of the aspects that became apparent after the two experiments carried out in the validation stage is that charts in both bitmap and vector formats can present the same problems if they are not created following the principles of accessibility. That is, despite the initial advantage that vector charts have by not being images of text, or their compatibility with widely adopted standards such as WAI-ARIA, the reality is that, if these aspects are not taken care of, the resulting product will not be accessible. This was the case in the majority of charts analyzed, which were not designed to meet the needs of people with disabilities. In contrast, charts in bitmap format that have a title, alternative texts and long descriptions, and that take into account the necessary color and contrast requirements, among the other factors included in the proposed heuristic set, may be equally or even more accessible for people with low vision.
Although the additional information on each heuristic collected in Table 24 is considered useful, the authors believe that having a more detailed guide, with examples of scores, will unify the evaluation criteria for evaluators. Another aspect that may be useful for new evaluators who wish to use the set of indicators proposed in this work is the availability of some examples of evaluations carried out on a set of charts that present the most common accessibility problems of the domain under study.

Results
In this research, the authors introduced a new accessibility tool: a heuristic checklist complementary to WCAG and focused on the accessibility of statistical charts for people with low vision and people with CVD. The indicators included in this work, as well as the developed tool, are a contribution to the evaluation of chart accessibility with a special focus on people with low vision and people with CVD. The work tries to compensate for a lack of research on guidelines, standards and even recommendations for people with low vision, who still benefit from some residual sight.
As a result of applying the methodology developed by Quiñones et al. [38], a set of 18 heuristics has been created. It has been validated with an experimental evaluation of 35 statistical charts and with the validation method proposed by Jiménez et al. [37]. The evaluation with charts helped identify many problems and some good practices. The validation following the procedure by Jiménez et al. was very positive, and all indicators revealed that the proposed set is more effective and efficient than the control heuristics.
The heuristics cover the range of needs of the different low vision and CVD profiles (see Table 28), and also include general best practices that benefit any user.
The main result of this research is the list of heuristics obtained, which is fully described in Sect. 3.6 (Specification stage) and slightly modified in Sect. 3.8 (Refinement stage).

Discussion
Increasing efforts are being made to take into account the difficulties of different disability groups and to provide them with accessible solutions, but some groups are often forgotten, such as cognitively impaired users or users with low vision. In particular, there is a lack of knowledge of low vision as a disability among both institutions and large companies, which causes this group not to be included when trying to address the barriers of access for people with visual disabilities. A clear example is the fact that native screen magnification for Android was not available until the fourth version of the operating system, while the screen reader (Google TalkBack) had been included since the first versions [96]. This lack of knowledge among society and technology is gradually being reduced, with better knowledge and more accessible solutions addressed to this user group. Examples are, in the civil sphere, the campaign "I have low vision" by Begisare association and by Association of affected Retinitis Pigmentosa of Gipuzkoa (Spain), and, in standards, the incorporation into WCAG 2.1 of a new success criterion relating to non-text contrast. Summing up, this research adds to a general trend to extend accessibility to wider user groups and to more domains like STEM disciplines [97].
The authors argue the need for specialized accessibility checklists to help not only accessibility experts or practitioners but also designers, content editors, or education managers to better create and evaluate STEM content, and in particular with this research, statistical charts. Heuristic checklists are not only useful for evaluation purposes, but they also serve as a guide for authors when creating new charts. Accordingly, the authors hope this tool also fulfils this additional mission and that there will be an increase in accessible charts on the web. In comparison with the WCAG and the heuristic sets proposed by Boudreau [15] and Koivunen and McCathieNevile [16], the heuristics proposed in this work are more specific. The checklist proposed by Evergreen & Emery [20] is equally specific, but it is not aimed at evaluating web accessibility, but rather at the "development of high-impact data visualizations". Other initiatives, such as the Diagram Center guidelines, focus on blind and severe low vision people, and pay very little attention to the needs of low vision and CVD users.

Limitations and future work
One of the most critical stages when creating a set of heuristics following the methodology by Quiñones et al. [38] is validation. Among the three different tasks proposed in their methodology, validation through heuristic evaluation, validation through expert judgment and validation through user test, this research only tried the first one. Two experimental evaluations helped the research team to add a new indicator and refine the Likert scale. The research team could themselves be considered experts as they have worked with accessibility for a long time, and one of them is also working in information visualization.
Regarding user testing, although users have not been included in previous research work during the creation of a set of heuristics, an actual user-centered design requires them to take an active role during the process. Moreover, the research by Power et al. [98] shows that only half of the problems encountered by users are covered by WCAG, and that in some cases, even after following WCAG recommendations and techniques, the problem is not yet solved. Additionally, Lechner's research [34] shows that users are very valuable in specific domains, such as the one under discussion, because they contribute a new perspective and identify problems that experts are not always able to detect. The authors are working to incorporate users into the process as an important direction for future work, in order to further validate the effectiveness and efficiency of the tool, and to end up with a set of heuristics that better covers the real needs of users with low vision or CVD.
While the objective of heuristic evaluations is to identify the accessibility/usability problems in a user interface [26], rather than obtaining a final score, another future line of work would be to complement the score obtained using the Likert scale by calculating the severity of the identified problems. Severity ratings can be used to prioritize the resolution of certain problems before the publication of a chart or in the process of redesigning the interface. For Nielsen [99], severity is a combination of three factors: frequency, impact and persistence. Brajnik [17] suggested considering just two of these parameters when estimating the severity of a barrier: impact and persistence. Brajnik classified problems under three categories or grades of severity, i.e., minor, significant and critical, and proposed using a scale of 1 (mild case) to 3 (worst case) for both impact and persistence to obtain the severity value. In a recent study [100], the authors incorporated these metrics to complement the purely quantitative assessment with a more qualitative approach, with interesting results, although more thought and actual users are necessary to refine the method used to rate the severity of accessibility problems with reliability.
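A severity rating of this kind could be sketched as follows. The 1 to 3 scales for impact and persistence and the three grades (minor, significant, critical) come from the description above; how the two values are combined is not specified here, so the product and the cut-off values used below are purely illustrative assumptions, and the function name is hypothetical.

```python
def severity_grade(impact: int, persistence: int) -> str:
    """Map impact and persistence (each 1 to 3) to a severity grade.

    The combination rule (a product) and the thresholds are assumptions
    made for illustration only; they are not taken from Brajnik [17].
    """
    for name, value in (("impact", impact), ("persistence", persistence)):
        if value not in (1, 2, 3):
            raise ValueError(f"{name} must be 1, 2 or 3, got {value}")
    score = impact * persistence  # possible values: 1, 2, 3, 4, 6, 9
    if score <= 2:
        return "minor"
    if score <= 4:
        return "significant"
    return "critical"


# Hypothetical problems found in a chart: (description, impact, persistence).
problems = [
    ("abbreviation not expanded", 1, 2),
    ("low contrast between series", 3, 3),
    ("banner ad over the legend", 2, 2),
]

# Sort so that the problems to fix first appear at the top.
for description, impact, persistence in sorted(
    problems, key=lambda p: p[1] * p[2], reverse=True
):
    print(description, "->", severity_grade(impact, persistence))
```

Sorting by the combined value gives a simple way to prioritize which problems to resolve before a chart is published.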
Finally, another future line of work is the creation of tools to help evaluators while scoring a chart, looking to reduce effort and maximize harmonization between different evaluations.

Conclusions
The research presented proposes 18 heuristic indicators for a quantitative evaluation of the accessibility of statistical charts on the web. The set is not intended to replace WCAG success criteria, because it is not linked to conformity. Nevertheless, the two tools are complementary. With both, not only experts but also content writers, publishers, researchers and those in charge of procuring content, with no substantial accessibility expertise, would be able to design, evaluate or choose the most suitable statistical charts to incorporate into their publications or websites.
Heuristic evaluation is a well-known and widely used usability inspection method with numerous examples published in the scientific literature. Heuristics help experts to better understand which aspects of the interface may be problematic for accessibility, and also provide them with information on how to solve these problems [101]. Another benefit of heuristics is their low cost and quick application.
Statistical charts are becoming part of widespread digital literacy and are already a basic type of everyday information. Although there exist numerous proposals of general heuristics targeting usability, user experience or accessibility, the specificity of statistical charts and the particular needs of users with low vision or CVD increase the need to adapt existing indicators, and even to create new ones, in order to cover all problems related to this domain. This research is a first step in this direction and will help to create better charts.