This paper has presented a robust, validated, weighted GDPR-based scale. The Benjumea privacy scale will allow developers to write privacy policies that are sound from the point of view of the GDPR. The aim of this paper was two-fold. The first aim was to assess the robustness of a comprehensive GDPR-based scale, which was achieved by seeking consensus among a group of privacy experts. New items could be added to the scale following the experts’ suggestions, and the scale may be used to evaluate the fairness of privacy policies in health apps. The second aim was to assign a weight to each of the items included in the scale, based on the experts’ opinions. The Delphi process was considered the most appropriate method for gathering information from experts about the relevance of the selected items and their importance when evaluating the fairness of privacy policies with respect to the GDPR. After two rounds, the Delphi process was stopped according to consensus and stability criteria. A user’s guide that defines how to use the new items and the items that have changed is provided in Multimedia Appendix 3, which also shows how to calculate the final score for an assessed privacy policy.
Robustness
Quantifying the degree of consensus among the experts is an important part of sound Delphi data analysis and interpretation. In this study, we used the interquartile range (IQR) as a measure of how widely the experts’ opinions are dispersed across the panel [43, 44]. A suitable IQR-based criterion for consensus among the experts is an IQR less than or equal to 1 on a 5-point Likert scale [45].
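The IQR-based consensus criterion can be illustrated with a minimal sketch. The ratings below are hypothetical and are not the panel's data. The quartile method (the "inclusive" method of Python's `statistics.quantiles`) is also an assumption, since the text does not state how the quartiles were computed, and other methods can give slightly different IQR values.

```python
from statistics import quantiles


def iqr(ratings):
    """Interquartile range (Q3 - Q1) of a list of Likert ratings."""
    # The "inclusive" method treats the ratings as the whole population
    # (an assumption; the paper does not name a quartile method).
    q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
    return q3 - q1


def has_consensus(ratings, threshold=1.0):
    """Consensus criterion for a 5-point Likert scale: IQR <= 1."""
    return iqr(ratings) <= threshold


# Hypothetical ratings from eight experts for two items.
agreed = [5, 5, 4, 5, 4, 5, 5, 4]
split = [1, 2, 5, 5, 3, 4, 2, 5]

print(iqr(agreed), has_consensus(agreed))
print(iqr(split), has_consensus(split))
```

With these made-up ratings, the first item satisfies the criterion and the second does not.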
Based on the IQR values for each item on the scale (see Table 3), the robustness of most of the items is supported by the consensus of the group of experts. This is a clear indication that the expert panel agrees with the current requirements of the GDPR. However, item I2, which was included in the initial version of the scale, did not achieve a high level of consensus. Thus, a Wilcoxon signed-rank test of stability was performed for item I2, yielding a p-value of 0.4689, so the null hypothesis of no change between rounds could not be rejected. Since no difference between rounds was found, the Delphi process was stopped. Item I2 is the only compulsory item for which fewer than 80% of the ratings were 4 or higher. As the item is a compulsory requirement of the GDPR, we propose not to exclude it from the scale, but to lower its importance in the GDPR fairness score by assigning it a low weight, as explained below.
As suggested by experts in Round 1, item I4, which deals with the purposes of the processing, was reworded. After checking experts’ opinions, we conclude that a privacy policy must contain a specific description of the purposes of the processing, and not a general one.
New items for the scale
During the first round of the Delphi process, the experts identified new items that may be relevant when assessing the fairness of privacy policies with respect to the GDPR (see Table 2).
Among the tentative items, there is a clear gap between T2 and T3 and the rest. More than 80% of the experts assigned T2 and T3 a score of 4 or 5, which was our initial criterion for including new items in the scale. In addition, the IQR values show that the experts reached consensus on these two items. On the other hand, T4 to T9 were considered “important” or “very important” by less than 60% of respondents, which is well short of the target of 80% of ratings equal to or greater than 4. Although consensus was not achieved for some of these items, they are too far from the target to be considered for inclusion. Thus, items T4 to T9 were discarded.
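The inclusion criterion for tentative items can be sketched as follows. The ratings are hypothetical, and treating a share of exactly 80% as sufficient is an assumption, since the text mentions both “more than 80%” and a target of 80%.

```python
def share_important(ratings, cutoff=4):
    """Fraction of ratings that are 'important' (4) or 'very important' (5)."""
    return sum(r >= cutoff for r in ratings) / len(ratings)


def meets_inclusion_target(ratings, target=0.8):
    """True if the share of ratings >= 4 reaches the target.

    Using >= (rather than >) at exactly 80% is an assumption.
    """
    return share_important(ratings) >= target


# Hypothetical ratings from ten experts for two tentative items.
strong_item = [5, 5, 4, 4, 5, 4, 5, 3, 4, 5]
weak_item = [3, 4, 2, 5, 3, 4, 3, 2, 4, 5]

print(share_important(strong_item), meets_inclusion_target(strong_item))
print(share_important(weak_item), meets_inclusion_target(weak_item))
```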
Weight Assignment
In the first version of the scale, all the items contributed equally to the score used to assess the fairness of privacy policies. That is, the scale assumed that every item had the same importance when evaluating the fairness of a policy with regard to GDPR compliance. However, the measured items do not necessarily contribute with the same importance to the assessment of privacy policies. Thus, a weighted scale can be defined by assigning a weight to each item. The weighted contribution of an item is calculated by multiplying its weight by the value obtained for that item.
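One way to write this down, as a sketch rather than the exact formula of the user's guide, is given here. The symbols $S$, $w_i$, $v_i$, and $n$ are introduced for illustration only, and summing the weighted values is an assumption.

```latex
\[
  S = \sum_{i=1}^{n} w_i \, v_i
\]
```

Here $v_i$ is the value obtained for item $i$, $w_i$ is its weight, and $n$ is the number of items. Setting $w_i = 1$ for every item gives the unweighted case, in which all items contribute equally.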
Through the Delphi process, the expert panel assigned a level of importance to each of the items on the scale. This evaluation can therefore be used to assign weights to the items that reflect their impact on the score. Table 3 shows that all the original items have a median of 4 or 5. We propose a weight L1 = 1 for “very important” items (median 5) and a weight L2 = 0.5 for “important” items (median 4). Accordingly, items I1, I3, I4, I5, I6, I7, I8, I10, I11, I14, and T2 were assigned a weight of 1, while items I2, I9, I12, I13, and T3 were assigned a weight of 0.5.
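The weight assignment can be illustrated with a short sketch. The weights are the ones listed above. The item values and the use of a plain weighted sum are assumptions made for illustration, and the authoritative scoring procedure is the one described in Multimedia Appendix 3.

```python
# Weights as listed in the text: 1 for "very important" items, 0.5 for "important" ones.
WEIGHT_ONE = ["I1", "I3", "I4", "I5", "I6", "I7", "I8", "I10", "I11", "I14", "T2"]
WEIGHT_HALF = ["I2", "I9", "I12", "I13", "T3"]

WEIGHTS = {item: 1.0 for item in WEIGHT_ONE}
WEIGHTS.update({item: 0.5 for item in WEIGHT_HALF})


def weighted_score(item_values):
    """Sum of weight * value over the assessed items.

    item_values maps an item ID to the value obtained for that item.
    Values in [0, 1] are a hypothetical convention, not taken from the paper.
    """
    return sum(WEIGHTS[item] * value for item, value in item_values.items())


# Hypothetical policy that fully satisfies every item.
full_policy = {item: 1 for item in WEIGHTS}
# Hypothetical partial assessment of three items.
partial = {"I1": 1, "I2": 1, "T3": 0}

print(weighted_score(full_policy))
print(weighted_score(partial))
```

Under these assumptions, a low-weight item such as I2 adds half as much to the total as a fully weighted item such as I1.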