The findings of our study, which developed the new one-dimensional Pencil Pain Scale to evaluate pain in school-age children (6–12 years), are discussed in light of the relevant literature.
Content validity indicates whether a scale is quantitatively and qualitatively sufficient to measure the behavior (feature) it is intended to measure. One logical way to test content validity is to seek expert opinion. In content validity studies, which are also called logical or rational validity studies, a preliminary study must be performed in which opinions are obtained from a sufficient number of experts to determine whether the scale items can measure the intended feature [20–27]. If the number of experts in the preliminary study is sufficient (between 5 and 40), the validity of the scale will be high [28, 29].
The coded opinions of the seven experts on the scale's suitability to children's characteristics, its ability to measure, its visuality, usefulness and applicability were evaluated, and the CVI of the Pencil Pain Scale was determined as +1.00 (Table 1). This result supported the hypothesis that “The Pencil Pain Scale has content validity”. The Lawshe (1975) technique is used to verify the content validity of a scale with numerical values and to evaluate expert opinions properly. At least five experts are needed to use this technique.
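As an illustration of how a CVI of +1.00 can arise under the Lawshe technique, the sketch below computes the content validity ratio, CVR = (n_e - N/2) / (N/2), for each criterion and averages the ratios. The per-criterion endorsement counts are not reported in the text; the values used here (all seven experts endorsing every criterion) and the function names are assumptions made only for this example.

```python
def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    """Lawshe (1975) content validity ratio: (n_e - N/2) / (N/2)."""
    if n_experts <= 0:
        raise ValueError("n_experts must be positive")
    if not 0 <= n_essential <= n_experts:
        raise ValueError("n_essential must lie between 0 and n_experts")
    half = n_experts / 2
    return (n_essential - half) / half


def content_validity_index(cvrs) -> float:
    """Content validity index taken as the mean CVR of the retained items."""
    cvrs = list(cvrs)
    if not cvrs:
        raise ValueError("at least one CVR is required")
    return sum(cvrs) / len(cvrs)


# ASSUMPTION: hypothetical endorsement counts. The text names five criteria
# and seven experts but does not report how many experts endorsed each one.
N_EXPERTS = 7
endorsements = {
    "suitability to children's characteristics": 7,
    "ability to measure": 7,
    "visuality": 7,
    "usefulness": 7,
    "applicability": 7,
}

cvrs = {name: content_validity_ratio(n, N_EXPERTS) for name, n in endorsements.items()}
cvi = content_validity_index(cvrs.values())
print(f"CVI = {cvi:+.2f}")
```

Under this definition the ratio reaches +1.00 only when every expert endorses a criterion; if, for example, five of seven experts endorsed one, its CVR would drop to about .43.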
Validity concerns how well a test measures the trait it targets in an individual. It is the degree to which a measurement tool correctly measures the feature it aims to measure without mixing it with any other feature. Although there are many criteria for testing the validity of a measurement tool, the most commonly used approach is to examine the similarity of the predictive validity of an adapted scale to the measurements obtained from relevant and reliable scales (with confirmed psychometric properties) used in the same culture.
The validity of the Pencil Pain Scale was tested with convergent validity. To this end, the measurements were compared with the mean scores of a similar study. The mean score of the Pencil Pain Scale was significantly lower in children who had the intervention (1.31 ± 1.21) than in children who had no intervention (2.12 ± 1.63) (p < .01) (Table 2). The difference between the groups was .81. In an experimental study in which children of a similar age group undergoing the same painful procedure received the distraction intervention with the same kaleidoscope, the FEES mean score was significantly lower in children who had the intervention (3.14 ± 1.41) than in children who had no intervention (3.80 ± 1.42) (p < .01).
In that study, the difference between the groups was .66. When the Pencil Pain Scale mean scores were compared with the FEES mean scores, similar results were obtained, and it was concluded that the Pencil Pain Scale had convergent validity. These results showed that the Pencil Pain Scale was able to measure the difference between the groups, and the hypothesis that “The Pencil Pain Scale is a valid and reliable scale” was supported.
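As a check on the arithmetic, the two between-group differences follow directly from the reported means (no intervention minus intervention). The symbol Δ is introduced here only for illustration.

```latex
\begin{align*}
\Delta_{\text{Pencil}} &= 2.12 - 1.31 = 0.81,\\
\Delta_{\text{FEES}}   &= 3.80 - 3.14 = 0.66.
\end{align*}
```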
The research findings are similar to the results of previous research. In a study conducted to reduce the pain associated with blood collection in school-age children, the level of pain assessed by the VAS was 2.17 ± 2.25. In another study on the effect of the presence of parents during painful procedures and the effect of some factors on the pain tolerance of children aged 6–11, the pain level was determined as 3.50 ± 1.44. In another study conducted to alleviate pain associated with blood collection in children aged 10–12, the level of pain was found to be 2.30 ± 0.92.
Reliability is the ability of a measurement instrument to give precise and consistent results. In other words, it is the ability of the instrument to produce replicable results [12, 33]. Various methods are used to find the reliability coefficient of a scale, and in any scale development study there is no easy answer to the question of which should be used. The reliability test should be chosen according to the degree of objectivity possible in the scale and in the responses. In this study, the reliability of the scale was assessed only in terms of sensitivity, because the scale is a perception scale and the perception of pain changes with time and situation. The discriminative power of the scale was tested. The general pain scores of the upper 27% on the Pencil Pain Scale (x̄ = 4.368) were higher than those of the lower 27% (x̄ = .447), and the difference between the two groups was significant (t = -41.154, p < .001) (Table 3). This result showed that the Pencil Pain Scale was sensitive enough to make precise measurements and to distinguish differences. The hypothesis that “The Pencil Pain Scale is a valid and reliable scale” was supported, and the hypothesis that “The Pencil Pain Scale is not a valid and reliable scale” was rejected.
One of the methods used to test the reliability of a scale is to form two groups from the upper and lower 27% of the total scale scores and to test the difference between them. A difference between the two groups is indicative of discriminative power. The lack of a difference indicates that the range between the lowest and the highest scores is small. It is assumed that a scale measuring in a narrow range does not distinguish differences [15, 34, 35].
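The upper and lower 27% comparison can be sketched as follows. This is a minimal illustration, not the analysis used in the study: the scores are hypothetical, the helper names are invented for the example, and a pooled-variance (Student) t statistic is assumed because the text does not state which t-test variant was applied.

```python
import math
import statistics


def upper_lower_groups(scores, fraction=0.27):
    """Return (upper, lower) groups holding the top and bottom `fraction` of scores."""
    ordered = sorted(scores)
    k = max(2, round(len(ordered) * fraction))  # at least 2 so a variance exists
    if 2 * k > len(ordered):
        raise ValueError("not enough scores to form two separate groups")
    return ordered[-k:], ordered[:k]


def pooled_t(a, b):
    """Student's t statistic for two independent groups (pooled variance)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se


# ASSUMPTION: hypothetical pain scores, not the study data.
scores = [0, 0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5]

upper, lower = upper_lower_groups(scores)
t = pooled_t(upper, lower)
print(f"upper mean = {statistics.mean(upper):.2f}, "
      f"lower mean = {statistics.mean(lower):.2f}, t = {t:.2f}")
```

The sign of t depends only on the order in which the groups are passed: computing lower minus upper instead gives a negative value of the same magnitude, which is consistent with the negative t reported in Table 3.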
Pain scales have been criticized for their difficulty in distinguishing emotional states such as pain; however, it has also been stated that there is a very good correlation between the VAS and the FEES in assessing pain in children, and that the emotional aspect is measured through facial expressions [36, 37]. In one study, researchers stated that they selected the facial expressions pain scale because the scale contains clear and meaningful facial expressions. In another study, a cartoon-like scale was preferred. In addition, it is known that in the assessment of pain in children, face scales show a high degree of correlation with other self-report measurement methods. Based on these findings, the Pencil Pain Scale developed in this study was used together with the VAS and the FEES to evaluate pain.
In the literature, a positive correlation between two equivalent forms on the same subject is an indicator of consistency in terms of reliability. To determine the equivalent form reliability of the Pencil Pain Scale, the correlation between the three scales was examined, and no correlation was found. However, the mean scores of the scales were compared. In the study, the mean score of the Pencil Pain Scale was 2.12 ± 1.63, while the mean scores of the VAS and the FEES were 2.32 ± 1.62 and 2.38 ± 1.77, respectively. No statistically significant difference was found between the mean score of the Pencil Pain Scale and the mean scores of the VAS and the FEES (p > .05, Table 4). These findings indicate that the Pencil Pain Scale measures pain in children as reliably as the known and commonly used scales. Thus, the hypothesis that “There is no difference between the mean scores of the Pencil Pain Scale and the VAS and the FEES mean scores” was supported.
In the literature, the VAS is accepted as a practical and easy-to-understand scale for children aged five and older [40, 41]. The VAS has been successfully used with school-age children. In one study, a significant relationship was found between the VAS and the facial pain scales, indicating that there is a sensory component in children. The VAS pain mean scores were reported to be positively correlated with the mean scores of other pain scales such as the Oucher [43, 44], the Eland Color Scale, various facial scales and the Comfort Scale. Hicks et al. (2001) also found a positive correlation between the VAS and the FEES pain intensity measurements in children between 5 and 12 years of age. Furthermore, the study emphasized the importance of using more than one scale to assess pain in school-age children.
In the study, it was found that the age of the children did not affect the level of pain (p > .05, Table 5). Although the pain responses of children change with age, it has been emphasized that the intensity of pain is not related to age and that each child may react differently to pain because of individual characteristics, even at the same age. Also, cultural characteristics can lead to differences in children's perception of pain and their way of expressing it, because children and their families may have cultural practices for coping with pain.
It has been reported in the literature that pain is experienced by young children as intensely as by older children [50, 51]. It was further reported that the age of the child affects pain perception and the response to pain. A child aged 0–1 perceives and responds to pain differently from a child aged 1–3 and from adolescents. Similarly, the pain level of children in the 6–9 age group was found to be significantly higher than that of children in the 10–12 age group.
Another study found an inverse relationship between response to pain and the age of children. Young children (4–6 years) reported greater pain for the same type of pain than older children (ages 7 and above). It was stated that as the age of the children increased, the perception of pain decreased, and that pain responses were inversely correlated with age [32, 54]. It was emphasized that age is important for a child's ability to cope with pain [32, 55]. This may be attributed to the increased experience of pain with age.
The study revealed that the gender of the children did not affect the level of pain (p > .05, Table 5). Similarly, it was found that gender did not affect the intensity of pain during IV administration [32, 56]. In contrast, other studies found that girls perceived significantly more pain than boys and that gender affected the perception of pain. It has been stated that gender is important in pain experiences and that girls experienced more pain than boys in some procedures [57, 58]. It was reported that this difference between the genders may be caused by cultural effects, as showing high tolerance to pain is generally appreciated culturally.
Generalizability and limitations of the study
The results of the study may be generalized to findings related to the pain resulting from the blood collection procedure in school-age children (6–12 years). The study was derived from a thesis, which was completed within a limited period of time. If only children whose blood sample was taken by a single nurse had been included in the study, very few children would have matched the characteristics of the population. Moreover, as the research is a scale development study, a high number of participants is important in terms of validity and reliability. Thus, it may be a limitation that blood collection was performed by three nurses. Another limitation is that, for the equivalent-scale reliability measurements, the scales were applied to different children with the same characteristics, because the scales use the same scoring and it was thought that the children would become bored. This is also thought to be the reason why a correlation between the scales could not be determined. The absence of a difference between the mean scores of the scales may compensate for this limitation. In fact, considering that the scales have the same scoring system and that children may tend to mark the same values on all scales, it was deemed appropriate to apply the scales to different groups with the same characteristics.