This study presents a psychometric evaluation of the Telenursing Interaction and Satisfaction Scale (TISS), which is intended to measure satisfaction with the interaction between telenurses and callers from the caller's perspective. The analysis of data quality, factor structure, convergent validity, and reliability in a sample of 616 callers showed that the TISS has satisfactory psychometric properties. To the best of our knowledge, this makes the TISS the first valid and reliable scale to measure caller satisfaction with interaction in telenursing.
The overall amount of missing data was acceptable and equally distributed among items, with one exception: item 19, concerning opportunities to discuss alternative solutions to the health problem with the telenurse. The high frequency of missing data for this item might be explained by callers perceiving that no relevant alternative existed. The item was judged as highly relevant in an earlier study, so adding a “Not applicable” response option might be appropriate in future refinement of the TISS. Despite this exception, the TISS demonstrated satisfactory data quality: completeness of data was acceptable for all sub-scales.
All scales in the TISS produced non-normally distributed data with high degrees of satisfaction, which is in line with other satisfaction studies in telenursing. Skewness could result in problems with the sensitivity and responsiveness of measuring scales. The skewed result in this study could have several explanations, such as non-response bias or low expectations among callers resulting in high satisfaction rates. Reasons for skewness can also be found in the scale construction. According to Voutilainen et al., items in a satisfaction questionnaire should be neutrally worded and sufficiently numerous to “increase the likelihood that the least satisfactory care components are also included”. This corresponds well with the 25 neutrally worded items in the TISS. Even though data in this study were skewed, every response option in the TISS was used, indicating that the response scales are relevant. This is further supported by the results of previous cognitive interviews with callers, who found all items and response scales relevant and comprehensible.
The hypothesized factor structure was confirmed by all fit indices except the chi-square goodness-of-fit statistic. It is well known that the chi-square test is highly sensitive to large samples, and it is therefore commonly used for purposes other than examining model fit, e.g. to compare nested CFA models. Since all other goodness-of-fit indices indicated good to excellent model fit, the poor chi-square result can probably be explained by the large sample in this study. Consequently, the results support the TISS as a multidimensional scale that measures satisfaction with the four theoretical dimensions of interaction. A higher-order factor model was evaluated because of the strong factor correlations. The goodness-of-fit indices of this latter model were somewhat poorer than those of the four-factor model, but still acceptable. It could be argued that the correlations between the four factors were too high, and that the subscales could therefore be questioned. According to Brown, correlations of 0.85 are considered an acceptable upper limit for factor correlations, and models with higher values should be revised in order to reduce multi-collinearity between factors. However, Brown also advises against re-specification without a theoretical rationale. In this study, the factors were theoretically driven, and there was no clear rationale for re-specification. Therefore, we recommend using the subscale scores instead of the total score. The total scale score can be used in situations where using the four subscales would be problematic, for example when multi-collinearity arises in regression analysis.
Internal consistency (ordinal alpha) was high for both the TISS subscales and the total scale, supporting the homogeneity among items. Ordinal alpha increases with the number of items, and according to Streiner et al., high alpha values can be seen as an indication of redundancy among items. This was taken into account in the content validation of the TISQ, where eight items on interaction with acceptable I-CVIs were deleted due to comments on redundancy. The remaining items were considered to measure separate aspects of interaction with the telenurse. Another indication of redundancy is correlated residuals, a problem that was not identified in this study. Thus, our findings support the hypothesis that the four dimensions of interaction have high content validity and high internal consistency at the same time, as discussed by Keszei et al.
Both scale reliability and test-retest reliability were satisfactorily high. For two of the subscales, affective support and decisional control, the lower limit of the 95% CI of the composite reliability coefficient fell just below the critical value of 0.7; these subscales may require further evaluation in future studies.
The results from this study, in addition to previous findings about the content validity of the included items, support the use of the TISS as an outcome measure in clinical and research settings where the interaction between a caller and a telenurse is in focus. It was developed and evaluated in a context where telenurses provide nursing care without visual input and should therefore primarily be used in situations where visual input is not available. The profound relation between interaction and safety in telenursing [34–36] could make the TISS a useful instrument in patient safety work. It also responds well to individual telenurses’ call for individual feedback on work performance, as described by Wahlberg & Bjorkman, and could be used for comparisons between different telenursing sites or between subgroups in a population.
The TISS scale scores should be calculated in three steps. First, all item scores need to be reverse coded so that high scores reflect high levels of satisfaction. Second, sub-scale scores are calculated by summing all item responses within each sub-scale. This implies that the possible sub-scale score range is 8–33 for health information, 5–21 for professional technical competence, 9–38 for affective support, and 3–13 for decisional control. Third, since the sub-scales include different numbers of items, the scores should be linearly transformed to a 0–10 scale using the following formula: ((raw scale score – lowest possible score) / possible score range) × 10, where the possible score range is the highest possible score minus the lowest possible score. The TISS total scale score range is 25–105 and can be linearly transformed in the same manner as the sub-scales.
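The three scoring steps can be illustrated with a minimal Python sketch. It is not an official scoring routine. The score ranges are those stated above. The function names, the assumption that responses are coded 1 to the number of response options for each item, and the number of response options per item in the usage example are hypothetical and should be checked against the actual questionnaire.

```python
from typing import Iterable, Tuple

# Possible raw score ranges (lowest, highest) as stated in the text.
SCORE_RANGES = {
    "health_information": (8, 33),
    "professional_technical_competence": (5, 21),
    "affective_support": (9, 38),
    "decisional_control": (3, 13),
    "total": (25, 105),
}


def reverse_code(response: int, n_options: int) -> int:
    """Step 1: reverse an item response.

    Assumes (hypothetically) that responses are coded 1..n_options.
    """
    if not 1 <= response <= n_options:
        raise ValueError(f"response {response} outside 1..{n_options}")
    return n_options + 1 - response


def raw_scale_score(items: Iterable[Tuple[int, int]]) -> int:
    """Step 2: sum the reverse-coded responses of one scale.

    Each element is (response, n_options) for one item.
    """
    return sum(reverse_code(resp, n_opt) for resp, n_opt in items)


def transform_0_10(raw_score: int, scale: str) -> float:
    """Step 3: linear transformation of a raw scale score to 0-10."""
    lowest, highest = SCORE_RANGES[scale]
    if not lowest <= raw_score <= highest:
        raise ValueError(f"raw score {raw_score} outside {lowest}..{highest}")
    # Same formula as in the text; the multiplication is done first
    # only to keep the floating-point arithmetic exact where possible.
    return 10 * (raw_score - lowest) / (highest - lowest)


# Usage with made-up responses to three decisional control items.
# The (5, 4, 4) response-option counts are an illustrative assumption.
example_items = [(1, 5), (2, 4), (1, 4)]
raw = raw_scale_score(example_items)             # 5 + 3 + 4 = 12
score = transform_0_10(raw, "decisional_control")  # 10 * (12 - 3) / 10 = 9.0
```

Because the transformation depends only on the lowest and highest possible scores, the same function applies to the total scale by passing "total" as the scale name.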
The study has some limitations that need to be considered. First, no power calculation was conducted. However, the sample size of 616 callers is more than sufficient according to the rule of thumb of five to ten observations per estimated parameter. Additionally, WLSMV is less restrictive than other estimators for categorical indicator variables. The study also has some important strengths. The ordinal nature of the data was considered in all statistical analyses in order to avoid underestimating covariances in the CFA. Another strength is that every caller who contacted the service during the inclusion period was invited to participate in the study. The sample showed satisfactory variation in age, sex, and other demographic variables. However, such variation is not a high priority in psychometric evaluation studies, where a sufficient sample size is considered more important. Further psychometric evaluation could include aspects of responsiveness and sensitivity, especially if change over time is the prime objective. Before use in other countries, the TISS must be properly translated and validated.