Developing a Single-Item General Self-Efficacy Scale: An Initial Study

General self-efficacy represents a global sense of personal capability across various situations and tasks. The aim of the present study was to develop and validate a single-item general self-efficacy scale that balances practical demands and psychometric concerns. The psychometric properties of the proposed Single-Item General Self-Efficacy Scale (GSE-SI) were examined among 231 Singaporean adults. Results based on three statistical methods demonstrated good reliability (.594, .607, and .726, respectively; M = .642) compared with the reliability scores of other single-item scales. The scale also showed satisfactory criterion-related validity evidence (i.e., a correlation with a multiple-item general self-efficacy scale of r = .795). Validity evidence based on relationships with other constructs was supported by the correlations between general self-efficacy scores and the scores of six relevant constructs (i.e., positive correlations with life satisfaction and positive emotions, and negative correlations with negative emotions, task stress, perceived stress, and illness symptoms). More importantly, the GSE-SI and multiple-item scale scores showed consistent correlation patterns with these relevant constructs. Both GSE-SI and multiple-item scale scores significantly discriminated between three respondent clusters in a similar pattern. The present results show that the GSE-SI is a reliable and valid measure of general self-efficacy and can be recommended in future research to circumvent the constraints of multiple-item scales.


Introduction
Research on quality of life has gained popularity in recent years, and self-efficacy beliefs have been used as one of the key indicators of quality of life (Cullati et al., 2014, 2020). A great deal of research on self-efficacy has focused on specific domains or tasks and found that task-specific self-efficacy has better predictive power (Eden, 2001; Feldman & Kubota, 2015; Schwoerer et al., 2005). However, on the dimensions of self-efficacy, Bandura (1977) stated that: "Efficacy expectations also differ in generality. Some experiences create circumscribed mastery expectations. Others instill a more generalized sense of efficacy that extends well beyond the specific treatment situation" (p. 85). Some researchers have also noted that self-efficacy can be conceptualized at a general level, as a generalized sense of self-efficacy (e.g., Lazić et al., 2021; Luszczynska, Gutiérrez-Doña, et al., 2005; Schwarzer & Jerusalem, 1995; Sherer et al., 1982). General self-efficacy (GSE) was conceptualized as a global sense of one's competency across various situations (Sherer et al., 1982). In recent decades, research has found that, in contrast to task-specific self-efficacy, GSE is often used to predict general outcomes such as subjective wellbeing (e.g., Chung et al., 2017; Fatima & Suhail, 2019) and general performance in the workplace (e.g., Eden & Aviram, 1993; Khedhaouria et al., 2015). Furthermore, a strong sense of GSE contributes to a more positive life (Azizli et al., 2015; Luszczynska, Gutiérrez-Doña, et al., 2005), with higher achievement (Feldman & Kubota, 2015; Schwarzer, 2014), better health (Nützel et al., 2014; Wu et al., 2004), and better social skills (Bandura, 1997; Pössel et al., 2005).
At the same time, research has shown that general self-efficacy is not only correlated with many indicators of life quality such as life satisfaction (Capri et al., 2012), physical health (Haugland et al., 2016), and social life (Luszczynska, Gutiérrez-Doña, et al., 2005) but is also widely used as an indicator to monitor and evaluate the effectiveness of intervention programmes in improving quality of life, such as stress and depression interventions (Blackburn & Owens, 2015; Feldstain et al., 2016; Pössel et al., 2005).
From the literature, the existing GSE scales are all multiple-item scales (Bosscher et al., 1997; Chen et al., 2001; Löve et al., 2012; Romppel et al., 2013; Schwarzer & Jerusalem, 1995; Sherer et al., 1982). These were often developed under different contexts. For example, Sherer et al. (1982) developed the 17-item GSE scale to provide an index of progress during psychotherapy. The 10-item scale from Schwarzer and Jerusalem (1995) provided an international and general GSE scale for a broad range of applications (e.g., Luszczynska, Scholz, & Schwarzer, 2005; Scholz et al., 2002). Chen et al.'s (2001) 8-item scale was developed to assess work motivation in organizational contexts. To provide a more economical clinical assessment, Romppel et al. (2013) developed a shortened GSE scale comprising six items. Among the existing GSE scales, those developed by Sherer et al. (1982), Schwarzer and Jerusalem (1995), and Chen et al. (2001) are the most widely used in GSE research. However, it should be noted that scale length and response time are major concerns, especially for research using repeated measures, such as longitudinal studies or intervention programmes. Some researchers have argued that shortened scales comprising a few items could circumvent these issues, such as the 6-item General Self-Efficacy Scale (Romppel et al., 2013). Regardless, any multiple-item scale has the potential to affect the willingness of respondents to participate in a study (Sarstedt & Wilczynski, 2009). On the other hand, single-item scales offer several advantages over multiple-item scales. First, multiple-item scales tend to include redundant items (statistically speaking), and these are likely to cause compounded systematic errors leading to artificially high reliability (Yang & Green, 2011). Second, single-item scales are less prone to the common method biases commonly found in self-reports (Gardner et al., 1998). Moreover, Williams et al. (1989) noted that the observed correlations between different constructs may be attributable to a shared response format rather than item content, given the common method bias caused by semantically similar items in multiple-item scales. Third, a single-item scale might be more suitable if its psychometric properties could be adjusted to a level similar to that of multiple-item scales. On these accounts, there is a trend among methodology researchers to develop single-item scales (Romppel et al., 2013; Sarstedt & Wilczynski, 2009).
In recent years, single-item scales have received considerable attention from quality of life and psychological researchers. For instance, DeSalvo et al. (2006) developed two self-rated single-item scales to assess patients' general health, joining other general-health single-item scales (Cullati et al., 2020; Jenkinson et al., 1994; Turner-Bowker et al., 2013). In addition, single-item scales for life satisfaction (Cheung & Lucas, 2014), quality of life (Yohannes et al., 2011), social identification (Postmes et al., 2013), and many other topics (e.g., physical activity, depressive and somatization symptoms, and emotional exhaustion) have also been proposed and widely used in related research (Hart et al., 2012; Milton et al., 2011; West et al., 2012).
However, although GSE is one of the most widely used constructs in psychological research, a single-item GSE scale has yet to be developed. On top of this, the approaches to developing single-item scales are diverse and lack unanimity among researchers engaged in single-item research. The purpose of the current study is to develop and validate a single-item general self-efficacy scale by comparing and integrating different methods of single-item scale development.

Method of Developing Single-Item Scales
Item Selection. In prior single-item scale studies, no broad consensus has emerged on how to select the particular item. Most researchers have decided on a particular item based on expert judgements, as recommended by Rossiter (2002). However, experts can hardly be expected to select the best item because their judgements are unstable and inconsistent. The inconsistency among experts may stem from the different bases of assessment they rely upon when judging items (Diamantopoulos, 2005). For instance, some experts may focus on the connotative meaning of an item whereas others may consider the denotative meaning. In addition, some statistical methods have also been suggested for selecting an item (e.g., Loo, 2002; Sarstedt et al., 2016). In their review of single-item scales, Sarstedt and Wilczynski (2009) noted that item-to-total correlations, factor loadings, and squared multiple correlations have been used to select single items during development. They found that the lack of broad consensus on the best selection method, and of cross-validation using multiple methods, has left the justifications for final items weak and unconvincing. The first purpose of the current study is to select a single-item scale using both expert judgement and statistical methods, following the approaches introduced by Loo (2002), Sarstedt and Wilczynski (2009), and Sarstedt et al. (2016).
Reliability. Because internal consistency reliability (i.e., Cronbach's alpha) cannot be computed for a single item, other statistical methods for examining the reliability of a single-item scale have been proposed. First, Wanous and Reichers (1996) introduced a method that uses the correction for attenuation formula to estimate the minimum reliability of a single-item scale, and this approach has been adopted by other researchers (Postmes et al., 2013). Second, Weiss (1983) proposed that exploratory factor analysis (EFA) could also be used to estimate the reliability of a single-item scale. For example, Wanous and Hudy (2001) estimated the reliability of scores from a single-item teaching effectiveness scale with two EFA extraction methods, principal axis factoring (PAF) and maximum likelihood (ML), and found identical results.
However, there is no broad consensus on the best way to estimate the reliability of scores obtained from single-item scales. Moreover, previous studies have rarely employed sufficiently diverse methods to evaluate the reliability of single-item scale scores despite their potential statistical disadvantages (Hoeppner et al., 2011). Thus, the second purpose of the current study is to evaluate and compare the reliability of single-item scale scores based on multiple approaches. In the current study, reliability will be estimated by both the correction for attenuation formula and EFA (PAF and ML).
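The EFA-based approach treats an item's communality in a one-factor solution as a lower-bound estimate of its reliability (Weiss, 1983). A minimal sketch of iterated principal axis factoring, one of the two extraction methods named above, is shown below on synthetic data; the loadings and sample are hypothetical and are not the study's data.

```python
import numpy as np

def paf_communalities(data, n_iter=100):
    """Iterated principal axis factoring for a one-factor model.
    Returns each item's communality (lower-bound reliability)."""
    R = np.corrcoef(data, rowvar=False)
    # initial communalities: squared multiple correlations
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        R_reduced = R.copy()
        np.fill_diagonal(R_reduced, h2)          # reduced correlation matrix
        vals, vecs = np.linalg.eigh(R_reduced)   # ascending eigenvalues
        loading = np.sqrt(vals[-1]) * vecs[:, -1]  # first-factor loadings
        h2 = loading ** 2                        # updated communalities
    return h2

# hypothetical one-factor data for a 7-item scale
rng = np.random.default_rng(0)
factor = rng.normal(size=(2000, 1))
loadings_true = np.array([0.7, 0.6, 0.65, 0.8, 0.6, 0.65, 0.6])
items = factor * loadings_true + rng.normal(size=(2000, 7)) * 0.6

communalities = paf_communalities(items)
```

Each entry of `communalities` can then be read as the minimum reliability an individual item would have if used as a single-item scale.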

Validity Evidence Regarding Relationships with Criterion Variables and Conceptually Related Constructs. Previous validation studies of single-item scales have mainly focused on the relationships between the single-item scale and criterion variables and conceptually related constructs. For the former, the correlation between a single item and the corresponding multiple-item scale has often been used to provide validity evidence (Milton et al., 2011). For the latter, validity has mainly been supported by correlations between the target construct and theoretically relevant variables (Cheung & Lucas, 2014). In the present study, the third purpose was to evaluate evidence of validity between the single-item and multiple-item general self-efficacy scales, and between these scales and scales of conceptually related constructs. Specifically, Pearson's correlations will be calculated between the two GSE scale scores (single-item and multiple-item), as well as between GSE and six theoretically related constructs. A large body of findings has reported significant correlations between GSE and life satisfaction, mental well-being, stress, and illness symptoms (Andersson et al., 2014; Azizli et al., 2015; Luszczynska, Gutiérrez-Doña, et al., 2005). Therefore, in the present study, scores from scales measuring life satisfaction, positive emotions, negative emotions, perceived stress, task stress, and illness symptoms will be assessed.
Enhancing the Validity of Single-Item Scales. The use of single-item scales has been challenged because single-item validity evidence is often regarded as less persuasive than that of multiple-item scales. In addition, considering the effect of measurement error, the observed correlations of single-item scales could be biased. That is, the validity evidence of single-item scales could be underestimated, especially their relationships with criterion variables and conceptually related constructs. To address this problem, Pedhazur (1997) proposed a method to adjust single-item correlations. When the reliability scores are applied in the correction for attenuation formula, the single-item correlations can be enhanced and the validity evidence of single-item scales strengthened accordingly. Hence, the fourth purpose of the present study is to examine the adjusted correlations of the proposed scale with relevant constructs, providing a more valid use of the single-item scale.
Discriminative Power. Previous single-item scale studies have provided little information on discriminative power, even though it is a vital indicator of scale validity. In the few single-item studies that reported a discriminative power analysis, demographic variables (e.g., gender, age) were mainly used (Elo et al., 2003) rather than psychological variables. However, in studies on quality of life, discrimination across psychological variables could be more meaningful than discrimination across demographics. Therefore, the fifth purpose of the present study is to evaluate the discriminative power of the proposed single-item scale. Specifically, respondents will be grouped using cluster analysis based on six relevant psychological constructs, followed by a one-way ANOVA across the resulting respondent groups.
Significance of the Present Study. The overall aim of the present study is to develop a single-item GSE scale through multiple selection approaches and thorough reliability and validity analyses. The current study makes two major contributions. First, it will contribute to scale development and GSE research by providing an alternative choice for researchers (single-item vs. multiple-item scales), especially when scale length and time are crucial factors in a study. The proposed single-item general self-efficacy scale also has the potential to provide a psychometrically sound tool that addresses the practical constraints and measurement concerns of applied research, such as correlational and intervention studies on quality of life.
Second, because single-item scales are commonly considered less reliable and valid than multiple-item scales, the present study will also provide persuasive empirical evidence in support of the reliability and validity of single-item scales, as well as suggestions on how to improve their predictive power with the aim of approximating the effects produced by multiple-item scales in empirical research.

Sample and Procedures
The current sample comprised 231 adult participants (172 females, 59 males) recruited in Singapore. The age distribution was as follows: 16.5% between 21 and 25 years, 32.9% between 26 and 30, 17.3% between 31 and 35, 10.4% between 36 and 40, 11.7% between 41 and 45, 8.7% between 46 and 50, and 2.6% older than 50. Anonymous surveys with the same content were administered either on paper or online. A convenience sampling method was used to recruit 186 adults who completed paper-based questionnaires under the supervision of a research team member, who offered further explanations and ensured participants' confidentiality. Additionally, snowball sampling was used to recruit 45 participants who responded to an online questionnaire built with the Qualtrics software. An independent-samples t-test found no significant difference between the two survey samples. English was the language used in all survey forms.
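The comparability check between the two administration modes can be sketched as an independent-samples t-test. The data below are simulated for illustration (only the group sizes match the study); the variable compared is hypothetical.

```python
import numpy as np
from scipy.stats import ttest_ind

# Simulated scores for the two administration modes (hypothetical values;
# group sizes mirror the 186 paper-based and 45 online respondents).
rng = np.random.default_rng(42)
paper = rng.normal(loc=3.5, scale=0.8, size=186)
online = rng.normal(loc=3.5, scale=0.8, size=45)

# Independent-samples t-test; a non-significant p-value would support
# pooling the two samples, as described above.
t_stat, p_value = ttest_ind(paper, online)
```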

Measures
General Self-Efficacy. A shortened version of the New General Self-Efficacy Scale (NGSE) developed by Chen et al. (2001) was adapted and used in the current study. It included seven items and was labelled as NGSE-7. The coefficient alpha of the NGSE-7 scores was .870 for this study sample. Participants responded to all items on a 5-point rating scale from 1 (Not true of me) to 5 (Very true of me).

Theoretically Relevant Variables
Life Satisfaction, Task Stress, and Illness Symptoms. The 5-item Life Satisfaction (LS) scale, 9-item Task Stress (TS) scale, and 4-item Illness Symptoms (IS) scale developed by Pettegrew and Wolf (1982) to measure teacher stress were adapted for this study, with several item modifications made to apply to the general population (e.g., "Trying to provide a good education in an atmosphere of decreasing financial support is very stressful" became "Trying to get a good work outcome is very stressful"). The internal consistency reliabilities of the three scales were comparatively high in the present sample (α = .788, .835, and .732, respectively). To maintain consistency throughout the study survey, the response format was simplified from the original 6-point to a 5-point Likert scale ranging from 1 (Strongly Disagree) to 5 (Strongly Agree).
Perceived Stress. This construct was measured with the 14-item Perceived Stress (PS) scale developed by Cohen et al. (1983) to assess the perceived stress level of community samples. Two of the items were slightly reworded to fit more general situations. The coefficient alpha for the PS scale was .819 in this sample. To align with the overall rating method in this study, items were scored on a 5-point frequency rating scale ranging from 1 (Never) to 5 (Always). PS scale scores were obtained by summing the item scores after reverse-scoring the seven positively worded items.
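The scoring rule above can be sketched as follows: on a 5-point scale, a positively worded item is reversed with 6 − score before summing. The item values and reversed indices below are hypothetical, chosen only to illustrate the arithmetic.

```python
import numpy as np

# Hypothetical responses to a 5-item scale on a 1-5 rating format.
responses = np.array([5, 2, 4, 1, 3])
positive_items = np.array([0, 2])   # hypothetical indices to reverse-score

# Reverse the positively worded items (6 - score on a 5-point scale),
# then sum all items into the scale score.
scored = responses.copy()
scored[positive_items] = 6 - scored[positive_items]
total = int(scored.sum())
print(total)  # 9
```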
Positive and Negative Emotions. Both positive emotions (PEM) and negative emotions (NEM) were measured by revised scales based on the Scale of Positive and Negative Experience (SPANE) developed by Diener et al. (2009). In this study, two items (happy and good) were retained from the SPANE and two other items (cheerful and excited) were added to the PEM scale. The NEM scale consists of six items, including two items (angry and sad) derived from SPANE and four reworded items (frustrated, lonely, disrespected, and miserable). The coefficient alphas of the PEM and NEM scale scores were .898 and .812, respectively. All items were responded to on the same 5-point scale from 1 (Never) to 5 (Always).

Selection of Single Item
The particular item selected from the NGSE-7 was based on the following five approaches (Sarstedt et al., 2016): (a) expert judgement, (b) the highest item-to-total correlations (ITC), (c) the highest factor loading extracted from PAF, (d) the highest factor loading extracted from confirmatory factor analysis (CFA), and (e) the highest squared multiple correlations (SMC) in CFA. The item with the highest values in the analyses was selected as the single-item scale.
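Two of the statistical criteria above, corrected item-to-total correlations and Cronbach's alpha if an item is deleted, can be sketched as follows. The data are synthetic, with hypothetical loadings constructed so that the item at index 3 is the most representative; this is an illustration, not the study's analysis.

```python
import numpy as np

def corrected_itc(data):
    """Correlation of each item with the total of the remaining items."""
    total = data.sum(axis=1)
    return np.array([np.corrcoef(data[:, i], total - data[:, i])[0, 1]
                     for i in range(data.shape[1])])

def cronbach_alpha(data):
    """Cronbach's alpha: k/(k-1) * (1 - sum(item var) / var(total))."""
    k = data.shape[1]
    item_vars = data.var(axis=0, ddof=1).sum()
    total_var = data.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

def alpha_if_deleted(data):
    """Alpha of the scale recomputed with each item removed in turn."""
    return np.array([cronbach_alpha(np.delete(data, i, axis=1))
                     for i in range(data.shape[1])])

# Synthetic one-factor data for a 7-item scale (hypothetical loadings).
rng = np.random.default_rng(1)
factor = rng.normal(size=(1500, 1))
loadings = np.array([0.6, 0.6, 0.6, 0.9, 0.6, 0.6, 0.6])
items = factor * loadings + rng.normal(size=(1500, 7)) * 0.5

itc = corrected_itc(items)
aid = alpha_if_deleted(items)
best_item = int(itc.argmax())   # the item with the highest ITC
```

The item with the highest ITC should also produce the largest drop in alpha when deleted, which is the convergence pattern described in the selection results.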
Nine scholars with earned doctorates in psychology, employed at a research-intensive university, provided comments on each item of the NGSE-7. The experts' views were not consistent and therefore had limited value in selecting the single item. Consequently, four statistical methods were used, and the results are presented in Table 1. All four statistical methods converged on one particular item, indicating that Item 4 explained the GSE construct best. Only one factor was extracted in the EFA, with an eigenvalue of 3.460 accounting for 49.43% of total variance, and a one-factor congeneric CFA model was applied. Item 4 had the highest ITC, factor loading, and SMC among all seven items. Furthermore, Cronbach's alpha of the whole scale decreased from .870 to .844 when Item 4 was deleted, a larger drop than for any other item. Overall, the results from the four statistical methods showed clear and relatively consistent ranks for all items, and Item 4 was selected as the most representative NGSE-7 item. Consequently, based on the item selection and cross-validation across five different methods, Item 4 ("I believe I can succeed at most of any endeavour to which I set my mind") was selected as the Single-Item General Self-Efficacy Scale (GSE-SI).
Note. ITC: item-to-total correlation; PAF: principal axis factoring; CFA: confirmatory factor analysis; SMC: squared multiple correlation.
Reliability
The results obtained from three different reliability estimation methods: (1) the correction for attenuation formula, (2) PAF in the EFA, and (3) ML in the EFA, are listed in Table 2. Because the two variables in the current study come from the same conceptual domain (GSE), the estimated reliability of scores obtained from the GSE-SI was .726 based on the following formula (Nunnally, 1978, p. 220): r_xy = √(r_xx × r_yy). Based on the communalities obtained from the EFA (Harman, 1976, pp. 16-19), the reliability was at least .594 (PAF) and .607 (ML), respectively. Overall, the reliability values ranged from .594 to .726, with a mean of .642, indicating satisfactory reliability for the GSE-SI.
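The .726 estimate can be reproduced by inverting the correction for attenuation formula r_xy = √(r_xx × r_yy), giving r_xx = r_xy² / r_yy (the Wanous & Reichers, 1996 approach). The inputs below are the values reported in this study.

```python
# Invert the correction for attenuation formula to recover the
# minimum reliability of the single item: r_xx = r_xy**2 / r_yy.
r_xy = 0.795   # GSE-SI / NGSE-7 correlation reported in this study
r_yy = 0.870   # coefficient alpha of the NGSE-7 in this sample

r_xx = r_xy ** 2 / r_yy   # estimated reliability of the single item
print(round(r_xx, 3))     # 0.726
```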

Validity
Evidence Based on Relationships with Criteria and Conceptually Related Constructs. Criterion-related validity evidence was assessed by calculating the Pearson correlation between the GSE-SI and NGSE-7 scores. The results showed a strong positive correlation of .795 (corrected item-total correlation = .701), indicating that the responses to the GSE-SI were highly correlated with those to the NGSE-7.
Relationships with conceptually related constructs were evaluated by comparing Pearson correlations between the scores of the two GSE scales (GSE-SI and NGSE-7) and the six theoretically related variables. As Table 3 shows, the patterns of correlations with the six constructs were the same across the two GSE scales. Positive emotions were positively correlated with both GSE-SI and NGSE-7 scores. Similarly, positive correlations between life satisfaction and the GSE-SI and NGSE-7 scores were obtained but were comparatively lower. Negative emotions and illness symptoms scores were negatively correlated with both GSE scale scores, with only small differences between the two scales. In addition, perceived stress scores showed the strongest negative correlations with the two GSE scores, and task stress showed the same correlation pattern with slightly lower coefficients.
To further test the validity evidence of the GSE-SI, variance reductions were calculated based on the correlations for the two GSE scores. When the r² values for the GSE-SI were subtracted from those for the NGSE-7, the average reduction was .055, ranging from .022 to .115 (see Table 4).
Adjusting with Reliability: More Valid Use. To gain more confidence, the correlation coefficients between the GSE-SI scores and the six theoretically relevant variables were corrected using the scale's reliability, based on the correction for attenuation formula (Pedhazur, 1997, p. 172); the three estimated reliabilities and the mean reliability of the GSE-SI were each applied (Table 3). Based on the corrected correlations, the variance reductions decreased in all four correction scenarios compared with the uncorrected results (Table 4). Specifically, the average variance reductions calculated with the three estimated reliabilities were .013, −.010, and −.008, respectively. Notably, when the mean reliability was applied, the average variance reduction was only −.001. In addition to the means, the standard deviations based on the four reliabilities were almost identical (.025) and lower than the original values. Overall, the slight differences in correlations indicated that the GSE-SI and NGSE-7 scores shared similar correlations with the six theoretically relevant constructs.
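The adjustment above amounts to dividing an observed single-item correlation by the square root of the scale's reliability (Pedhazur, 1997). A minimal sketch follows; the observed correlation is hypothetical, while the reliability is the mean GSE-SI estimate reported above.

```python
import math

# Correct a single-item correlation for attenuation due to unreliability.
reliability = 0.642            # mean GSE-SI reliability reported above
r_observed = -0.40             # hypothetical correlation with a related construct

r_corrected = r_observed / math.sqrt(reliability)
print(round(r_corrected, 3))   # -0.499
```

Because the divisor is below 1, the corrected correlation is always larger in magnitude than the observed one, which is why the variance gap between the single-item and multiple-item scales shrinks after correction.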
Discriminative Power. Two steps were conducted to classify the participants based on the six theoretically relevant constructs. First, a hierarchical cluster analysis was used to explore the optimal number of clusters, with Ward's method and the squared Euclidean distance (Chue & Nie, 2017), and participants were allocated to different clusters. The first large gap in the coefficient values of the agglomeration schedule occurred at stage 228, suggesting three clusters as the optimal solution. Second, a k-means cluster analysis was conducted using the means of the three previous clusters as the initial cluster centers, and the participants were reallocated into the final three clusters. Importantly, a cross-validation procedure was repeated five times to confirm the validity of this 3-cluster solution. The sample was randomly divided into two groups (115 and 116 participants, respectively). The same two steps were conducted on each group, except that the initial centers for the two groups in the k-means cluster analysis used the cluster means based on one group. Subsequently, Cohen's κ values were calculated to test the consistency between the new and original clusters. The mean Cohen's κ was .824 across the five cross-validations, verifying the stability of the 3-cluster solution (Landis & Koch, 1977). Descriptive statistics and a one-way ANOVA were used to investigate the differences across clusters for each of the six related constructs. The results are presented in Table 5. Homogeneity of variance could be assumed only for illness symptoms, for which Bonferroni post hoc tests were conducted. Where homogeneity could not be assumed, the Brown-Forsythe F ratio was reported and Games-Howell post hoc tests were conducted.
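The two-step procedure above can be sketched as follows on synthetic data with three built-in response profiles (all values hypothetical): Ward's hierarchical clustering suggests the cluster solution, and k-means then refines it using the hierarchical cluster means as initial centers.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans

# Hypothetical data: 231 respondents with 6 construct scores drawn around
# three distinct profiles (low / medium / high).
rng = np.random.default_rng(7)
profiles = np.array([[1.0] * 6, [3.0] * 6, [5.0] * 6])
data = np.vstack([p + rng.normal(scale=0.4, size=(77, 6)) for p in profiles])

# Step 1: Ward's hierarchical clustering, cut into three clusters.
Z = linkage(data, method="ward")
hier_labels = fcluster(Z, t=3, criterion="maxclust")

# Step 2: k-means seeded with the hierarchical cluster means.
init_centers = np.array([data[hier_labels == k].mean(axis=0)
                         for k in (1, 2, 3)])
km = KMeans(n_clusters=3, init=init_centers, n_init=1).fit(data)
final_labels = km.labels_
```

Seeding k-means with the hierarchical solution makes the second step deterministic, which matches the design choice of using the previous cluster means as initial centers.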
The ANOVA results revealed significant differences between the three clusters across all six relevant constructs (p < .001). Post hoc tests showed that all comparisons were significant (p < .01). Based on the score patterns, the three clusters were named negative (high on the four negative constructs, low on the two positive constructs), balanced (no obvious tendency), and positive (high on the two positive constructs, low on the four negative constructs). A one-way ANOVA was conducted to assess the discriminative power of the GSE-SI across the three clusters. As expected, GSE scores measured by the two GSE scales both increased from the negative group to the positive group. As Table 6 shows, both GSE-SI and NGSE-7 scores significantly discriminated between the three clusters in a similar pattern (F(2, 228) = 9.127 and 21.122, respectively, p < .001).
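The discriminative-power check can be sketched as a one-way ANOVA on GSE scores across the three clusters, preceded by a homogeneity-of-variance test. The group means and sizes below are hypothetical, chosen only to mimic the negative / balanced / positive ordering described above.

```python
import numpy as np
from scipy.stats import f_oneway, levene

# Hypothetical GSE-SI scores for the three clusters.
rng = np.random.default_rng(3)
negative = rng.normal(2.5, 0.8, size=70)
balanced = rng.normal(3.3, 0.8, size=90)
positive = rng.normal(4.1, 0.8, size=71)

# Homogeneity of variance check (Levene's test with median centering).
lev_stat, lev_p = levene(negative, balanced, positive, center="median")

# One-way ANOVA across the three clusters.
f_stat, p_value = f_oneway(negative, balanced, positive)
```

A significant F with increasing group means, as in Table 6, indicates that the scale discriminates between the clusters.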

Discussion
The present study developed a single-item scale of general self-efficacy (GSE-SI) based on various approaches. In addition, the reliability and validity evidence of the proposed scale was assessed.

Item Selection
In research on single-item scales, there is no broad consensus on the best method for selecting the single item, and expert judgement is often used. However, the current study found that experts could only provide information of limited value. Thus, researchers should also consider other approaches, such as statistical methods, when selecting the single item. This is consistent with Sarstedt et al. (2016), who noted that experts could hardly be expected to select the best item due to inconsistencies in their judgements, whereas the statistical methods used to select the final item yielded consistent results. The current findings align with that study, in which four statistical selection processes likewise converged on one particular item (Sarstedt et al., 2016). In summary, decisions on single-item selection should be based on multiple approaches and more comprehensive information, such as the consistency of results obtained from different methods, rather than relying solely on expert judgement.

Reliability
Several methods were used in this study to estimate reliability. However, the authors did not find evidence in support of one best method. Therefore, the mean reliability score can be considered a better estimate of reliability when differences appear across several estimation methods. Despite the notion that the reliability of scores obtained from single-item scales may be inferior to that of multiple-item scales, the reliability of GSE-SI scores in the present study was satisfactory when compared with existing multiple-item GSE scales (.70-.90, .75-.91, and .85-.90; Chen et al., 2001; Schwarzer & Jerusalem, 1995; Sherer et al., 1982). In comparison with other empirically supported single-item scales, Postmes et al. (2013) found in a meta-analysis that the average reliability estimate of single-item scales was .51, ranging from .14 to .68 (e.g., the single-item perceived stress scale, Littman et al., 2006, and the single-item burnout scale, Rohland et al., 2004), suggesting that the reliability of the GSE-SI proposed in this study (mean reliability = .642) is substantial. Moreover, the "true" reliability of GSE-SI scores is possibly higher, since both EFA-based methods produce minimum estimates of reliability. Thus, the results of the present study indicate that scores obtained with the GSE-SI can be regarded as reliable.

Validity Evidence Based on Relationships with Criterion Variables and Conceptually Related Constructs
In terms of test-criterion relationships of single-item scales, moderate correlations between homogeneous single-item scales and multiple-item scales have commonly been reported (Konrath et al., 2014; Milton et al., 2011). Given that the correlation observed here (.795) was strong, the criterion-related validity evidence for the proposed GSE-SI can be regarded as high. Regarding validity evidence based on relationships with conceptually related constructs, the correlations between single-item scale scores and theoretically relevant constructs were supported. In the current study, GSE-SI scores also showed correlation patterns with the six relevant constructs that were similar to and consistent with those of the NGSE-7 scores. However, it should be noted that the magnitudes of the correlations between GSE-SI scores and the theoretically relevant variables were lower than those of the NGSE-7.

The Importance of Using Reliability to Adjust Correlations
To achieve more valid use of single-item scales, some researchers have suggested that single-item correlations should be corrected using the scale's reliability via the correction for attenuation formula (Pedhazur, 1997), a method used by Cheung and Lucas (2014) to support the validity of their single-item scale. In the present study, after correcting with reliability, the correlations between the theoretically relevant constructs and the GSE-SI were comparable to the correlations between those constructs and a well-established multiple-item scale. Moreover, the corrected variance differences and standard deviations between the two GSE scales' scores were also calculated. In the four correction scenarios, the extremely low variance reductions and stable standard deviations all suggested that GSE-SI scores correlated with the related constructs as strongly and consistently as the well-established NGSE-7, providing sufficient support for the validity of the GSE-SI. Therefore, when a single item is used in a correlational study, adjusting the correlation using reliability is recommended to achieve correlation strength similar to that of multiple-item scales.

Discriminative Power
In previous studies, discriminative power was mainly verified across demographic variables such as gender and age (Elo et al., 2003), so the methods used to identify respondent groups were often simple. In the present study, the evaluation of the discriminative power of the proposed GSE-SI focused on respondents' different response patterns. Moreover, the three respondent groups in this study were identified through cluster analyses rather than the simple demographic characteristics used in previous research. Overall, the significant differences across the three groups and the similar discrimination patterns of the two GSE scales' scores supported the ability of the GSE-SI to differentiate respondents, suggesting that it is comparable to the well-established NGSE-7.

Limitations and Future Directions
This study used various analysis methods and cross-validations throughout the development of the GSE-SI. However, the present study has several limitations. First, as one of the first attempts to develop the GSE-SI, this study employed convenience sampling with a limited sample size. To further establish norms for the GSE-SI, future research should use a larger and more representative sample. Second, the current sample consisted only of participants in Singapore. Despite the useful results found in this study, further cross-cultural validations are required, with the goal of obtaining further reliability and validity evidence for the GSE-SI in other cultures and contexts.

Conclusion
In general, single-item scales are inferior to multiple-item scales, but that does not mean they are without merit. Although the scores obtained from the GSE-SI in the current study are noticeably less reliable and explain less variance in outcomes than the NGSE-7 before correction, they show meaningful associations and similar power after the correlations are adjusted using reliability. Thus, a researcher who needs a single-item GSE scale could use the GSE-SI with the reasonable expectation that, given a sufficient sample size, GSE-SI scores will explain relevant outcomes in a pattern similar to the NGSE-7. In sum, the current study provided satisfactory evidence of the reliability and validity of the GSE-SI relative to the well-established multiple-item scale, the NGSE-7. Consequently, the authors recommend the GSE-SI as a reliable and valid single-item scale for future general self-efficacy research.