Procedures and Samples
Based on institutional level, we first divided all higher education institutions in Changsha (one of the largest educational centers in Mainland China) into three categories (i.e., Key Universities, General Universities, and Vocational Colleges); we then adopted a stratified random sampling strategy, selecting three schools from each level. Data were collected through an online platform (Qualtrics, Provo, UT) with the help of the teachers in charge of each class (class advisers). Researchers introduced the objectives of the study to the class advisers and trained them in how to give instructions to the students. After the link was forwarded, a total of 4333 respondents started the survey (i.e., clicked the link), but only those with a 100% completion rate and rated “excellent” by Qualtrics were retained for further analysis. In addition, 26 participants were excluded because their reported ages and educational levels did not match those of the target sample (i.e., full-time undergraduates or below aged 14 to 25 years). The final sample comprised 2086 participants, including 409 (19.6%) males and 1677 (80.4%) females. The mean (±SD) age for the total sample, males, and females was 18.30 (±1.30), 18.49 (±1.20), and 18.25 (±1.32) years, respectively.
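For transparency, the exclusion logic can be expressed as a short data-cleaning step. The following is a minimal sketch in Python/pandas; the file and column names (e.g., progress, quality, education) are hypothetical, as the actual layout of the Qualtrics export is not described here.

```python
import pandas as pd

# Hypothetical Qualtrics export; all file and column names are illustrative.
df = pd.read_csv("qualtrics_export.csv")

# Keep only fully completed responses rated "excellent" by the platform.
df = df[(df["progress"] == 100) & (df["quality"] == "excellent")]

# Keep only respondents matching the target sample: full-time
# undergraduate or below, aged 14 to 25 years.
eligible = ["undergraduate", "junior_college", "secondary"]
df = df[df["age"].between(14, 25) & df["education"].isin(eligible)]

print(len(df), "participants retained for analysis")
```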
Ethics
Given that some of the participants were aged under 18, we sent the informed consent form to their parents (or legal guardians), with the help of the class advisers, prior to the survey. The self-report questionnaire was administered on the premise that both the guardian and the participant assented to participation. Written informed consent could not be obtained because most participants were resident (boarding) students who see their parents (or legal guardians) mainly during holidays. Anonymity and confidentiality were guaranteed throughout the survey and data analysis, and no data that could identify participants were collected, including their Internet Protocol addresses. The study protocol, which documented this unavoidable constraint, was approved by the Ethics Committee of the Second Xiangya Hospital of Central South University. Some data not related to the current study were also collected and will be presented elsewhere.
Instruments
The Chinese version of the PMPUQ-SV (C-PMPUQ-SV)
The C-PMPUQ-SV was adapted from Lopez-Fernandez et al. [42] and comprises 15 items that assess three postulated factors: PD, DU, and PU. The items are scored on a 4-point Likert scale ranging from 1 (strongly agree) to 4 (strongly disagree), and some items have to be reverse-scored. Higher scores reflect more serious PMPU. Following standard scale-adaptation procedures [51], one author (JL) first translated the French and English items into Chinese. Two authors (YHL and YYW), both with clinical research backgrounds and good proficiency in English, then back-translated the scale. One author, who is also the creator of the original version of the scale (JB), supervised the process and confirmed that the back-translated items corresponded to the original items. In accordance with recent work conducted to update the PMPUQ [36], the wording of some items pertaining to the DU subscale (i.e., items 2, 5, 11, 14) was modified prior to the translation procedure to cover DU by both pedestrians and cyclists (e.g., looking at a smartphone while crossing a road). Indeed, research has shown that DU of mobile phones is no longer limited to drivers and that pedestrians increasingly put themselves in risky situations while using smartphones [52, 53]. The three versions (i.e., French, English, and Chinese) of the 15-item PMPUQ-SV and descriptions of the items are provided in an additional file (see Additional file 1).
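Because the response scale runs from 1 (strongly agree) to 4 (strongly disagree) and some items are reverse-keyed, reversed items must be recoded (x → 5 − x) before summing. A minimal scoring sketch in Python follows; which items are reversed is left as an argument, since the actual keying is documented in Additional file 1.

```python
import pandas as pd

def score_pmpuq(responses: pd.DataFrame, reversed_items: list) -> pd.Series:
    """Recode reverse-keyed items on the 1-4 scale (x -> 5 - x), then sum.

    Higher totals reflect more problematic mobile phone use (PMPU).
    """
    recoded = responses.copy()
    recoded[reversed_items] = 5 - recoded[reversed_items]
    return recoded.sum(axis=1)

# Usage with hypothetical columns "item1" ... "item15"; the reversed set
# below is a placeholder -- the actual keying is given in Additional file 1.
# total = score_pmpuq(responses, reversed_items=["item2", "item5"])
```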
The Chinese version of the Smartphone Addiction Proneness Scale (C-SAPS)
The Smartphone Addiction Proneness Scale (SAPS) [30] is a self-report instrument that assesses symptoms of smartphone addiction (i.e., functional impairment, withdrawal, tolerance, and online life orientation). It includes 15 items rated on a 4-point Likert scale (1 = strongly disagree, 2 = disagree, 3 = agree, 4 = strongly agree), with higher scores implying more addictive smartphone use. The Chinese version of the scale (C-SAPS) [54] was adapted by our team prior to the current study. The internal consistency of the total scale was 0.852 in the current sample, indicating good reliability.
Depression Anxiety Stress Scales–21 (DASS–21)
The DASS–21 [55, 56] is one of the most widely used and reliable scales for measuring emotional symptoms worldwide. It has been translated into 50 languages [57], and its factor structure has been established in Chinese [58]. It measures three types of emotional symptoms: depression, anxiety, and stress. Each subscale contains seven items scored on a 4-point Likert scale ranging from 0 (did not apply to me at all) to 3 (applied to me very much, or most of the time); higher scores indicate more emotional symptoms experienced in the past week. In the current sample, the internal consistency coefficients for the depression, anxiety, and stress subscales were 0.870, 0.762, and 0.834, respectively, indicating good internal reliability.
Data analytic strategy
Four consecutive statistical analysis steps were performed with SPSS 23.0 (IBM, 2014) and Mplus 7.4 (Muthén & Muthén, 2015). First, an EFA was conducted on a randomly split half of the total sample (sample 1, n = 1043). We used principal component analysis with Promax rotation to identify an optimal data-driven factor structure, as this oblique rotation allows correlations between latent factors. Items that were inconsistent with the original PMPUQ-SV, along with those that cross-loaded comparably on more than one factor, were removed.
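The EFA itself was run in SPSS; purely as an illustration, the same split-half procedure could be sketched in Python with the factor_analyzer package (the file and item column names are hypothetical):

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer

# Hypothetical data frame holding the 15 C-PMPUQ-SV items.
items = pd.read_csv("pmpuq_items.csv")

# Random split-half: sample 1 for the EFA, sample 2 reserved for the CFAs.
sample1 = items.sample(frac=0.5, random_state=42)
sample2 = items.drop(sample1.index)

# Principal component extraction with oblique Promax rotation,
# which allows the extracted factors to correlate.
efa = FactorAnalyzer(n_factors=3, method="principal", rotation="promax")
efa.fit(sample1)

loadings = pd.DataFrame(efa.loadings_, index=items.columns,
                        columns=["PD", "DU", "PU"])
print(loadings.round(2))  # inspect for cross-loading items to drop
```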
Second, a series of CFAs was conducted on the other split half of the sample (sample 2, n = 1043) to compare the fit of several competing models. Model 1 tested the original three-factor model containing 15 items [42]. Model 2 tested a two-factor model in which the PU subscale was dropped because of the low internal consistency found in previous studies [18, 42]. Model 3 tested the 11-item model identified in our EFA. As the C-PMPUQ-SV item scores were not normally distributed (i.e., skewness for each item ranged from 0.015 to 0.652 and kurtosis from 0.009 to 1.912), we used the Satorra-Bentler mean-adjusted maximum likelihood (MLM) estimator instead of the maximum likelihood estimator. To assess the fit of each model, we examined multiple fit indices [59], including the chi-square, the root mean square error of approximation (RMSEA), the standardized root mean square residual (SRMR), the Tucker-Lewis index (TLI), and the comparative fit index (CFI). We adopted CFI and TLI values ≥ 0.90 together with an RMSEA value ≤ 0.05 as indicating good fit [60]. Notably, as the chi-square is known to be highly influenced by sample size [60], it was reported but not used as a fit criterion in the present study.
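The CFAs were estimated in Mplus with the MLM estimator. As an illustrative stand-in only, a comparable model can be sketched in Python with semopy, which uses standard maximum likelihood rather than the Satorra-Bentler correction; the item-to-factor mapping shown here is hypothetical.

```python
from semopy import Model, calc_stats

# Model 3 sketch: an 11-item, three-factor CFA fitted to sample 2 from
# the split above. The item-to-factor mapping is illustrative only.
model3_desc = """
PD =~ item1 + item4 + item7 + item10
DU =~ item2 + item5 + item8 + item11
PU =~ item3 + item6 + item9
"""

model3 = Model(model3_desc)
model3.fit(sample2)

# calc_stats reports chi-square, RMSEA, CFI, and TLI, among other indices.
print(calc_stats(model3).T.round(3))
```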
Third, we examined reliability by analyzing the internal consistency of the adapted scale and its subscales, and we tested convergent and construct validity by calculating Pearson's correlation coefficients between the C-PMPUQ-SV, the C-SAPS, and the DASS–21. Both analyses were performed on the whole sample.
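As a sketch of this step (continuing the illustrative Python examples above; the C-SAPS file name and the subscale item set are hypothetical), internal consistency and convergent correlations could be computed as follows:

```python
import pandas as pd
import pingouin as pg
from scipy.stats import pearsonr

# Internal consistency of one subscale (illustrative item set), using the
# 'items' data frame from the EFA sketch above.
du_items = items[["item2", "item5", "item8", "item11"]]
alpha, ci = pg.cronbach_alpha(data=du_items)
print(f"Cronbach's alpha = {alpha:.3f} (95% CI {ci[0]:.3f}-{ci[1]:.3f})")

# Convergent validity: Pearson correlation between total scores
# (the C-SAPS file name is hypothetical).
saps = pd.read_csv("saps_items.csv")
r, p = pearsonr(items.sum(axis=1), saps.sum(axis=1))
print(f"r = {r:.2f}, p = {p:.4f}")
```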
Last, we assessed measurement invariance across gender after the best structure was determined. We first fitted the best model in the male and female groups separately, and then tested configural invariance, metric invariance (or weak invariance), scalar invariance (or strong invariance), and error variance invariance (or strict invariance). More specifically, (1) configural invariance tested whether the basic factor structure of the latent variables was invariant across groups; (2) metric invariance, built on configural invariance, constrained the factor loadings to be equivalent across groups; (3) scalar invariance, assuming configural and metric invariance to be established, tested whether the item intercepts were equivalent across groups; and (4) error variance invariance, based on all of the previous levels of invariance, additionally constrained the error variances to be equal. We used differences in fit indices for RMSEA, SRMR, CFI, and TLI (i.e., ΔRMSEA, ΔSRMR, ΔCFI, and ΔTLI) as reference points, with a change of < 0.01 indicating no meaningful difference, a change between 0.01 and 0.02 indicating a moderate difference, and a change of > 0.02 indicating an important difference [61, 62]. After establishing measurement invariance, we computed a series of independent-samples t-tests to examine gender differences on the various C-PMPUQ-SV subscales.
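To make the decision rule concrete, the following Python sketch encodes the Δ-fit-index thresholds and the subsequent gender comparison; all numeric values in it are illustrative placeholders, not study results.

```python
import numpy as np
from scipy.stats import ttest_ind

def invariance_verdict(delta: float) -> str:
    """Interpret the change in a fit index (e.g., delta-CFI) between two
    nested invariance models, per the thresholds adopted above."""
    if abs(delta) < 0.01:
        return "no meaningful difference"
    if abs(delta) <= 0.02:
        return "moderate difference"
    return "important difference"

# Illustrative fit values for the configural and metric models.
cfi_configural, cfi_metric = 0.952, 0.947
print("metric vs. configural:",
      invariance_verdict(cfi_metric - cfi_configural))

# Gender comparison on a subscale total once invariance holds; the score
# vectors are random placeholders sized to the reported sample (409/1677).
rng = np.random.default_rng(0)
du_male = rng.normal(9, 2, 409)
du_female = rng.normal(9, 2, 1677)
t, p = ttest_ind(du_male, du_female)
print(f"t = {t:.2f}, p = {p:.4f}")
```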