The sample comprised 638 participants. Females (47.6%) and males (52.2%) were represented about equally. The majority were younger than 40 (66%), while 15% were 40-49 years old and 19% were older than 49. Most participants reported being married (72.3%) and holding a bachelor’s degree (63%). Participants self-identified as ethnically white (79.9%), black (8.6%), Hispanic (5.2%), or Asian (4.1%). A full overview of sample characteristics is provided in Table 1.
Amazon Mechanical Turk (MTurk) was used to recruit participants living in the United States (U.S.) to complete an online survey covering acute and chronic stress and uncertainty, physical complaints, emotion regulation, sleep, and health behavior. The survey was implemented using SoSci Survey, a German-based survey platform.
The survey was posted as a HIT (Human Intelligence Task) on MTurk in Fall 2020. The task asked workers to complete an externally hosted survey in exchange for $0.50. The HIT was titled “20-25 min. Psychological Survey about Stress and Uncertainty” and described as “This survey aims to investigate stress and uncertainty during the COVID pandemic and validate a psychological scale with English speakers”. The HIT was visible only to workers who had an acceptance rate greater than 95% and were residents of the U.S. To prevent workers from completing the HIT twice, a qualification was assigned to all workers that barred them from taking part in the second round. After completing the survey, workers received an automatically generated code, which they had to enter in MTurk to receive payment (no workers were rejected for payment).
Several instructional manipulation checks were embedded in the survey as a response quality check [13, 14]. Each check asked participants to respond to a separate question using the same response options. If they answered incorrectly, they were warned to read the instructions carefully and given a second chance to answer correctly. In addition to (1) the instructional manipulation checks, further quality control measures were included: (2) response consistency between birthdate and age was checked, (3) open-response questions were manually coded in line with the Chmielewski article [14], and (4) the time taken to complete the questionnaire was checked. Participants who failed to respond correctly after the warning were excluded from the analysis, which is the most effective method according to the literature (see [14]). From the full sample, 28 observations were dropped because of completely missing data, and an additional 205 participants were excluded based on the response quality checks. The sample was therefore reduced to N=638. Because one participant had a missing value for sex, the sample size was N=637 for calculations involving the variable sex.
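The participant flow described above can be checked with a few lines of arithmetic. The following is a minimal sketch: the size of the full sample is not stated in the text and is derived here from the reported exclusions, and all variable names are illustrative.

```python
# Counts taken from the paragraph above.
final_n = 638            # analysed sample
dropped_missing = 28     # observations with completely missing data
dropped_quality = 205    # participants excluded by the response quality checks

# Derived (not stated in the text): size of the full sample before exclusions.
full_n = final_n + dropped_missing + dropped_quality

# Apply the exclusions in the order in which they are described.
after_missing = full_n - dropped_missing
after_quality = after_missing - dropped_quality

# One participant has a missing value for sex.
n_with_sex = final_n - 1

print(full_n, after_missing, after_quality, n_with_sex)
```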
The Gießen Subjective Complaints List (GBB-8) is a short and reliable instrument for evaluating the degree of somatic symptoms. The eight items cover commonly reported complaints, and participants respond on a 5-point Likert scale. The GBB-8 was translated into English in accordance with the International Test Commission (ITC) Guidelines for Translating and Adapting Tests. The items were translated from German to English by one bilingual expert and then back-translated to German by a second bilingual expert. The original and back-translated items were compared and reconciled by a group of experts, followed by a second round of forward and back-translation. The English GBB-8 items are included in Appendix A.
The Patient Health Questionnaire (PHQ-4) is a brief 4-item screening inventory for depression and anxiety. Its items stem from the Generalized Anxiety Disorder scale (GAD-7) and the PHQ-8. Participants rate items on a 4-point Likert scale. The two-factor structure is represented by the two anxiety items (Factor 1) and the two depression items (Factor 2). The two factors explained 84% of the total variance, and factor loadings were all ≥ .82. Reliability of the PHQ-4 scales is good (Cronbach α > .80).
The Perceived Stress Scale (PSS-10) measures the degree to which life has been experienced as unpredictable, uncontrollable, and overloaded over the past month. Participants respond on a 5-point Likert scale. Cohen et al. originally developed the PSS as a single-factor scale; however, since its development, many researchers have concluded that the scale represents two distinct factors: (1) perceived helplessness and (2) perceived self-efficacy [19–21]. The PSS-10 consistently shows strong internal reliability (Cronbach α > .70) in diverse populations and meets the criteria for good test-retest reliability (> .70).
The short English version of the Trier Inventory for Chronic Stress (TICS-9) is based on the original 57-item scale, which was translated into English, shortened, and validated [24, 25]. The TICS-9 represents nine factors of chronic stress: Work Overload; Social Overload; Pressure to Perform; Work Discontent; Excessive Demands at Work; Lack of Social Recognition; Social Tensions; Social Isolation; and Chronic Worrying. Participants rate the frequency of specific situations over the previous three months on a 5-point Likert scale. The English TICS-9 reflects the strengths of the full 57-item English TICS: it is reliable (Cronbach α ≥ .86), shows good model fit, and its scale structure is invariant between males and females, which supports the validity of the scale.
All analyses were conducted in R, using the packages lavaan, moments, multilevel, and semTools [26–29]. There was only a small amount of missing data (166 of 6744 GBB data points, i.e., 2.5%). Nonetheless, we ran confirmatory factor analysis using robust full-information maximum likelihood estimation to deal with missing values and non-normal distributions [30, 31]. To evaluate model fit, we followed the guidelines provided by Schermelleh-Engel et al.: a non-significant χ², a Comparative Fit Index / Tucker-Lewis Index (CFI/TLI) greater than .95 (.97), a Root Mean Square Error of Approximation (RMSEA) smaller than .08 (.05), and a Standardized Root Mean Square Residual (SRMR) smaller than .10 (.05) indicate acceptable (good) fit. For CFI, TLI, and RMSEA we used the robust variants [33, 34]. We report McDonald’s ω as a reliability metric.
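The fit criteria above can be read as a simple decision rule per index. The sketch below, written in Python for illustration rather than in the R code used for the analyses, encodes the quoted cutoffs and reproduces the reported share of missing data. The function names, the α = .05 level for the χ² test, and the example fit values are assumptions and are not results from this study.

```python
def grade(value, acceptable, good, higher_is_better):
    """Return 'good', 'acceptable', or 'poor' for a single fit index."""
    if not higher_is_better:
        # Flip signs so the same "larger is better" comparison covers RMSEA and SRMR.
        value, acceptable, good = -value, -acceptable, -good
    if value > good:
        return "good"
    if value > acceptable:
        return "acceptable"
    return "poor"


def classify_fit(cfi, tli, rmsea, srmr, chi2_p, alpha=0.05):
    """Apply the cutoffs quoted in the text; alpha = .05 is an assumption."""
    return {
        "chi2": "non-significant" if chi2_p > alpha else "significant",
        "cfi": grade(cfi, 0.95, 0.97, higher_is_better=True),
        "tli": grade(tli, 0.95, 0.97, higher_is_better=True),
        "rmsea": grade(rmsea, 0.08, 0.05, higher_is_better=False),
        "srmr": grade(srmr, 0.10, 0.05, higher_is_better=False),
    }


# Share of missing GBB data points, using the counts reported in the text.
missing_percent = round(100 * 166 / 6744, 1)

# Hypothetical fit values, chosen only to illustrate the rule.
example = classify_fit(cfi=0.96, tli=0.98, rmsea=0.06, srmr=0.04, chi2_p=0.20)
print(missing_percent, example)
```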
Next, we tested for measurement invariance by comparing CFI and RMSEA between models that did (did not) constrain the measurement parameters (loadings, intercepts, residuals) to be equal between the groups of interest. Specifically, these are the configural (unconstrained), metric (loadings constrained), scalar (loadings and intercepts constrained), and strict (loadings, intercepts, and residuals constrained) models. ΔCFI and ΔRMSEA should be smaller than .010 and .015, respectively. Since we incorporated a higher-order construct in our model, we followed the guidelines provided by Chen et al. and tested first- and second-order invariance successively. After establishing strict invariance at both the first- and second-order factor levels, we also examined the latent mean differences by additionally constraining the mean of the higher-order factor to be equal between groups. In addition to the χ²-test, we examined the standardized factor mean difference using the formula:
where Ψ_p is the standard deviation of the respective factor, pooled across all tested groups, n_k is the sample size of group k, and α_k is the latent variable mean in group k.
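The invariance criteria and the standardized mean difference can be illustrated with a short sketch. It is written in Python for illustration and is not the authors' R code. Two points are assumptions: changes in fit are computed as worsening relative to the preceding, less constrained model, and the pooled standard deviation is taken as the square root of the sample-size-weighted average of the group factor variances, which is one common form consistent with the symbol definitions above and not necessarily the exact expression used in the study. All function names and numeric inputs are hypothetical.

```python
import math

DELTA_CFI_MAX = 0.010    # cutoff for the change in CFI quoted in the text
DELTA_RMSEA_MAX = 0.015  # cutoff for the change in RMSEA quoted in the text


def invariance_steps(models):
    """Compare each model with the preceding, less constrained one.

    `models` is an ordered list of (name, cfi, rmsea) tuples, for example
    configural, metric, scalar, strict.
    """
    results = []
    for (_, cfi_prev, rmsea_prev), (name, cfi, rmsea) in zip(models, models[1:]):
        delta_cfi = cfi_prev - cfi        # drop in CFI after adding constraints
        delta_rmsea = rmsea - rmsea_prev  # rise in RMSEA after adding constraints
        holds = delta_cfi < DELTA_CFI_MAX and delta_rmsea < DELTA_RMSEA_MAX
        results.append((name, holds))
    return results


def standardized_mean_difference(alpha_1, alpha_2, var_1, var_2, n_1, n_2):
    """Latent mean difference divided by the pooled factor standard deviation.

    Assumed pooling: sample-size-weighted average of the group factor variances.
    """
    pooled_sd = math.sqrt((n_1 * var_1 + n_2 * var_2) / (n_1 + n_2))
    return (alpha_1 - alpha_2) / pooled_sd


# Hypothetical fit values for the four nested models.
steps = invariance_steps([
    ("configural", 0.980, 0.040),
    ("metric", 0.978, 0.041),
    ("scalar", 0.975, 0.043),
    ("strict", 0.960, 0.050),
])

# Hypothetical latent means, factor variances, and group sizes.
d = standardized_mean_difference(0.2, 0.0, 1.0, 1.0, 300, 337)
print(steps, d)
```

With these made-up inputs, the metric and scalar steps stay within both cutoffs, whereas the strict step exceeds the ΔCFI cutoff.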