Factor Structure and Measurement Invariance Between Fathers and Mothers and Across Three Time Points After Childbirth: Identification of Core Items of The Japanese Version of The Mother-To-Infant Bonding Scale

The Japanese version of the mother-to-infant bonding scale (MIBS-J), a self-report of postnatal bonding disorder, is widely used in Japan. However, its psychometric properties, particularly measurement invariance, have rarely been reported, and the appropriateness of the use of the MIBS-J among perinatal fathers remains to be investigated. This study aimed to select and to validate MIBS-J items suitable for mothers and fathers at three time points (postpartum hospitalization period and 1 and 4 months postpartum) commonly used in clinical practice in Japan.

most of the general population of parents would report "no problem" to these items. Indeed, there may be little clinical significance in using all 10 items for all cases, as is currently the case.
These methodological issues call for validation and refinement of the MIBS-J factor structure. The MIBS-J factor structure has been reported in at least two studies [6,14]. Yoshida et al. conducted a longitudinal study of postpartum mothers (n = 554) and collected responses at three different time points: 5 days, 1 month, and 4 months after childbirth [6]. They found that seven, eight, and seven MIBS-J items showed skewness > 2.0 at 5 days, 1 month, and 4 months after childbirth, respectively, although no MIBS-J item remained skewed after log-transformation. They further conducted combined exploratory and confirmatory factor analyses (EFAs and CFAs, respectively), and a two-factor structure (lack of affection [LA] and anger and rejection [AR]) was extracted. However, because the data obtained from the three time points were combined into a single data set, this procedure might have blurred differences in the factor structure of the scale across the time periods.
Subsequently, separate CFAs were conducted for the data from the three time points. The fit of the model was acceptable for the data at 5 days after childbirth (comparative fit index [CFI] = .922). However, the fit was far below the required level (CFI > .95) [19] at 1 month (CFI = .889) and 4 months (CFI = .905) after childbirth. Scrutinizing the items one by one revealed that the standardized factor loading was 0.09 for item 2 ("scared or panicky") at five days after childbirth and 0.29 for item 7 ("wish baby was different") at one month after childbirth. These findings suggest that the factor structure of this model might differ across the three time points. The means of the two subscale scores decreased with time; however, Yoshida et al. did not test factor mean invariance across the time points [6]. A few years later, Kitamura et al. collected data using the MIBS-J from a cross-sectional sample of fathers (n = 396) and mothers (n = 733) with children aged 0 to 10 years [14]. They identified a two-factor structure of the MIBS-J items, which was quite similar to that reported by Yoshida et al. [6]. They found a good fit of the model with the data in terms of configural invariance (CFI = .956 and root mean square error of approximation [RMSEA] = 0.033). However, they did not report on measurement invariance.
The use of a psychological measure requires confirmation of configural, measurement, and structural invariance of the factor structure of the measure. That is, the factor structure of the measure should be stable between participants with different demographic features (e.g., fathers and mothers), as well as across various observation time points. If these assumptions do not hold, the items of the measure do not have the same meaning across groups or occasions, the assessment may be biased, and a comparison of the measured scores is not meaningful. The stability of the factor structure is confirmed through several steps [20], and each subsequent step is tested only when the preceding step is accepted. If one step is rejected, the next step should not be performed. Basic stability is termed configural invariance: each group (e.g., fathers vs. mothers) should have the same pattern of items and factors (first step). Moreover, factor loadings for similar items (metric invariance, also known as weak factorial invariance; second step), intercepts of similar items (scalar invariance, also known as strong factorial invariance; third step), residuals (errors) of similar items (residual invariance, also known as strict factorial invariance; fourth step), variances of similar factors (factor variance invariance; fifth step), and the means of factors (factor mean invariance; sixth step) should be invariant across groups. The second to fourth steps are termed measurement invariance, and the fifth and sixth steps are termed structural invariance.
The confirmation described above is necessary for the MIBS-J to be used in clinical and research settings. This study aimed to select and to validate MIBS-J items suitable for mothers and fathers at the three time points (postpartum hospitalization period and 1 and 4 months postpartum) commonly used in clinical practice in Japan to enable a continuous assessment of bonding disorders among perinatal mothers and fathers over time and to minimize the burden on respondents.

Procedures and participants
We recruited mothers and fathers on days 3-5 postpartum at the maternity ward of one perinatal medical center, three general hospitals, two antenatal clinics, and one birth center in Tokyo and its suburban areas. The inclusion criteria were as follows: (a) a good command of the Japanese language, (b) residing in Japan, (c) having no serious physical diseases or pregnancy-related complications, and (d) a singleton neonate. One of the investigators (KB) visited the wards and recruited participants after explaining the study and obtaining informed consent from the participants. Data were collected in three waves: 5 days (Wave 1, W1), 1 month (Wave 2, W2), and 4 months (Wave 3, W3) after childbirth. The numbers of fathers and mothers who returned the questionnaire were respectively 421 and 684 at W1, 361 and 590 at W2, and 351 and 566 at W3. We handed (W1) or posted (W2 and W3) the set of questionnaires to the participants and asked them to return them via the postal service. Mothers and fathers were asked to complete the questionnaire independently. Data were collected from December 2015 to June 2016 as part of the first author's PhD dissertation [21]. The mean (standard deviation) age of the mothers and fathers was 33.1 (4.7) and 34.6 (5.1) years, respectively. More than three-quarters (n = 437) of the mothers underwent vaginal delivery, and 18% (n = 98) of them underwent Cesarean delivery. There were 288 (53%) primiparas and 253 (47%) multiparas.

Mother-to-infant bonding disorder
The MIBS-J comprises 10 items that assess mothers' attitudes and emotions toward their infants [6]. These items are rated on a 4-point Likert scale (0-3), and higher scores indicate that the mother has a more negative attitude and emotion toward the infant. Two subscales were proposed by Yoshida et al. and Kitamura et al. [6,14]: LA and AR. LA items include "feel protective toward my baby" (reverse item) and "feel close to my baby" (reverse item), whereas AR items include "feel angry with my baby" and "feel resentful toward my baby." We used the same questionnaire for the fathers.

Data analysis
The participants who returned the questionnaire at all three time points (350 fathers and 543 mothers) were randomly split into two subgroups. The first (166 fathers and 282 mothers) and second (184 fathers and 261 mothers) groups were used for EFAs and CFAs, respectively. Missing MIBS-J data were considered to be missing completely at random (Little's missing completely at random test for fathers [p = .582] and mothers [p = .229]). The missing data were handled by pairwise deletion for all analyses, except for CFAs. For CFAs, the missing data were handled using the full information maximum likelihood method.
In the EFA using the first half of the sample, we calculated the skewness and kurtosis of all the MIBS-J items. When excessive skewness or kurtosis was present, the MIBS-J items were log-transformed. Items that showed excessive skewness (> 4.0) or kurtosis (> 15.0), even after log-transformation, were excluded from further analyses. We performed EFAs for the remaining items after checking the Kaiser-Meyer-Olkin index of sampling adequacy [22] and Bartlett's test of sphericity [23] to examine the adequacy of the sample size and the presence of non-zero correlations between items [24]. The number of factors was determined using a scree plot. The minimum acceptable factor loading was 0.30 [25], maximum likelihood extraction was performed, and the axes were rotated using Promax rotation.
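The item-screening rule described above can be illustrated with a minimal sketch. It is not the SPSS procedure used in this study: the moment-based skewness and excess kurtosis formulas, the log(x + 1) transformation for the 0-3 item scores, and the function names are all assumptions made for illustration. Only the cut-offs (4.0 and 15.0) are taken from the text.

```python
import math

SKEW_LIMIT = 4.0   # cut-off for excessive skewness (from the text)
KURT_LIMIT = 15.0  # cut-off for excessive kurtosis (from the text)


def _central_moment(xs, order):
    """Population central moment of the given order."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** order for x in xs) / len(xs)


def skewness(xs):
    """Moment-based skewness (assumption: SPSS uses a bias-adjusted variant)."""
    m2 = _central_moment(xs, 2)
    return _central_moment(xs, 3) / m2 ** 1.5 if m2 > 0 else 0.0


def excess_kurtosis(xs):
    """Moment-based excess kurtosis (0 for a normal distribution)."""
    m2 = _central_moment(xs, 2)
    return _central_moment(xs, 4) / m2 ** 2 - 3.0 if m2 > 0 else 0.0


def log_transform(xs):
    """log(x + 1), assumed here because item scores can be 0."""
    return [math.log(x + 1) for x in xs]


def keep_item(scores):
    """Return True if the log-transformed item stays within both cut-offs."""
    transformed = log_transform(scores)
    too_skewed = skewness(transformed) > SKEW_LIMIT
    too_peaked = excess_kurtosis(transformed) > KURT_LIMIT
    return not (too_skewed or too_peaked)


# Hypothetical item responses on the 0-3 scale, not study data
balanced_item = [0, 1, 2, 3] * 10
floor_item = [0] * 99 + [3]
print(keep_item(balanced_item), keep_item(floor_item))
```

An item on which almost every respondent scores 0, such as the hypothetical `floor_item`, remains extremely skewed even after transformation and would be dropped under this rule.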
Next, using the second half of the sample, CFAs were performed to test the adequacy of the MIBS-J model extracted by the EFAs.
Measures of goodness-of-fit were chi-square (CMIN), CFI, and RMSEA. A good fit was defined as CMIN/df < 2, CFI > .97, and RMSEA < 0.05 [26], and an acceptable fit was defined as CMIN/df < 3, CFI > .95, and RMSEA < 0.08 [26,27]. However, when the sample size is relatively small (< 500) or a model is complex, these criteria might be stringent, and the use of more flexible criteria is suggested (e.g., CFI > .90 and RMSEA < 0.10 [28]). We also considered these flexible criteria when examining model fit.
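The three tiers of fit criteria above can be written as a small decision function. The following is an illustrative sketch only: the function name, the tier labels, and the example values are hypothetical, and the thresholds are those stated in the text.

```python
def classify_fit(cmin, df, cfi, rmsea):
    """Classify model fit using the cut-offs stated in the text.

    Returns 'good', 'acceptable', 'flexible', or 'poor'. The labels are
    illustrative names, not terms defined by the fit-index literature.
    """
    ratio = cmin / df
    if ratio < 2 and cfi > 0.97 and rmsea < 0.05:
        return "good"
    if ratio < 3 and cfi > 0.95 and rmsea < 0.08:
        return "acceptable"
    # Relaxed criteria for small samples (< 500) or complex models
    if cfi > 0.90 and rmsea < 0.10:
        return "flexible"
    return "poor"


# Hypothetical fit statistics, not results from this study
print(classify_fit(cmin=15.0, df=10, cfi=0.98, rmsea=0.03))
print(classify_fit(cmin=25.0, df=10, cfi=0.96, rmsea=0.06))
print(classify_fit(cmin=40.0, df=10, cfi=0.92, rmsea=0.09))
```

Checking the tiers from strictest to most lenient ensures that a model is always assigned the most favorable label it qualifies for.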
Measurement invariance of the best model was tested with the full sample (participants who returned the questionnaire at all three time points; 350 fathers and 543 mothers), between fathers and mothers, and across the three observation periods. A series of hierarchical models was tested as follows. First, configural invariance was tested. Once configural invariance was supported, which indicates that both sexes or periods share the same factor structure, metric invariance was tested. When metric invariance held, which indicates that the factor loadings were equivalent across sexes or periods, scalar invariance was assessed by restricting the item intercepts to be equal across sexes or periods. When scalar invariance was supported, residual invariance was examined by comparing the item residuals between sexes and periods. In addition, structural invariance was needed as evidence of factor structure robustness, and it included factor variance invariance. If one of the above steps was rejected, subsequent steps were not performed. Invariance from one step to the next was "accepted" if we noticed either (a) a non-significant increase in χ² for the difference in df, (b) a less than 0.01 decrease in the CFI, or (c) a less than 0.01 increase in the RMSEA [29,30]. The CFI and RMSEA may be better indicators of measurement invariance than χ² because χ² is sensitive to the sample size and may thus produce excessive "rejection" rates. In addition, since scalar-level measurement invariance is rarely confirmed [31] and there is no consensus on the level to which it should be confirmed, we considered measurement invariance to be present if it was confirmed up to at least metric invariance. To determine measurement invariance, a distinction was made between full and partial invariance [20,32,33].
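The stepwise acceptance rule above can be summarized in a short sketch. It assumes that the fit indices and the p value of the χ² difference test have already been obtained from the modeling software; the function names, the dictionary keys, the 0.05 significance level, and the example values are hypothetical and are not taken from this study.

```python
# Nested models in the order in which they are tested
STEPS = ["configural", "metric", "scalar", "residual", "factor variance"]


def step_accepted(prev, curr, alpha=0.05):
    """Apply criteria (a)-(c): any one of them is enough to accept the step.

    alpha = 0.05 is an assumed significance level for the chi-square
    difference test.
    """
    nonsignificant_chisq = curr["p_delta"] >= alpha     # criterion (a)
    small_cfi_drop = prev["cfi"] - curr["cfi"] < 0.01    # criterion (b)
    small_rmsea_rise = curr["rmsea"] - prev["rmsea"] < 0.01  # criterion (c)
    return nonsignificant_chisq or small_cfi_drop or small_rmsea_rise


def highest_supported(fits):
    """Return the last accepted step, stopping at the first rejection.

    `fits` lists the models in STEPS order; the first (configural) model is
    assumed to have been judged adequate on its overall fit already.
    """
    supported = STEPS[0]
    for i in range(1, len(fits)):
        if not step_accepted(fits[i - 1], fits[i]):
            break  # later steps are not examined after a rejection
        supported = STEPS[i]
    return supported


# Hypothetical model comparison, not results from this study
fits = [
    {"cfi": 0.980, "rmsea": 0.040, "p_delta": None},   # configural
    {"cfi": 0.975, "rmsea": 0.042, "p_delta": 0.20},   # metric
    {"cfi": 0.940, "rmsea": 0.065, "p_delta": 0.001},  # scalar
    {"cfi": 0.939, "rmsea": 0.066, "p_delta": 0.30},   # residual
]
print(highest_supported(fits))
```

In this hypothetical example the scalar step fails all three criteria, so the procedure stops at metric invariance and never examines the residual model, even though that model would have passed on its own.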
All statistical analyses were conducted using IBM SPSS 26 and Amos 26 (IBM Corp., Armonk, NY, USA).

Ethical consideration
This study was approved by the Ethical Committee of St. Luke's International University (15-074).

Results
Factor structure derived from EFA
In the first half of the sample, many MIBS-J items showed high skewness and kurtosis (Table 1). Skewness and kurtosis were less severe after log transformation; however, there were still five items (items 3, 4, 5, 7, and 9) with skewness > 4.0 and kurtosis > 15.0. Hence, we excluded those five items, and the remaining five MIBS-J items were entered into an EFA. The Kaiser-Meyer-Olkin index was 0.74-0.75, and Bartlett's test was χ² (df) = 171.99 (10)-337.98 (10) (p < .001). Therefore, the datasets were suitable for EFAs. The EFAs for the fathers and mothers were performed separately. The scree test suggested either a one- or two-factor solution for both fathers and mothers. In the one-factor model, all the MIBS-J items, except item 2, showed factor loadings > 0.3 at all three time points (Supplementary Table 1, Additional File 1). In the two-factor solution, the first factor loaded highly (> 0.3) on MIBS-J items 1, 6, 8, and 10, which reflect LA, as demonstrated in previous studies [6,14]. The second factor loaded highly on only item 2, which reflects AR. We excluded this item from further analyses because a factor with only a single indicator having a high factor loading is unstable for a measurement model.
The remaining four MIBS-J items (items 1, 6, 8, and 10) belonged to the LA category. A single factor EFA showed high factor loadings for all items at all time points among both fathers and mothers (Table 2).

Measurement invariance across sexes or periods of time
When comparing the factor models between fathers and mothers, configural invariance was accepted. However, metric invariance was rejected for W3 (Supplementary Table 2, Additional File 2). Therefore, we examined the z value, which indicates the group differences in the factor loadings, and found that item 10 showed the largest group difference (z = 3.165, p < .01). Therefore, after excluding item 10, we re-examined the EFA and configural invariance (Table 3).
The remaining model had a three-item structure (items 1, 6, and 8). When comparing this model between fathers and mothers, configural invariance was confirmed at all three time points. Invariance was further supported up to factor variance invariance at W1 and W2 and up to scalar invariance at W3 (Table 4).
When comparing the three time points, configural invariance was confirmed in both fathers and mothers (Table 5). Among fathers, metric invariance was rejected, although partial invariance was supported by freeing the restriction of item 6.
Subsequently, scalar invariance was accepted. Among mothers, metric invariance was proven. Scalar invariance was rejected in both fathers and mothers (Table 5).

Discussion
In this study, configural and measurement invariance between mothers and fathers and across three postnatal time points was observed using only three MIBS-J items. This finding differs from those of other studies that reported a two-factor MIBS-J structure among Japanese mothers after childbirth [6] or parents with children aged ≤ 12 years [14]. All three remaining items belonged to LA. All the items belonging to AR were deleted because of excessive skewness/kurtosis or instability of the measurement model. Unlike LA items, AR items are likely to be influenced by situational factors, e.g., newborn colic [34,35], which might have reduced the invariance of the instrument. Our findings highlight the importance of emphasizing affection items to measure parental bonding with stability during the first four postpartum months.
The use of the full 10-item version may be a burden on the participants and may result in score biases due to repeated measures. Notably, the three-item version may be much easier for perinatal health professionals, such as midwives and community nurses, to administer. Therefore, in clinical and research settings, it would be advisable to use the three MIBS-J items, rather than all 10 items, for continuous observation, assessment, and comparison of parental bonding, at least during the first 4 months after the birth of the infant.
The limitations of this study should be noted. First, this study included fewer fathers than mothers. Thus, replication studies are needed before a conclusion is reached. Second, several MIBS-J items showed skewness. This finding suggests that the selection of the participants may have been biased, and the study cohort included few poorly bonded participants.

Conclusions
Our study confirmed that the three-item MIBS-J was psychometrically robust among Japanese fathers and mothers during the 4-month period after childbirth. All three items belong to LA, as reported in previous studies [6,14]. Thus, we believe that it is important to focus on affection to measure parental bonding with stability during the first four postpartum months using the

Consent for publication
Not applicable.

Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Competing interests
The authors declare that they have no competing interests.

Funding
This study was supported by a MEXT KAKENHI grant (Number, 25293458; PI, Yaeko Kataoka) and a grant-in-aid from the Yamaji Fumiko Nursing Research Fund.

Authors' contributions
KB designed and coordinated this study, carried out the analysis and interpretation of the data, and drafted and revised the manuscript. TK contributed to data analysis and interpretation of the study findings and reviewed and revised the manuscript. YK participated in the design of this study and contributed to the interpretation of the study findings. All authors read and approved the final manuscript.

MIBS-J = Japanese modification of the mother-to-infant bonding scale; W1 = 5 days after childbirth; W2 = 1 month after childbirth; W3 = 4 months after childbirth; n = 166 for fathers and 282 for mothers. Factor loadings > 0.30 are presented in boldface; the upper figure in each cell represents factor loading (or total variance explained) among fathers, whereas the lower figure in each cell represents factor loading (or total variance explained) among mothers. Item scores after log transformation were entered into an exploratory factor analysis.