Agreement between maternal recall of distant first birth events with hospital birth records: A cohort study

BACKGROUND Inter- and intra-generational birth cohorts could be particularly useful for predicting the likelihood of labour and birth events for nulliparous women. However, maternal recall of their first childbirth may be imprecise, and hospital records can be inaccurate. Establishing the extent of agreement between mothers’ recall and hospital reports of historical first birth events could be the basis of a prediction tool that could contribute to better health care practices during their daughters’ perinatal period.
METHODS In 2015, women who had their first baby between 1967 and 1997 were asked, ≥17 years postpartum, to recall gravidity, method of labour onset, type of pain relief, length of labour, birth outcome, and infant’s gender, birthweight and gestational age. Responses were compared to hospital birth records. Agreement was evaluated using Bland-Altman plots and kappa statistics (k). Logistic regression modeling was used to determine factors influencing discrepant recall.
RESULTS Of 150 questionnaires distributed, 101 records were complete. Up to 49 years after birth there was strong agreement for birthweight measured at interval level (mean discrepancy -28.69g, SD=170.91g, Bland-Altman 95% limits of agreement (-363.66g, 306.28g)) and at category level (k=0.83), and good agreement for gestational age (GA) in weeks at interval level (mean difference=0, SD=0.90, Bland-Altman 95% limits of agreement (-1.766, 1.766)) and at category level (k=0.56). There was moderate agreement for labour length (≤10hrs/>10hrs), k=0.54; 43% of records did not record this information. For gravidity k=0.43, labour onset k=0.79, any pain relief k=0.61, and birth outcome k=0.91. Univariate logistic regression showed better agreement on infant birthweight in women with higher levels of education, lower agreement for onset of labour method with increasing maternal age at birth, and higher agreement for use of pethidine, but lower agreement for use of epidural in women who had their first babies more recently.

data may contribute to more individualised care for nulliparous women, and may limit rising interventions based on population level guidelines. Future research in other settings is warranted before diagnostic criteria may be used in clinical settings.

Background
Research on familial factors influencing reproductive health depends on good quality data about birth events. Women who are currently pregnant may act upon familial birth information for more individualized care. Reproductive histories show babies of the same mother tend to have similar weight for gestation [1], small for gestational age (GA) babies show intergenerational recurrence [2], and within-family labour and birth characteristics may predict labour length for first-time mothers [3]. Intrapartum events are often obtained by maternal recall. Agreement analysis between recall and recorded birth data is essential so that birth histories can be based on robust evidence.
Despite the number of studies suggesting that agreement between hospital records and maternal recall of birthweight and/or GA, and/or mode of birth is sufficiently reliable (summary range for birthweight k=0.71-1 [17,18]; for GA k=0.6-0.85 [11,16]; for mode of birth k=0.80-1.00 [15,24,29]), there is evidence that maternal characteristics, such as education [5,8,9,10,11,12,13,15,17,18,20,23,29,30], socioeconomic status [5,9,10,12,13,16,18,20,22,25,27,28,30], ethnicity [5,8,9,10,16,17,20,22,25,28,30], and gravidity/parity [19,22,29], may impact on strength of agreement, with disparities reported across studies. There has been one systematic review and meta-analysis of the validity of recalled and recorded birthweight [33]; high agreement was found (range 86-129g, n=29,293). Maternal recall of gravidity and parity were only investigated in the ≥10 years from delivery time-frame (range k=0.74-0.98) [22,29]. Use of analgesia and labour length have been investigated less frequently across all time-frames, with kappa coefficients for analgesia agreement ranging from 0.58-0.85 [4,13,15,29,32], and for labour length agreement from k=0.21 to 71% agreement [8,11,18,26,29,32]. As far as we are aware, there have been no studies comparing maternal recall agreement with birth records for method of labour onset. Some studies assume that hospital records are accurate [18]; however, there is evidence that data are not always correctly recorded in hospital records [11]. This study does not assume that the gold standard is the hospital account. Most studies have focused on one or a few variables. Only Buka et al. (2004) [29] analysed six recalled and recorded birth outcomes in 96 women, 22 years later, and found that age at birth and parity were accurately reported, more educated women generally recalled events more accurately than less educated women, and women with major medical events, such as cesarean section (k=1), were more likely to recall these events in line with what was recorded in hospital records.
This study examines agreement between maternal recall and hospital birth records for 8 intrapartum variables, including labour onset, which has not been addressed previously. Mothers were asked to complete a questionnaire on average 33 years after delivery reporting on: gravidity, length of labour, pain relief, birthweight, GA, mode of delivery, infant gender and labour onset. Reasons for discordance in agreement were identified.

Methods
Women who were accompanying family members or friends attending antenatal clinics in either of two Israeli maternity hospitals between 2015 and 2016 were recruited by the attending midwife. Eligible women were those who had given birth to their first child in that hospital more than 17 years ago (to exclude very young women who may have health consequences such as pre-term birth and lower birthweight). Questionnaires included name, telephone number, national identity number (for tracking records), demographic information, general and obstetric health histories, and prenatal and perinatal information. Maternal history items included mother's first birth age, delivery date, gravidity, weight gain during pregnancy, duration of pregnancy, signs of labour, length of labour, use of pain relief medication, birth outcome, GA at birth, infant gender, birthweight, and Apgar of newborn. Responses were either categorical or continuous. Length of labour was measured on a categorical scale with four time intervals (≤2 hours, >2-6 hours, >6-10 hours, >10 hours) based on the findings of a systematic review which identified the mean active labour duration for 7,009 nulliparous women as 6 hrs ± SD 3.5 hrs [34].
Data were anonymized, numbered and assessed manually for errors. Missing data or ambiguous values were queried by telephone conversation. Recalled first birth events were compared for agreement with data in hospital records. A pilot study (n=10) confirmed that recruitment procedures, questionnaire use and return of data were satisfactory. Pilot study results (n=9) were included in the main results.

Statistical analysis
Sample characteristics were described using descriptive statistics, means and standard deviations (SD), or medians and interquartile range (IQR) for continuous variables, and frequencies and percentages for categorical variables. Multiple births were included in the analysis with each child treated as a single unit. Bland-Altman [35] plots were used to determine agreement between maternal recall and medical records for infant birthweight and GA as continuous variables. To measure agreement for birthweight and GA as categorical variables, optimal cut-offs for clinical significance were applied: low (≤2,499g), normal (2,500-3,999g) and high (≥4,000g); and preterm (≤36 weeks), term (37-40 weeks) and post-term (≥41 weeks). Mothers' length of labour was dichotomized (≤10hrs/>10hrs). Participants were asked to report on labour length within a time category because poor specificity over time is common for women reporting time-bound peripartum events [36]. Kappa statistics together with 95% confidence intervals (CI) were calculated for categorical and dichotomised variables. Strength of agreement was classified according to Landis and Koch (1977) [37] with kappa values of ≥0.8 indicating 'excellent', 0.61-0.8 'substantial' and 0.41-0.6 'moderate' agreement.
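The kappa calculation described above can be illustrated with a minimal sketch. It is not the analysis code used in this study (the analysis was run in SPSS); the function names, the handling of values falling between the stated bands, and the example labels are assumptions made purely for illustration.

```python
from collections import Counter


def cohen_kappa(recalled, recorded):
    """Cohen's kappa for two equal-length sequences of category labels."""
    if len(recalled) != len(recorded) or len(recalled) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(recalled)
    # Observed proportion of exact matches between the two sources.
    observed = sum(a == b for a, b in zip(recalled, recorded)) / n
    # Agreement expected by chance, from the marginal category frequencies.
    recall_counts = Counter(recalled)
    record_counts = Counter(recorded)
    expected = sum(recall_counts[c] * record_counts[c] for c in recall_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


def agreement_label(kappa):
    """Labels as worded in the Methods; bands below 0.41 are not named there.

    Assumption: values between the stated bands (e.g. 0.605) fall into the upper band.
    """
    if kappa >= 0.8:
        return "excellent"
    if kappa > 0.6:
        return "substantial"
    if kappa > 0.4:
        return "moderate"
    return "below moderate"


# Hypothetical birthweight categories for six mothers (not study data).
example_recalled = ["normal", "normal", "low", "high", "normal", "normal"]
example_recorded = ["normal", "normal", "low", "normal", "normal", "high"]
example_kappa = cohen_kappa(example_recalled, example_recorded)
print(round(example_kappa, 3), agreement_label(example_kappa))
```

Because kappa corrects for chance agreement, a high percentage of exact matches can still yield a modest kappa when one category dominates both sources.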
Logistic regression analysis was used to predict discrepant recall of birth events. For continuous and categorical variables, the corresponding binary variables were created indicating inaccurate or accurate recall (coded as 0 and 1 respectively). The impact of independent variables on maternal recall is presented as odds ratios (OR) with 95% CI and p values. Variables that had no effect were deleted from the regression analysis in a stepwise manner. Recalled and categorical independent variables were also analysed using chi-squared test. Statistical significance was defined as p<0.05. Data were analysed with SPSS version 24.0 (Armonk, NY:IBM Corp.).
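The binary coding step can be sketched as follows. This is an illustrative assumption about how such an indicator might be derived, not the authors' SPSS procedure: the function name and the treatment of missing values are hypothetical, and the 100 g tolerance is borrowed from the birthweight threshold reported in the Results.

```python
def recall_accuracy(recalled, recorded, tolerance=0):
    """Return 1 for accurate recall, 0 for inaccurate recall, None if a value is missing.

    Categorical values (strings) must match exactly; numeric values are treated
    as accurate when they differ by no more than `tolerance`.
    """
    if recalled is None or recorded is None:
        return None  # assumption: missing pairs are left out of the regression
    if isinstance(recalled, str) or isinstance(recorded, str):
        return int(recalled == recorded)
    return int(abs(recalled - recorded) <= tolerance)


# Hypothetical (recalled, recorded) birthweights in grams, not study data.
example_pairs = [(3200, 3250), (2900, 3150), (None, 3400)]
example_codes = [recall_accuracy(a, b, tolerance=100) for a, b in example_pairs]
print(example_codes)
```

The resulting 0/1 indicator is what a logistic regression would take as its dependent variable, with maternal characteristics as predictors.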

Results
The number of questionnaires distributed was set by the available women in the time period of the study. Of the 150 questionnaires distributed (10 in pilot study, 140 in main study), 121 completed questionnaires were returned (81%). Of these, 20 were excluded from the analysis, due to missing hospital birth records (n=14), and having a birth less than 17 years prior to the study (n=6), making a final study sample size of 101 (Figure 1).
Demographic characteristics of participants are presented in Table 1. Participants gave birth to their first child between 1967 and 1998, were currently aged 37-70 years, were aged 18-41 years at first delivery, and the years elapsed since first birth was 17-49 years. Most participants were educated to high school level (30%). Most women were Israeli (75%), secular (44%) and married (83%).
Recalled and recorded perinatal and new-born outcome information is presented in Table 2. (see end of manuscript for Table 2) Multiple births included one set of triplets and three sets of twins giving a total of n=106 infants.
We reiterate, there is no single gold standard for agreement status to use as the reference. Bland-Altman plots for agreement between recalled and recorded continuous infant birthweight and GA showed small differences for birthweight records with a mean difference of -28.69g, SD=170.91, and 95% limits of agreement (-363.7, 306.3). There was an observed tendency towards lower maternal estimation than recorded data among normal birthweight infants of between 2700-3500g (Figure 2).
Gestational age agreement was within an acceptable range. Whilst the estimated mean difference was equal to 0, the 95% limits of agreement were relatively wide, ±1.77 weeks. There was an observed trend towards mothers reporting a lower GA where the records reported 39 and 41 weeks of gestation, and a trend towards maternal reports of higher GA for those recorded as over 41 weeks' gestation (Figure 3).
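The 95% limits of agreement quoted above follow the usual Bland-Altman construction, mean difference ± 1.96 × SD of the differences. The sketch below shows that arithmetic; the function names are hypothetical, the direction of the difference (recalled minus recorded) is an assumption consistent with the negative mean reported for birthweight, and the paired example values are invented.

```python
from statistics import mean, stdev


def limits_from_summary(bias, sd, z=1.96):
    """Limits of agreement from a mean difference and the SD of the differences."""
    return bias - z * sd, bias + z * sd


def bland_altman_limits(recalled, recorded, z=1.96):
    """Mean difference and limits of agreement for paired measurements."""
    if len(recalled) != len(recorded) or len(recalled) < 2:
        raise ValueError("need at least two paired values of equal length")
    # Assumption: difference is taken as recalled minus recorded.
    diffs = [a - b for a, b in zip(recalled, recorded)]
    bias = mean(diffs)
    lower, upper = limits_from_summary(bias, stdev(diffs), z)
    return bias, lower, upper


# Summary values reported in the text for birthweight (grams).
bw_lower, bw_upper = limits_from_summary(-28.69, 170.91)
print(round(bw_lower, 1), round(bw_upper, 1))

# Hypothetical paired birthweights in grams, not study data.
example_bias, example_lower, example_upper = bland_altman_limits(
    [3000, 3200, 3400], [3100, 3200, 3300]
)
```

Recomputing from the rounded summary values reproduces the reported birthweight limits to within rounding; small differences in the last digit are expected because the published mean and SD are themselves rounded.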
When comparing data from medical records and mothers' recall using the categories low (≤2,499g), normal (2,500-3,999g) and high (≥4,000g) for birthweight, and GA groups of preterm (≤36 weeks), term (37-40 weeks) and post-term (≥41 weeks), it was found that mothers of low and high birthweight babies tended to recall them as being smaller or larger respectively than recorded in the hospital records. Good agreement between records and maternal recall for babies of normal birthweight was found (Table 3). Similarly, mothers of preterm and post term babies reported lower and higher birthweights respectively when compared to the hospital records, whereas length of gestation for term babies had much closer agreement between maternal recall and the records (Table 4). Agreement between recall and medical records for five categorical variables and five dichotomous variables is shown in Table 5. For categorical variables infant birthweight and mode of delivery almost perfect agreement was found with an exact match in 42% of women. For infant birthweight 30%, 15% and 7% of the women had up to 100g difference, between 101 to 300g difference and >300g difference respectively. Categorised GA had the lowest level of agreement. An exact match between recall and records was found for GA in 35% of women, with 29% and 6% of women reporting a one-week difference and two-week difference from their records respectively. Some 30% of women did not report on this parameter. For the majority of these, data were available in their records. For method of onset of labour and type of pain relief, an exact match was found in 61% and 57% of women respectively.
Agreement results for dichotomous variables show moderate agreement for length of labour (≤10hrs/>10hrs); however, 43% of data for labour onset was missing from birth records. Agreement between women's accounts and records for the number of previous pregnancies was 95%. Fifteen women recalled having had a previous pregnancy while birth records only reported on 9 women with a previous pregnancy. The nature of the pregnancy loss is not specified in the records. Having labour induced and having an epidural (yes/no) were both found to be substantially in agreement between recall and records. Of the 46 women who recalled receiving an epidural, 42 had this recorded in their hospital records. Only fair agreement between recall and records was found for use of pethidine.
Logistic regression analysis showed statistically significant predictors for inaccurately recalled variables (Table 6). The pain relief variables (epidural and pethidine) were influenced by time elapsed since delivery. There was less recall-record agreement for use of pethidine for women who had more years elapsed since birth. However, in the case of epidural analgesia, women who had fewer years elapsed since birth showed less agreement between recalled and recorded epidural use than women who had more years elapsed since birth. Women who were older at the time of their first birth showed less agreement between recalled and recorded type of onset of labour than those who were younger.
Finally, high school vs higher education was a statistically significant factor associated with higher differences between recall and records (p=0.029). For women with education up to high school level, the odds of disagreement between recalled and recorded infant birthweight by more than 100g were higher (OR=3.1, 95% CI (1.1, 8.4)) compared to women with higher education. Analysis of ordinal education variables with four categories, as presented in Table 4, shows statistically significant linear-by-linear association test for trends between education and inaccurate recall (p=0.024).
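For a single binary predictor such as education level, the odds ratio from a univariate logistic regression equals the cross-product ratio of the corresponding 2×2 table, and its Wald 95% CI can be computed from the cell counts. The sketch below illustrates this relationship only; the function name is hypothetical and the counts are invented, since the underlying cell counts are not given in the text.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed group with the outcome      b: exposed group without the outcome
    c: unexposed group with the outcome    d: unexposed group without the outcome
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts (not study data): birthweight disagreement > 100 g
# by education level (up to high school vs higher education).
example_or, example_lower, example_upper = odds_ratio_wald(12, 18, 10, 46)
print(round(example_or, 2), round(example_lower, 2), round(example_upper, 2))
```

A confidence interval that excludes 1, as with the reported OR of 3.1 (1.1, 8.4), corresponds to statistical significance at the 5% level.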

Discussion
Our data suggest that long-term recall of most perinatal events results in close agreement with hospital records up to 41 years later. Agreement for mode of birth and infant birthweight was generally higher than the other variables studied, consistent with other studies researching agreement ≥10 years from birth for mode of birth [24,29,32] and infant birthweight [23,24,25,26,27,28,29,30,32]. Furthermore, the agreement level for these two variables was stable even when adjusted for time elapsed since birth, though recall of birthweight had better agreement for women with higher education levels, and recall for mode of birth had better agreement for women who were younger at time of delivery. High agreement between maternal recall of infant birthweight and birth records may reflect repetition of information to others after delivery [24]. The association of lower levels of agreement for women with fewer years of education is consistent with findings from five other studies [5,15,18,29,30]. However, the actual difference was under 300g for 93% of women, which is likely to be of little clinical relevance. Consistent with one other study [20], Bland-Altman plot showed good agreement for GA as continuous data while agreement was found to be moderate when expressed as a kappa statistic after categorizing these data. This difference may be explained by the categorisation of GA for kappa analysis, although the discrepancies between recalled and documented information were less than one week in all groups, which may be of little clinical significance, unless decisions are being made about induction for post-maturity. In terms of the number of previous pregnancies, 95% of women's accounts and records agreed. Due to the delicacy of the data, information about abortions or a termination of pregnancy may have been withheld.
Only three other studies report recall of length of labour more than ten years since first birth [26,29,32]. Our findings of moderate agreement for labour length are in accord with these findings. The fact that nearly half of all records did not report on labour onset (and thus labour length) was unexpected, the consequences of which are seen in the large discrepancy between recall and actual events for this variable. This is a critical omission, as length of labour is an important trigger for routine labour interventions [38]. Moreover, it limited our ability to assess the level of agreement with precision. There is currently widespread controversy over the nature and limits of physiological labour length. This debate, focusing specifically on the effects of vaginal birth or interventional delivery on rising cesarean rates, is based on epidemiological analysis of labour progress patterns in hospital records [39]. If these data are inaccurate, or missing, then future policies and guidelines may also be inaccurate.
To the best of our knowledge, this is the first study that has investigated maternal recall of all modes of labour onset (including natural, pharmacological medication and mechanical or physical approaches).
Limited obstetric information on labour onset may weaken rationale for induction, augmentation and cesarean birth for slowly progressing labours. Given the current global burden of rising birth interventions, this is a critical gap in the literature.
Two studies looked at induction of labour (rather than mode of onset). Elkadry et al. (2003) [8] interviewed 277 women 10 weeks after birth and found that one of 8 mothers did not know whether her labour had been induced, and of 108 women whose labour had been induced, 1/10 women could not recall the indication that was stated in the records. Bat-Erdene et al. (2013) [13] interviewed 755 women four months post-partum and found that induced labour had lower sensitivity and specificity than the other variables studied. In the current study agreement about augmentation with synthetic oxytocin (Pitocin) was low. Only eight women believed they had been given Pitocin in labour, but records noted that this had happened for 17 women. This study is unable to elucidate the reasons for the discrepancy. There is a possibility that the nine women with no recall of the event were never informed that they had been induced. This finding is supported by similar findings in other studies [26,29]. It is also possible that older labour records are less accurate for these details.
The amnesic effect of pethidine may explain the underreporting of pethidine usage by mothers across the whole cohort, when compared to hospital records. It is intriguing that women who had their babies more recently were less likely to agree with their birth records about whether they had an epidural or not. This remains an open research question. It highlights the need for researchers to use caution when calculating rates of epidural usage, which is further complicated by staff routinely under-recording epidurals that were administered (in 3% of cases higher levels were recalled by participants than recorded in the notes). Social desirability response bias can be associated with self-report questionnaires, especially in sensitive areas such as childbirth. High pressure work situations may induce staff to make recording mistakes. However, in general, our study showed acceptable agreement between the two types of data. We believe they are a good barometer of long-term maternal recall. Although our sample was small, item non-response was rare in the questionnaire, unlike previous studies of long-term maternal recall [18,24], and neither source (recall or records) was used as the gold standard, in recognition of the potential flaws in both.

Strengths and limitations
The setting for this study included populations in which generations of women tended to give birth in the same hospital, meaning that hospital records existed for even those who had their births two generations previously. Potential errors in chart abstraction were minimised by the use of personalised national identity numbers. Limitations include incomplete or missing personal and demographic data within some hospital records, especially for length of labour. Hospital records also tended to round off GA to whole weeks. Some variables were uncommon in the sample.

Conclusions
This study shows that clinicians may use perinatal information reported by mothers in the absence of medical records, even when this information is required many decades after birth. Further work is needed on how records report labour onset and length, and how women report it, to ensure that what is described in clinical records, and what women report in birth histories, is captured and harmonised. Further studies on agreement data of a broad range of birth events from a range of settings are warranted to determine the replicability of our findings.