This paper describes the measurement reliability of anthropometry fieldworkers in the South African NDIS-2022, as assessed after standardised training. The NDIS-2022 was the first South African national multi-site nutrition survey to implement this standardised training and reliability assessment approach. Results suggest a high level of reliability among the site lead anthropometrists and lower (though still acceptable) reliability among fieldworkers. Almost all ICC and R values exceeded 0.9, indicating excellent reliability (25). However, %TEM indicated lower reliability for MUAC, CC and WC than for weight and length/height.
Few guidelines for acceptable TEM values exist. Absolute TEM is proportional to the size of the measured value; thus, acceptability thresholds might vary by age group (e.g. infants vs. adults) and measurement (e.g. MUAC vs. length/height). Using %TEM mitigates this by expressing TEM relative to the size of the measurement, which may allow for consistent cutoffs across various measurements. Carlsey et al (16) used %TEM < 2.0 for weight and length measurements in children 0–18 years, which our study achieved in all cases except fieldworkers' weight measurements in children 0–<2 years. Though MUAC was not included in the Carlsey et al study, %TEM for MUAC was > 2 in all groups in our study, suggesting higher measurement variability. In adults, Perini et al (23) proposed stricter %TEM cutoffs for weight and height, with different standards for beginner (inter-rater %TEM < 2, intra-rater %TEM < 1.5) and experienced (inter-rater %TEM < 1.5, intra-rater %TEM < 1.0) anthropometrists. By this standard, both site leads and fieldworkers had excellent reliability for adult weight and height measurements. Perini et al did not include MUAC, CC or WC in their guidelines, but the higher %TEM for these measurements suggests greater variability than for weight and height.
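As an illustrative sketch of how %TEM normalises absolute TEM, the Python snippet below applies the commonly used formula for two repeated measurements, TEM = sqrt(Σd² / 2N), and expresses %TEM relative to the grand mean of all measurements. The function names and the MUAC values are hypothetical and serve only as an example; they are not taken from the NDIS-2022 analysis or data, and the study's own computation may differ in detail.

```python
import math


def tem(first, second):
    """Absolute TEM for paired measurements: sqrt(sum(d^2) / (2 * n))."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equal-length, non-empty series")
    sum_sq_diff = sum((a - b) ** 2 for a, b in zip(first, second))
    return math.sqrt(sum_sq_diff / (2 * len(first)))


def percent_tem(first, second):
    """Relative TEM: absolute TEM as a percentage of the grand mean."""
    grand_mean = (sum(first) + sum(second)) / (2 * len(first))
    return 100 * tem(first, second) / grand_mean


# Hypothetical MUAC values (cm) for four subjects, each measured twice.
muac_round1 = [25.0, 30.0, 28.0, 22.0]
muac_round2 = [25.4, 29.6, 28.2, 22.2]

print(f"TEM  = {tem(muac_round1, muac_round2):.2f} cm")
print(f"%TEM = {percent_tem(muac_round1, muac_round2):.2f} %")
```

Because %TEM divides by the mean of the measurements, the same absolute error yields a larger %TEM for a small measurement such as MUAC than for a large one such as height, which is why a single cutoff can be applied across measurements only on the relative scale.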
Previous studies from primarily high-income countries have described inter- and intra-rater reliability of anthropometric measurements in children (13, 15–17), adolescents (14, 16), adults (15, 29) and the elderly (3). Whilst exact results vary, some patterns are evident. Firstly, intra-rater reliability consistently exceeded inter-rater reliability, a pattern that held true in this study. Secondly, weight measurements consistently had the highest reliability, and WC (when measured) the lowest (3, 13, 15, 17). In this study, the reliability of weight and length/height measurements exceeded that of MUAC, CC and WC. This is consistent with the technical difficulty of circumference measurements as well as the novelty of the techniques – particularly CC, which was novel to all the site leads and fieldworkers.
Site leads displayed better reliability than fieldworkers for most measurements, although TEM was only significantly different for weight (intra- and inter-rater reliability), length/height (intra-rater reliability) and CC (intra-rater reliability). This may be because most of the site leads (unlike fieldworkers) were dietitians/nutritionists with prior tertiary-level training in anthropometric assessment. Site leads' psychological investment and sense of ownership as project leaders may also have inspired more meticulous measurement practices. Unexpectedly, the fieldworkers outperformed the site leads for WC measurements, although only inter-rater TEM differed significantly. This underscores the importance of training and standardisation even when anthropometry is performed by qualified persons: for example, institutions differ in their WC site identification protocols, which results in different values (4).
Bland-Altman analyses revealed no statistically significant bias, except in fieldworkers' intra-rater reliability of height measurement in subjects > 12 years old, though the magnitude of the bias was small (0.22 (0.042–0.400) cm). Visual inspection of Bland-Altman plots shows greater variability of length/height measurements in subjects < 100 cm tall (i.e. infants and young children), which is consistent with the technical difficulty of measuring length/height in this age group. No other age-related trends were evident. In general, acceptable group-level measurement agreement was achieved within and between raters for all the pertinent anthropometric parameters, as is relevant for a national study.
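For readers less familiar with the method, the following Python sketch shows the core quantities of a Bland-Altman analysis: the bias (mean of the paired differences) and the conventional 95% limits of agreement (bias ± 1.96 × SD of the differences). The function name and the height values are hypothetical, and the sketch does not reproduce the significance testing of the bias reported above.

```python
import statistics


def bland_altman(first, second):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    bias = statistics.mean(diffs)    # systematic difference between rounds
    sd = statistics.stdev(diffs)     # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical height values (cm) for four subjects, each measured twice.
height_round1 = [100.0, 120.5, 150.2, 165.0]
height_round2 = [100.2, 120.3, 150.6, 165.3]

bias, lower, upper = bland_altman(height_round1, height_round2)
print(f"bias = {bias:.3f} cm, limits of agreement = [{lower:.3f}, {upper:.3f}] cm")
```

In a Bland-Altman plot, each difference is plotted against the mean of the corresponding pair, so a wider scatter at low means (e.g. subjects < 100 cm tall) indicates greater measurement variability in that range.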
The data described here provide statistical evidence of the ability of NDIS-2022 site leads and fieldworkers to perform anthropometric measurements reliably. The use of consistent training materials, standardised measurement protocols, and identical brand-new equipment across all study sites further contributes to anthropometric data quality. Additionally, this study reports the first South African reliability data for adult CC measurement. This study, alongside the publicly available training materials, paves the way for consistent quality standards for anthropometry in future large-scale South African studies, thus contributing to harmonisation and comparability of data over time and across different settings.
Some limitations must be acknowledged. First, we did not assess measurement accuracy (i.e. how closely the measured values approximate the true value). Obtaining true "gold standard" anthropometric measurements requires highly trained and accredited anthropometrists, who were unavailable in this setting. Careful equipment selection with daily verification checks should minimise equipment-related errors, but systematic errors due to suboptimal measurement technique cannot be ruled out. Finally, the limited numbers of trainees and volunteers in some provinces and age groups increase the statistical uncertainty of the estimates. The pooling of data from all sites allowed for meaningful statistical analyses but may have obscured some inter-site differences.
In line with international recommendations (1, 10, 19), standardised anthropometric training and reliability assessment should precede any large nutrition survey. This increases confidence in the study results (or, conversely, highlights limitations to consider when interpreting the data). Our experience suggests that two days of training are likely insufficient for fieldworkers with no previous anthropometry experience, and more time should be allotted, particularly for hands-on practice. Adding a pre-training assessment of measurement skills would provide evidence of training effectiveness and allow for refinement of the training approach based on trainees' strengths and challenge areas. During data collection, ongoing reliability assessment must be incorporated to ensure that data quality is maintained. Finally, statistical evidence of anthropometric data quality should be reported in detail alongside the main study results. For child anthropometry, the guidelines set out by the WHO (10) should be followed, and calculation of the composite index of anthropometric data quality described by Perumal et al (1) is recommended. Communicating data quality (or the absence thereof), as well as the steps taken to identify and manage errors, is a prerequisite for ethical and transparent science communication and is essential for continued trust in science (30). This empowers both researchers (by allowing re-analyses of existing data, knowledge synthesis, and study reproduction) and policymakers (by allowing meaningful longitudinal monitoring of population trends) (30).