Study population
The Institute of Nutrition of Central America and Panama (INCAP) conducted a cluster randomized trial from 1969 to 1977 in four rural villages, matched on population size and density, in the Department of El Progreso, Guatemala (18). The INCAP Oriente Longitudinal Study in Guatemala is the longest running birth cohort in any LMIC (19, 20). Two villages were randomly assigned to receive an energy and protein drink, Atole. The other two villages were assigned a low energy drink with all energy derived from sugar, Fresco. Details of the supplementation and study, as well as characteristics of the 2392 cohort members (comprising all individuals aged 0 to 7 years living in the villages at any point during the study period), have been described previously (19). The unit of observation in this study is the household in which a cohort member resided in each study wave. Like Guatemala as a whole, the study villages have undergone transformative social and economic changes in the last 50 years (21, 22). Literacy and schooling outcomes have improved over the study period, on par with national averages. Roads and transportation have also improved steadily over time, resulting in better access to non-agricultural jobs. Over the duration of the study, GDP per capita for Guatemala (constant 2010 USD) rose from $1692 in 1967 to $3160 in 2018 (23).
Data collection and variable specification
Durable Assets and Housing Characteristics
Information on contextually appropriate durable assets was collected from households with a cohort member residing in any of the four villages (as part of the village censuses) in the 1967, 1975, 1987 and 2002 study waves, and from households of all cohort members interviewed in the 2015-16 and 2017-18 study waves regardless of residential location (Supplementary Note 1). Depending on the age and sex of the participant, the household surveyed could be the participant's own house, parents' house or marital house. Cohort members were born between 1962 and 1977 so that in 1975 those who had been born were between 0 and 13 years old and in 2017-18 between 40 and 57 years old. Individual items were queried until they became irrelevant or negligible in value (e.g., record player), and new items, including computer, telephone (fixed or cell phone) and washing machine, were added as they became available (6). Only ownership of each item was recorded, not the number, quality or technological generation.
Characteristics of the residence were also collected. These included ownership of house and land, number of rooms, material used for construction (floor, roof and wall), whether there was electricity, location of the kitchen, medium of cooking and type of sanitation, sewage and garbage disposal facilities. We categorized non-binary housing characteristics into low and high quality based on expert opinion. We computed rooms per household member as an indicator of crowding, such that a higher value reflects greater wealth. We assumed no information bias from self-report of ownership and housing characteristics (24). Details are provided in Supplementary Table 1.
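As an illustration of the two derived housing variables described above, the following minimal Python sketch computes rooms per member and dichotomizes a non-binary characteristic. The material categories and the field names are hypothetical and serve only as placeholders; the actual low/high classification is the expert-based one given in Supplementary Table 1.

```python
def rooms_per_member(n_rooms, n_members):
    """Crowding indicator: a higher value means more space per person."""
    if n_members <= 0:
        raise ValueError("a household must have at least one member")
    return n_rooms / n_members


# Hypothetical mapping for illustration only; the classification used in
# the study is the one reported in Supplementary Table 1.
HIGH_QUALITY_FLOOR = {"cement", "tile"}


def floor_quality(material):
    """Return 1 for a high-quality floor material and 0 otherwise."""
    return 1 if material in HIGH_QUALITY_FLOOR else 0


# Toy household record (field names are assumptions).
household = {"rooms": 3, "members": 6, "floor": "dirt"}
crowding = rooms_per_member(household["rooms"], household["members"])
floor_hi = floor_quality(household["floor"])
```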
All participants gave written informed consent before participation. All methods were performed in accordance with the relevant guidelines and regulations.
Schooling
Attained years of schooling was collected for parents of cohort members. Attained schooling of cohort members was collected in adulthood during the 2015-16 and 2017-18 study waves.
Statistical Analysis
Sample and Changes in composition over survey waves
We compared early life characteristics (parental schooling, Atole supplementation, year of birth and sex) of households of cohort members who resided in their original village in 1987 and 2002 with those of households of cohort members who by then had migrated to other regions in Guatemala. We also compared those interviewed and not interviewed in recent waves (2015-16, 2017-18). We do not have information on the households of cohort members in 1987 or 2002 if they were not residing in their original study village at the time.
Construction of the harmonized asset index
For greater comparability with previously published work, we included ownership (yes/no) of radio, record player, sewing machine, refrigerator, television, bicycle, motorcycle and automobile. We included house ownership, land ownership, rooms per member, quality of housing construction (floor, roof, walls), whether the house had a separate kitchen, formal cooking medium, sanitary installation, improved water source and availability of electricity (17, 25). We imputed ownership of land, record player, sewing machine, television, motorcycle and automobile as zero for the 1967 wave, when these items were not asked about. We imputed ownership of record player as zero for 2002 and onwards. We pooled all study waves (1967, 1975, 1987, 2002, 2015-16, 2017-18) into a single dataset for the main analyses. Since siblings were included in the original cohort in early waves, the number of households does not equal the number of cohort members. The 2392 individuals recruited during the period 1969-77 come from 816 unique households. In the 2015-16 and 2017-18 waves, 176 and 240 individuals, respectively, of the 1163 and 1265 who were followed up were married to each other. We therefore use the household as the unit of observation.
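The zero-imputation and pooling rules described above can be sketched as follows. This is a minimal Python illustration rather than the code used for the analysis (which was carried out in R); the column names, the record layout and the toy data are assumptions.

```python
WAVES = ["1967", "1975", "1987", "2002", "2015-16", "2017-18"]

# Items not asked in 1967 and therefore imputed as zero (names are assumed).
NOT_ASKED_1967 = {"land", "record_player", "sewing_machine",
                  "television", "motorcycle", "automobile"}


def items_imputed_zero(wave):
    """Items whose ownership is set to zero in the given wave."""
    items = set()
    if wave == "1967":
        items |= NOT_ASKED_1967
    # Record player is imputed as zero from 2002 onwards.
    if WAVES.index(wave) >= WAVES.index("2002"):
        items.add("record_player")
    return items


def harmonize(record, wave, items):
    """Build one pooled row holding every harmonized item for a household."""
    zero = items_imputed_zero(wave)
    row = {"household_id": record["household_id"], "wave": wave}
    for item in items:
        # A KeyError here flags an item that was expected but is missing.
        row[item] = 0 if item in zero else record[item]
    return row


# Toy data: one household observed in two waves (values are invented).
ITEMS = ["radio", "television", "record_player"]
raw = {
    "1967": [{"household_id": 1, "radio": 1}],
    "2002": [{"household_id": 1, "radio": 1, "television": 1}],
}
pooled = [harmonize(rec, wave, ITEMS)
          for wave, records in raw.items() for rec in records]
```

Each pooled row then carries the same set of indicators regardless of wave, which is what allows a single index to be estimated across waves.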
Various approaches for constructing asset indices have been described in the literature, of which the most common is principal component analysis (PCA) (5). PCA is a statistical procedure that projects data points from the real number space onto a set of orthogonal ‘principal components’ such that the first component explains the maximum variance in the original data, and each subsequent component explains the maximum remaining variance. We performed PCA on a standardized correlation matrix created from the pooled dataset of binary variables comprising ownership of durable assets and housing characteristics. We retained the first component from the PCA as the harmonized asset index (2, 17). Some research has explored the potential of higher order principal components to explain other dimensions of wealth (such as agricultural wealth). Because these components are uncorrelated with the first principal component and in this context did not display interpretable loadings for housing characteristics, we did not consider them (5, 26).
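To make the procedure concrete, the sketch below extracts the first principal component of a correlation matrix and scores each household on it. It is a minimal standard-library Python illustration under stated assumptions, not the implementation used in the study: the leading eigenvector is obtained by power iteration (a simple substitute for a full eigendecomposition), the sign of the loadings is fixed so that they sum to a positive value, and the three-indicator toy data are invented.

```python
import math


def standardize(rows):
    """Column-wise z-scores (population SD); columns must not be constant."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def correlation_matrix(z):
    """Pearson correlation matrix computed from standardized rows."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(p)]
            for i in range(p)]


def first_component(corr, iters=500):
    """Unit-length leading eigenvector of corr via power iteration.

    Assumes the uniform starting vector is not orthogonal to the leading
    eigenvector, which holds when all indicators are positively correlated.
    """
    p = len(corr)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    if sum(v) < 0:  # fix the arbitrary sign so that more assets = higher score
        v = [-x for x in v]
    return v


def asset_index(rows):
    """Return (loadings, scores) for the first principal component."""
    z = standardize(rows)
    loadings = first_component(correlation_matrix(z))
    scores = [sum(l * x for l, x in zip(loadings, r)) for r in z]
    return loadings, scores


# Toy pooled data: columns are radio, television, electricity (invented).
rows = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1], [0, 0, 0], [1, 1, 1]]
loadings, scores = asset_index(rows)
```

In this toy example the households owning all three items receive the highest score and those owning none receive the lowest, which is the ordering the index is intended to capture.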
We visually assessed the empirical distributions for clumping and truncation by examining histograms for each study wave (3). Clumping occurs when many households have the same index value due to limited variation in ownership and housing characteristics. Truncation is the failure to differentiate between households at relatively low or relatively high levels of the index. Both phenomena are ideally resolved by including additional suitable assets or characteristics that differentiate households at the affected points along the distribution of the index.
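The assessment in this study was visual. As a purely illustrative numeric complement, clumping and truncation could be summarized by the share of households at the most common index value and at the two extremes; the functions, the rounding tolerance and the toy scores below are assumptions and were not part of the analysis.

```python
from collections import Counter


def clumping_share(scores, ndigits=6):
    """Share of households at the single most common index value."""
    counts = Counter(round(s, ndigits) for s in scores)
    return counts.most_common(1)[0][1] / len(scores)


def truncation_shares(scores, ndigits=6):
    """Shares of households exactly at the minimum and at the maximum."""
    rounded = [round(s, ndigits) for s in scores]
    lo, hi = min(rounded), max(rounded)
    n = len(rounded)
    return rounded.count(lo) / n, rounded.count(hi) / n


# Toy index values for one wave (invented): half of the households share
# the lowest value, suggesting clumping and truncation at the bottom.
wave_scores = [-1.2, -1.2, -1.2, 0.1, 0.4, 2.0]
clump = clumping_share(wave_scores)
low_share, high_share = truncation_shares(wave_scores)
```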
The use of PCA with dichotomous variables has been criticized for violating assumptions of linearity and normality. Although PCA does not impose constraints on each variable, it assumes a multivariate normal distribution of the variables. Alternative procedures have their own strengths and limitations. For instance, Multiple Correspondence Analysis (MCA), which is a suitable alternative to PCA for categorical data, cannot be used with continuous data. Polychoric/tetrachoric PCA assumes bivariate normal distributions between the latent variables that underlie the observed discrete variables. In practice, these methods tend to produce indices that are highly correlated (27). We assessed the correlation of the harmonized asset index with cross-sectional schooling-related measures of SES among cohort members (parental schooling in 1967-75 and own attained schooling in 2015-16 or 2017-18).
Sensitivity Analysis
We also constructed cross-sectional indices (S1) with the same set of indicators used in the harmonized index, stratifying by region of residence (urban, rural) of cohort members in the final two waves. We also assessed the Spearman rank correlation of the harmonized index with a new index constructed by adding newer assets introduced in 2002 and later (S2; video player, sound system, computer, telephone, washing machine and sewage system) after imputing the newer assets as zero for earlier waves (1987 and before).
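For reference, the Spearman rank correlation used to compare indices is the Pearson correlation of the ranks, with tied values given their average rank. The following is a minimal standard-library Python sketch with invented index values, not the code used in the analysis.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with ties assigned the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy comparison of a harmonized index with an alternative index (invented).
harmonized = [-1.4, -0.3, 0.2, 0.9, 1.6]
alternative = [-1.1, -0.5, 0.4, 0.7, 2.0]
rho = spearman(harmonized, alternative)
```

Because only the ordering of households matters, a value near 1 indicates that the alternative index ranks households almost identically to the harmonized index, even if the two are on different scales.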
We report the Spearman rank correlations of the harmonized index with alternative indices to assess its sensitivity to dropping assets and study waves (S3), to the structure of the correlation matrix (S4; Pearson, polychoric) and the factor extraction method (PCA, exploratory factor analysis, MCA), and to the categorization of housing characteristics into ordinal levels (S5; low, medium, high). Exploratory factor analysis assumes an underlying factor that gives rise to the observed distribution of assets and housing characteristics. MCA is a generalization of PCA for categorical variables. We converted crowding into a binary variable for the MCA, with values greater than 0.75 rooms per person set to 1 and all other values set to 0. Additional information on the various sensitivity analyses is provided in Supplementary Note 2. We carried out our analysis using R 3.5.1 and tidyverse 1.3.0 (28, 29).