The Planetary Child Health and Enterics Observatory (Plan-EO): a Protocol for an Interdisciplinary Research Initiative and Web-Based Dashboard for Climate-Informed Mapping of Enteric Infectious Diseases and their Risk Factors and Interventions in Low- and Middle-Income Countries

Background: Diarrhea remains a leading cause of childhood illness throughout the world and is caused by various species of ecologically sensitive pathogens. The emerging Planetary Health movement emphasizes the interdependence of human health with natural systems, and much of its focus has been on infectious diseases and their interactions with environmental and human processes. Meanwhile, the era of big data has engendered a public appetite for interactive web-based dashboards for infectious diseases. However, enteric infectious diseases have been largely overlooked by these developments.

Methods: The Planetary Child Health and Enterics Observatory (Plan-EO) is a new initiative that builds on existing partnerships between epidemiologists, climatologists, bioinformaticians, and hydrologists as well as investigators in numerous low- and middle-income countries. Its objective is to provide the research and stakeholder community with an evidence base for the geographical targeting of enteropathogen-specific child health interventions such as novel vaccines. The initiative will produce, curate, and disseminate spatial data products relating to the distribution of enteric pathogens and their environmental and sociodemographic determinants.

Discussion: As climate change accelerates there is an urgent need for etiology-specific estimates of diarrheal disease burden at high spatiotemporal resolution. Plan-EO aims to address key challenges and knowledge gaps by making rigorously obtained, generalizable disease burden estimates freely available and accessible to the research and stakeholder communities. Pre-processed environmental and EO-derived spatial data products will be housed, continually updated, and made publicly available to the research and stakeholder communities both within the webpage itself and for download. These inputs can then be used to identify and target priority populations living in transmission hotspots and for decision-making, scenario-planning, and disease burden projection.

Study registration: PROSPERO

child health interventions such as novel vaccines. Specifically, it aims to apply a big data approach to the modeling of EIDs in combination with advanced geostatistical analyses and global Earth Observation (EO)-derived climate datasets, to produce generalizable estimates of the geographical distribution of these outcomes and of their associations with environmental drivers, disseminated via an interactive web-based dashboard. Its underlying hypothesis is that the prevalence of many enteropathogens varies spatiotemporally as a function of climatic, environmental, and socio-demographic factors in a way that can be modelled using global EO datasets and similar products. It is hoped that this will enable the identification of target populations for interventions.

Methods And Design
Objective, scope, and data sources: Plan-EO's mission is to produce, curate, and disseminate spatial data products relating to the distribution of enteric pathogens and their environmental and sociodemographic determinants. Our approach is to compile, maintain and grow a large database of georeferenced results from studies that diagnosed EIDs in children in LMICs along with spatiotemporally matched covariates. We are sourcing and compiling a central repository of stool-level microdata collected at study sites in numerous LMICs that together represent the broadest and most representative range of currently available climate zones and environmental contexts. The rationale is that data from multiple sites and studies can offer insights into the general epidemiology of EIDs that might be biased by, or not apparent from, considering just a single location [33,34]. Drawing broad, generalizable conclusions about the impact of the environment on enteropathogens therefore requires combining data from locations that are representative of diverse ecological zones [35]. Plan-EO continues the activities of a previous project named Global Earth Observation for Monitoring Enteric Diseases (GEO-MED), which ran from 2018 to 2021, but with a new name and funding source and the newly systematized methodology documented here. Several analyses were published under GEO-MED using the earlier version of this database [17,35,36], and Plan-EO was registered in PROSPERO on January 31st 2023 (CRD42023384709) as an update to GEO-MED with a revised methodology. All data extraction and analysis were paused between submitting the last GEO-MED manuscript for publication and registering Plan-EO in PROSPERO.
Through professional networks and exploratory literature reviews using online search engines and databases (such as PubMed, ResearchGate etc.), published studies will be identified that meet the following criteria: a). analyzed stool samples collected from children under 5 years of age; b). used PCR or equivalent molecular diagnostics to detect enteropathogens in samples (ensuring comparable sensitivity across studies and pathogens); c). were carried out in one or more LMICs (as defined by the OECD [37]); d). recorded the dates of sample collection and approximate location of study subjects' residences (to enable spatiotemporal referencing). Priority will be given to studies with large numbers of samples, that diagnosed multiple enteropathogens of different taxa (viruses, bacteria, protozoa) in the same samples, and that took place in countries or contexts not yet represented in the database. An initial list of pathogens has been selected based on their being either highly endemic or responsible for high diarrheal disease morbidity in LMICs [38] as well as to be representative of the three major enteropathogen taxa. These include five enteric viruses (adenovirus, astrovirus, norovirus, rotavirus and sapovirus), three bacteria (Campylobacter, ETEC and Shigella) and two protozoa (Cryptosporidium and Giardia). A saved search has been scheduled using the National Center for Biotechnology Information (NCBI) online tool so that newly published potential collaborating studies are summarized in automated monthly emails. Investigators on eligible studies will be contacted with a request for access to data from individual participants and, if they respond and agree, data use agreements (DUAs) will be established with the collaborating institution. Variables requested from contributing studies include:
1. Infection status for each pathogen diagnosed in each stool sample.
2. Date of sample collection.
3. Subjects' age on that date.
4. Whether the sample was collected during a diarrheal episode (cases) or while the subject was asymptomatic (controls).
5. Country, study, and site in which the subject was recruited and whether the study was health facility- or community-based.
6. Geographic data. This may consist of household location coordinates where available; otherwise, subjects are georeferenced to the centroid of their neighborhood, village, or district or, where such information is unavailable, the geographical location of the health facility that recruited them.
7. Additional subject-level factors such as sex, anthropometric and feeding status and household, maternal and clinical information where available (see Table 3).
Figure 1 depicts the flow of data and processes within the Plan-EO project. Once a DUA is fully executed, study-specific databases are securely transferred using a link to an encrypted, cloud-based folder, saved to a secure HIPAA-compliant server (A.), and subsequently deleted from the cloud. Study-specific datasets are then processed and combined into a pooled central study database (B.) with a standardized format and list of variables in accordance with the PRIME-IPD tool for verification and standardization of study datasets retrieved for Individual Participant Data Meta-Analyses (IPD-MA) [39]. Sample data for which coordinates are unavailable are georeferenced by cross-referencing them with online mapping tools and other sources to obtain their latitude and longitude in decimal degrees. The original, study-specific identifiers (IDs) are removed along with any HIPAA-classified IDs, and each subject is instead assigned a unique ID that is specific to this project and cannot be matched back to the original, study-specific IDs. Pathogen positivity data are then linked with covariates, which fall into three main categories:
a). Time-varying hydrometeorological variables: A set of historical daily EO- and model-based reanalysis-derived estimates of hydrometeorological variables has been selected based on their demonstrated or hypothesized potential to influence enteric pathogen transmission [35]. These will be extracted (F.) from version 2.1 of the Global Land Data Assimilation System (GLDAS; G.) [40]; they are summarized in Table 1, and an example is visualized in Figure 2a. Because of the lagged effect of weather on pathogen transmission, daily hydrometeorological variables will be aggregated over a lagged period of exposure, using methods described previously (averaged or summed over a 7-day lagged period of exposure from 3 to 9 days prior to the date of sample collection, that is, from t-9 to t-3, where t0 is the date of sample collection) [35]. This time window and lag period can be adjusted according to the incubation period of specific pathogens.
b). Environmental spatial covariates: A set of time-static environmental and sociodemographic spatial covariates has been compiled in raster file format based on their hypothesized or demonstrated associations with diarrheal disease outcomes (E.) [41]. These are summarized in Table 2, and the example of enhanced vegetation index is visualized in Figure 2b. Having georeferenced each sample to the approximate location of the subject's residence, the variable values will be extracted at these coordinate locations using spatial analytical tools (F.). For samples georeferenced to health facilities, covariates will be averaged over a theoretical catchment area represented by a 20 km buffer around the facility location using the ArcMap Zonal Statistics tool; otherwise they will be extracted to household or community coordinates using the Extract Values to Points tool [42].
c). Subject- and household-level covariates: Most eligible studies conduct baseline and/or follow-up assessments of information relevant to EID transmission risk and vulnerability. Examples are summarized in Table 3. These data will be recoded to match as closely as possible standardly used variable definitions, units, and categories. Where these are missing or not collected by some studies, values will be imputed or interpolated (C.) based on household survey data according to methods described previously [17]. Briefly, equivalent data are extracted from individual child-level microdata collected in Demographic and Health Surveys (DHS) [43], Multiple Indicator Cluster Surveys (MICS) [44], and some country-specific surveys and combined into a parallel pooled survey database (D.) that is coded identically to the pooled study database. Survey data from the same survey strata (region and urban/rural status) in which the study sites were located are appended to the study database. Various methods can then be applied to interpolate or impute missing values based on this locally relevant information.
Tables 1 to 3 summarize the definitions, units, categories, and sources of covariates in each of the three groups that are being compiled by Plan-EO.
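The three covariate-linkage steps described above can be illustrated with a minimal sketch. It is not the project's implementation: the protocol names GLDAS extraction and the ArcMap Zonal Statistics and Extract Values to Points tools, whereas this example substitutes plain Python with a great-circle distance filter standing in for the 20 km buffer and a simple stratum mean standing in for the imputation methods of [17]. All function names, field names (such as improved_water) and input values are hypothetical.

```python
import math
from collections import defaultdict
from datetime import date, timedelta

# a) Lagged hydrometeorological exposure (t-9 to t-3, where t0 = sample date)
def lagged_window(sample_date, start_lag=9, end_lag=3):
    """Dates from t-start_lag to t-end_lag inclusive, oldest first."""
    return [sample_date - timedelta(days=k)
            for k in range(start_lag, end_lag - 1, -1)]

def aggregate_lagged(daily, sample_date, how="mean", start_lag=9, end_lag=3):
    """Average (how='mean') or sum (how='sum') a daily series over the window."""
    values = [daily[d] for d in lagged_window(sample_date, start_lag, end_lag)]
    return sum(values) / len(values) if how == "mean" else sum(values)

# b) Static raster covariates: nearest-cell value or 20 km buffer mean
EARTH_RADIUS_KM = 6371.0088

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2)
         * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def extract_covariate(cells, lat, lon, georef_level, radius_km=20.0):
    """cells: list of (lat, lon, value) grid-cell centres (assumed layout)."""
    if georef_level == "health_facility":
        # Average all cells whose centres fall inside the catchment buffer.
        inside = [v for clat, clon, v in cells
                  if haversine_km(lat, lon, clat, clon) <= radius_km]
        return sum(inside) / len(inside) if inside else None
    # Household or community coordinates: take the nearest cell's value.
    return min(cells, key=lambda c: haversine_km(lat, lon, c[0], c[1]))[2]

# c) Fill missing subject-level values from survey data in the same stratum
def impute_from_survey(study_rows, survey_rows, variable):
    """Replace missing values with the survey mean for (region, urban_rural)."""
    totals = defaultdict(lambda: [0.0, 0])
    for row in survey_rows:
        if row.get(variable) is not None:
            key = (row["region"], row["urban_rural"])
            totals[key][0] += row[variable]
            totals[key][1] += 1
    means = {k: s / n for k, (s, n) in totals.items()}
    filled = []
    for row in study_rows:
        new = dict(row)
        if new.get(variable) is None:
            # Stays None when no survey data exist for the stratum.
            new[variable] = means.get((row["region"], row["urban_rural"]))
        filled.append(new)
    return filled

# Hypothetical inputs for one sample collected on 15 March 2020
daily_precip = {date(2020, 3, 1) + timedelta(days=i): float(i) for i in range(20)}
t0 = date(2020, 3, 15)
precip_sum = aggregate_lagged(daily_precip, t0, how="sum")
precip_mean = aggregate_lagged(daily_precip, t0, how="mean")

evi_cells = [(0.0, 0.0, 0.5), (0.0, 0.1, 0.3), (0.1, 0.0, 0.7), (0.0, 0.3, 0.9)]
evi_facility = extract_covariate(evi_cells, 0.0, 0.0, "health_facility")
evi_household = extract_covariate(evi_cells, 0.02, 0.09, "household")

survey = [
    {"region": "North", "urban_rural": "urban", "improved_water": 1},
    {"region": "North", "urban_rural": "urban", "improved_water": 1},
    {"region": "North", "urban_rural": "urban", "improved_water": 0},
    {"region": "North", "urban_rural": "urban", "improved_water": 1},
    {"region": "North", "urban_rural": "rural", "improved_water": 0},
]
study = [
    {"id": "A", "region": "North", "urban_rural": "urban", "improved_water": None},
    {"id": "B", "region": "North", "urban_rural": "rural", "improved_water": 1},
    {"id": "C", "region": "South", "urban_rural": "urban", "improved_water": None},
]
study_filled = impute_from_survey(study, survey, "improved_water")
```

Changing start_lag and end_lag is how the window would be adapted to a pathogen's incubation period. A real workflow would read the covariates from raster files and would probably use a more principled imputation model than a stratum mean.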
Statistical methods: The resulting database will be in a sufficiently flexible format that numerous statistical modeling approaches can be applied to it to address specific research questions, make inferences about underlying biological processes and generate prediction maps to identify geographical foci of transmission risk. For example, in an earlier analysis of Shigella published under GEO-MED, generalized multivariable models were fitted within a Bayesian framework to derive population-level conditional effects of the predictors [36]. An example of the resulting prediction maps is shown in Figure 2d. Similar approaches will be applied by Plan-EO, and effect estimates from model outputs can then be extrapolated to all unobserved locations within the target domain for which covariate raster values are available to make predictions (H.). Household-level variables, such as water supply, sanitation coverage, and women's education, have been geospatially mapped across LMICs by the Local Burden of Disease (LBD) project [45,46], and Plan-EO investigators are in the process of finalizing our own, improved estimates of these and others (such as housing material [see Figure 2c], crowding and livestock ownership; I.).
Furthermore, subnational data on host-level factors such as breastfeeding and nutritional status, which are also determinative of pathogen infection risk, can be sourced from LBD and household surveys [47-49]. By including model terms for symptom status (diarrheal or asymptomatic) and study type (health facility- or community-based) it will be possible to make separate predictions for positivity in asymptomatic individuals, those experiencing a diarrheal episode and those seeking care for diarrhea. The models will be refitted and the results updated each time a new study database is added, and they can form the basis for estimating the potential influence of climate change when combined with scenario projections such as those of the 6th Coupled Model Intercomparison Project (CMIP6) [50,51].
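The role of the symptom-status and study-type terms can be illustrated with a minimal sketch. It assumes a logit link and a single temperature covariate, and the coefficients are entirely hypothetical (they are not estimates from GEO-MED or Plan-EO). In a Bayesian framework the same calculation would be repeated over posterior draws rather than applied to point values.

```python
import math

def inv_logit(x):
    """Map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical coefficients on the logit scale, for illustration only.
COEFS = {
    "intercept": -2.0,
    "temperature": 0.05,  # per degree C of lagged mean temperature
    "diarrheal": 0.8,     # sample collected during a diarrheal episode
    "facility": 0.4,      # health facility-based (vs. community-based) study
}

# Indicator settings (diarrheal, facility) for the three target populations.
SCENARIOS = {
    "asymptomatic": (0, 0),
    "diarrheal_episode": (1, 0),
    "care_seeking": (1, 1),
}

def predict(temp_c, diarrheal, facility, coefs=COEFS):
    """Predicted probability of pathogen positivity for one covariate setting."""
    eta = (coefs["intercept"]
           + coefs["temperature"] * temp_c
           + coefs["diarrheal"] * diarrheal
           + coefs["facility"] * facility)
    return inv_logit(eta)

def predict_scenarios(temp_c, coefs=COEFS):
    """Separate predictions for each population at one location's covariates."""
    return {name: predict(temp_c, d, f, coefs)
            for name, (d, f) in SCENARIOS.items()}

predictions = predict_scenarios(25.0)
```

Applying predict_scenarios to the covariate values of every grid cell would yield one prediction surface per population, which is the sense in which the indicator terms allow separate maps from a single fitted model.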
Dissemination and stakeholder engagement: Plan-EO will be established as an interinstitutional initiative consisting of two components: a). An interactive web-based dashboard: We will establish a data access and visualization system and suite of interactive maps to collate and disseminate the data products (comparable to WorldPop [52], the Malaria Atlas Project [29], or the DHS Program's Spatial Data Repository [32]). It will be built using an open-source platform and provide users with an interactive portal to explore the resulting pathogen-specific risk maps (J.) and the pre-processed environmental and EO-derived spatial data outputs. This repository of products will be continually updated and made publicly available to the research and stakeholder communities both within the webpage itself and for download in commonly used raster formats such as TIFFs. Upon visiting the Plan-EO homepage, the user will be presented with a world map-based interface and a series of drop-down menus with options to choose which pathogen to view and whether to view observed or predicted prevalence. The observed prevalence option will display pin icons at locations where the prevalence of the selected pathogen has been measured by a study, with colors corresponding to the type of study design and size proportional to the number of samples analyzed. Clicking on a pin will open a smaller window giving more information about the study site, with a hyperlink to the publication in PubMed, as shown in the illustrative example in Figure 3a. The locations will be based solely on information reported in the publications (e.g., district centroids, named health facilities) and will include studies not yet included as microdata in the Plan-EO database but identified by an ongoing systematic review (protocol in development and to be published separately). Only aggregated statistics will be reported, with no subject-specific information.
The predicted prevalence option will display the gridded model output surface as a map layer, as illustrated in Figure 3b. The user will be able to zoom in and pan over the map and click on locations to obtain prediction values. As the project progresses, we will build a catalogue of layers, including predictions for each pathogen and the covariates being produced, that can be superimposed on the map, toggled on and off, downloaded as files, imported into a GIS, and used in further analyses by the end user.
b). An international consortium of investigators: A global network of collaborating researchers (with a majority being early-career and/or from LMICs) will be fostered and coordinated out of the Plan-EO headquarters at the University of Virginia (UVA). Investigators from contributing studies will be invited to join the Plan-EO network and their names, institutional affiliations and contact information will be entered into a database. This will be used both to track the details of individuals to be included as co-authors on publications that rely on their data, and as a mailing list of contacts to whom emails will be sent periodically with updates regarding preliminary results, publications, new members, manuscripts for review etc.
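For the observed-prevalence layer of the dashboard described under a)., one possible way to encode a study site as a map pin is sketched here. The protocol does not specify a data format. This example assumes GeoJSON point features, and the color palette, the field names and the choice to scale pin radius with the square root of the sample count (so that pin area is proportional to the number of samples) are illustrative assumptions. Only aggregated counts are used, in keeping with the commitment to report no subject-specific information.

```python
import json
import math

# Hypothetical palette keyed by study design.
DESIGN_COLORS = {"health_facility": "#d95f02", "community": "#1b9e77"}
DEFAULT_COLOR = "#7570b3"

def pin_feature(site, radius_scale=2.0):
    """Build a GeoJSON Feature for one study site from aggregated counts."""
    prevalence = site["n_positive"] / site["n_samples"]
    return {
        "type": "Feature",
        # GeoJSON coordinate order is [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [site["lon"], site["lat"]]},
        "properties": {
            "site": site["name"],
            "pathogen": site["pathogen"],
            "prevalence": round(prevalence, 3),
            "n_samples": site["n_samples"],
            "color": DESIGN_COLORS.get(site["design"], DEFAULT_COLOR),
            # Radius grows with sqrt(n) so that area is proportional to n.
            "radius_px": round(radius_scale * math.sqrt(site["n_samples"]), 1),
            "publication": site["publication"],
        },
    }

# Hypothetical site record; the publication field is a placeholder.
example_site = {
    "name": "Example district centroid",
    "lat": -1.5, "lon": 30.0,
    "pathogen": "Shigella",
    "design": "community",
    "n_samples": 400, "n_positive": 100,
    "publication": "link to PubMed record",
}
feature = pin_feature(example_site)
layer_json = json.dumps({"type": "FeatureCollection", "features": [feature]})
```

A feature collection of this kind can be read by most open-source web mapping libraries, which is one reason GeoJSON is assumed here.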
Ethical considerations: All health information used in the Plan-EO project will be secondary data from studies and surveys that have already been carried out in LMICs by investigators at various institutions around the world and that obtained informed consent for future use of health information from subjects' caregivers. All investigators with access to the main Plan-EO database will have completed certifications in responsible human subject research. The original study-specific databases will be securely deleted from Plan-EO servers when the project ends unless superseding DUAs are established. The project's data management and transfer plan has received ethical approval from the IRB of the UVA School of Medicine (IRB-HSR #220353), and the protocol has been registered as an IPD-MA in the PROSPERO prospective register of systematic reviews (CRD42023384709). All publications will follow the PRISMA-IPD [53] guidelines for IPD-MAs and the GATHER [54] guidelines for disease burden estimation.

Table 1. Hydrometeorological variables
Variable | Definition | Units
(variable name not recovered) | Average of daily means in the 7-day period from t-9 to t-3 days | °C
Precipitation deviations | Deviations from the cumulative total volume over the 7-day period from t-9 to t-3 days | mm
Relative humidity | Average of daily means in the 7-day period from t-9 to t-3 days | %
Soil moisture | Average of daily means in the 7-day period from t-9 to t-3 days | %
Solar radiation | Average of daily means in the 7-day period from t-9 to t-3 days | W/m2
Specific humidity | Average of daily means in the 7-day period from t-9 to t-3 days | g/kg
Surface pressure deviations | Deviations from the mean daily surface pressure in the 7-day period from t-9 to t-3 days | mbar
Surface runoff | Cumulative total surface runoff over the 7-day period from t-9 to t-3 days | mm
Temperature | Average of daily means in the 7-day period from t-9 to t-3 days | °C
Wind speed | Average of daily means in the 7-day period from t-9 to t-3 days | m/s

Table 3. Subject- and household-level covariates (only one row recovered)
Variable | Definition | Categories
Sanitation facility | Presence and type of sanitation facility used by the child's household [73] | None (open defecation); Unimproved; Sewer or septic tank; Other improved [46]

Discussion
As climate change accelerates and as vaccine candidates for multiple EIDs come to market [23-25], there is an urgent need for etiology-specific estimates of EID burden at high spatiotemporal resolution [27]. A 2015 analysis attempted to prioritize infectious diseases for mapping from a list of 176 based on a combination of public health burden, epidemiological characteristics, data availability and interest from the global health community [77]. Notably, the study excluded from consideration several major high-burden syndromes with diverse infectious etiologies such as diarrheal disease, meningitis, febrile illness and LRIs. Since then, morbidity and mortality metrics for three such syndromes (diarrheal disease [41], LRIs [78] and febrile illness [79]) have been mapped either for Africa or for most LMICs using data on caregiver-reported symptoms from household surveys. However, attempts to map pathogen-specific EIDs have hitherto either been limited to cholera [80], an outbreak-prone, reportable disease for which a large database was available, or been rare uses of indirect methods that adjust all-cause diarrhea burden by pathogen-specific attributable fractions at national or province level [27,81].
This neglect of enteropathogen-specific infections is in part due to a perceived lack of readily accessible, spatially referenced data on their detection, since they are not routinely reported through health information systems or household surveys [82]. However, newly developed multiplex molecular diagnostic platforms, such as the TaqMan Array Card (TAC), that can detect nucleic acid for a broad panel of microorganisms in a single biological specimen, are increasingly being used in surveillance studies to detect enteropathogens in stool samples [83-85]. Analyses that compare detections of multiple pathogen species in the same stool samples and study population have delivered numerous insights about the epidemiology and etiology of enteric disease over the past five years [8,17,35,86-89]. However, studies of this nature have diverse designs and research aims and, though several have been carried out in multiple sites and countries according to common protocols, no single study offers a broad enough range of geographical and environmental contexts. Furthermore, while many investigators are willing to grant access to microdata from their published studies to address novel research questions through IPD-MAs [17,33,35], it requires considerable dedicated effort to identify eligible studies and negotiate permissions and contractual agreements for data sharing [82].
Plan-EO aims to address these challenges and knowledge gaps by making rigorously obtained, generalizable disease burden estimates freely available and accessible to the research and stakeholder communities. Pre-processed environmental and EO-derived spatial data products will be housed, continually updated, and made publicly available to the research and stakeholder communities both within the webpage itself and for download. These inputs can then be used to identify and target priority populations living in transmission hotspots and for decision-making, scenario-planning, and disease burden projection, an evidence base that is urgently needed to underpin a proposed reorientation towards radical, transformative WASH [90] and Planetary Health [5] agendas. Its findings also have the potential to generate novel hypotheses about the drivers of enteropathogen transmission, risk, and seasonality that can be further tested and replicated in other settings. Results from pathogen-specific infection risk models can be used to assess their relative sensitivity to changes in climate compared to other determinants such as sanitation improvements and to develop a scenario-based framework to support decision-making, resource allocation and identification of priority populations for targeting pathogen-specific interventions such as novel vaccines.

Availability of data and materials. Some of the datasets described in this manuscript are available in the GitHub repository https://github.com/joshcolston/Badr_Shigella_predictions. Other datasets to be generated will be made available as described in the methods section.
Competing interests.

Figure 1. Data and process flow for the Plan-EO project