SM Net intervention during SIAs – Polio SIA operation in UP is almost uniform across the districts and includes two main types: (1) fixed-site or booth-based vaccination; and (2) house-to-house vaccination. Generally, polio SIAs begin on a Sunday with fixed polio vaccination booths for one day. Then the house-to-house vaccination phase begins. In CMC areas, the SM Net functionaries engage communities before, during and after each polio SIA. Before each SIA, CMCs perform various awareness generation and trust-building activities such as the following: (a) interpersonal communication (one-to-one and one-to-group) with caregivers and family members of children eligible for SIA vaccination; (b) meetings with local influencers; (c) children’s rallies. DMCs and BMCs help the government to prepare for SIAs by participating in the development of micro-plans and ensuring the availability of necessary logistics and supplies. On the booth day, the CMCs involve school children in encouraging the community to bring children under five years old to the booths for vaccination. During the house-to-house vaccination, CMCs accompany the vaccinators who vaccinate eligible children. If the vaccination team encounters a refusal, CMCs engage the help of local influencers to try to convince resistant families to allow their children to be vaccinated. After completion of an SIA, the SM Net functionaries visit all the houses with unvaccinated children and encourage the families to have them vaccinated in the next SIA [8, 17, 18].
This study followed a quasi-experimental design that included time-series data with a non-equivalent comparison group. For this study, we defined CMC areas as ‘CLSM intervention areas’, and the areas without CMC deployment were considered as ‘Non-intervention areas’ of a block or polio-planning unit.
We carried out a secondary analysis of data routinely collected through the project Management Information System (MIS) of CGPP India (refer to Weiss et al., 2011 and Choudhary et al., 2019 for more details). The CGPP MIS provided information about various activities and results surrounding each polio SIA, such as the following: (a) number of eligible children; (b) number of children vaccinated at SIA booths (i.e., fixed-site vaccination); (c) number of households visited by house-to-house vaccination teams; (d) number of households with all children in the household vaccinated during the SIA; (e) number of households with at least one unvaccinated child; and (f) number of households that refused vaccination. We created a single database from separate data sheets (i.e., monthly progress reports of the CGPP India MIS).
SIAs and analysis period
The study includes 52 polio SIAs held from January 2008 to September 2017 in 56 blocks/polio planning units, covering 12 districts of Uttar Pradesh. For the purpose of this study, ‘January 2008’ is considered as the starting point. Although data for earlier SIAs were available in the CGPP India MIS, prior to January 2008, the CGPP’s reporting laid more emphasis on qualitative rather than quantitative data. We selected ‘September 2017’ as the endpoint of the study because after this time, CGPP India altered its approach in some of the areas by introducing a low-intensity SM Net intervention without CMCs, intervening only through block-level functionaries. In September 2017 (the endpoint of the study period), the CGPP had 1100 CMCs, deployed in 823 villages/urban wards from UP, reaching 522,000 households. Most of these CMC areas had a significantly high proportion (68%) of Muslims and a low level of female literacy (45%), compared to non-intervention areas.
We did not perform any sampling or sample size determination procedure; we included all 56 geographic areas (i.e., blocks/polio-planning units) where CGPP had its CLSM intervention during the study period (i.e., from January 2008 to September 2017). Similarly, we included all 52 SIAs that had a complete operation (booth-based and house-to-house vaccination) and covered all the geographic areas. Both study areas (i.e., intervention and non-intervention areas) had the same number of polio SIAs (77) during the study period, and an equal number of these SIAs (52) is included for each area in this study. The remaining 25 SIAs held during the study period were excluded from the analysis because they had either partial operations (i.e., SIAs that included only one of the two main types of operations) or incomplete geographic coverage (i.e., SIAs that did not cover all study units and were limited to selected areas). We also excluded two CGPP blocks that were not covered at the start of the study period (Appendix Table 1).
Taking 25th February 2012, the date when India became a polio-non-endemic country, as the cut-off, we divided the SIAs of the entire study period into the following two periods: (1) polio-endemic period and (2) post-polio-endemic period. The 25 SIAs that took place before March 2012 are labeled as ‘Polio-endemic period SIAs’, whereas the ‘Post-polio-endemic period’ included the 27 SIAs held from March 2012 to September 2017.
Using the CGPP MIS data, we computed various indicator variables to quantify the performance of both the fixed-site and house-to-house vaccination activities of polio SIAs as well as a Community Engagement Index, separately for the intervention and non-intervention areas. We considered the following nine indicators as dependent variables for cross-temporal analysis:
(1) Overall campaign coverage or SIA coverage – The percentage of eligible children vaccinated (through polio booths and house-to-house activities) during an SIA. The total number of eligible children (i.e., the number of children vaccinated in the previous SIA) is the denominator of this indicator.
(2) Booth coverage – The percentage of eligible children vaccinated at the polio SIA booths (fixed-site vaccination). The denominator of this indicator is the total number of children vaccinated in the previous polio SIA.
(3) Rate of ‘X’ houses generated at the beginning of an SIA – The percentage of ‘X’ houses (i.e., the households where an unvaccinated child is present or the vaccinators do not know the SIA vaccination status of all children) generated at the beginning of the house-to-house activity of an SIA. The numerator is the number of ‘X’ houses marked in the beginning phase of the house-to-house vaccination activity (i.e., the first visit, which usually happens on Day 2 of an SIA). The denominator is the total number of houses visited by the house-to-house vaccination teams of an SIA.
(4) X-to-P conversion rate of an SIA – The percentage of ‘X’ houses converted to ‘P’ (i.e., houses with all children vaccinated or with no child eligible for polio SIA vaccination) during a polio SIA. The total number of ‘X’ houses generated in the beginning phase of the house-to-house activity of an SIA is the denominator of this indicator.
(5) Rate of remaining ‘X’ houses at the end of an SIA – The percentage of ‘X’ houses remaining at the end of the house-to-house activities of an SIA. The total number of houses visited by the house-to-house vaccination teams of an SIA is the denominator of this indicator.
(6) Refusal rate at the start of house-to-house vaccination of an SIA – The number of households that refused polio vaccination in the beginning phase of the house-to-house SIA activity (marked as ‘XR houses’ in the vaccinators’ tally sheet) per 10,000 households visited by house-to-house vaccination teams.
(7) Refusal-to-Acceptor conversion rate – The percentage of resistant houses converted to acceptors during the house-to-house activities of an SIA. The total number of refusal houses generated in the beginning phase of the house-to-house vaccination activity is the denominator of this indicator.
(8) Refusal rate at the end of an SIA – The number of households that still refused polio vaccination at the end of the house-to-house SIA activity (marked as remaining ‘XR houses’ in the vaccinators’ tally sheet) per 10,000 households visited by house-to-house vaccination teams.
(9) Community Engagement Index (CEI) of polio SIA – A composite indicator computed from five selected indicators reflecting community engagement in polio SIAs (refer to Choudhary et al., forthcoming, for computation details). The CEI reflects the overall level of community engagement in the polio SIAs, and its values range from 0 to 1 (or 0 to 100%). A CEI value of zero indicates no engagement of communities.
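The first eight indicator definitions above can be sketched in a few lines of Python. This is only an illustration: the field names and the example counts are hypothetical and are not taken from the CGPP MIS, and the sketch assumes that the number of converted houses equals the houses marked at the start minus those remaining at the end. The CEI is omitted because its computation is described elsewhere.

```python
from dataclasses import dataclass


@dataclass
class SiaCounts:
    """Counts for one area in one SIA (hypothetical field names)."""
    eligible_children: int   # children vaccinated in the previous SIA
    vaccinated_total: int    # booth plus house-to-house
    vaccinated_booth: int
    houses_visited: int
    x_start: int             # 'X' houses marked at the first house-to-house visit
    x_end: int               # 'X' houses remaining at the end of the SIA
    xr_start: int            # refusal ('XR') houses at the start
    xr_end: int              # refusal ('XR') houses remaining at the end


def rate(numerator, denominator, per=100.0):
    """Return numerator/denominator scaled by `per`, or None if undefined."""
    return per * numerator / denominator if denominator else None


def sia_indicators(c):
    """Compute indicators (1) to (8); conversions assume start minus end."""
    return {
        "sia_coverage_pct": rate(c.vaccinated_total, c.eligible_children),
        "booth_coverage_pct": rate(c.vaccinated_booth, c.eligible_children),
        "x_rate_start_pct": rate(c.x_start, c.houses_visited),
        "x_to_p_conversion_pct": rate(c.x_start - c.x_end, c.x_start),
        "x_rate_end_pct": rate(c.x_end, c.houses_visited),
        "refusal_start_per_10k": rate(c.xr_start, c.houses_visited, per=10000.0),
        "refusal_conversion_pct": rate(c.xr_start - c.xr_end, c.xr_start),
        "refusal_end_per_10k": rate(c.xr_end, c.houses_visited, per=10000.0),
    }


# Made-up counts, for illustration only.
example = SiaCounts(1000, 980, 600, 500, 50, 10, 5, 1)
indicators = sia_indicators(example)
```

Returning None for a zero denominator keeps an SIA with, for example, no refusal houses from raising an error; how such cases were handled in the actual analysis is not stated in the text.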
Exploratory analysis and data cleaning
We carried out frequency analysis and box-plot analysis using MS Excel, SPSS and Tableau Desktop (public) Visualization software to identify data with unexpected values (including typographical errors and outliers) for each indicator. We used Z-scores and box plots to check for outliers in the dataset (Z-scores less than −2.68 or greater than 2.68). The data with extreme values were verified against the quarterly or annual narrative reports of CGPP India, and unjustified outliers were replaced with the average values for all the study variables. We also performed a graphical analysis to observe trends and variations between the intervention and non-intervention areas.
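A minimal sketch of the Z-score screening step is given below. It rests on assumptions the text does not settle: the sample standard deviation is used, the replacement value is the mean of the non-flagged observations, and every flagged value is treated as unjustified (the manual check against narrative reports is not modelled). The function name and the data are hypothetical.

```python
from statistics import mean, stdev


def replace_outliers(values, threshold=2.68):
    """Flag values whose |Z-score| exceeds `threshold` and replace them.

    Returns (cleaned_values, flagged_indices). Flagged values are replaced
    with the mean of the remaining values (an assumption of this sketch).
    """
    mu = mean(values)
    sd = stdev(values)  # sample standard deviation
    if sd == 0:
        return list(values), []
    flagged = [i for i, v in enumerate(values) if abs((v - mu) / sd) > threshold]
    if not flagged:
        return list(values), []
    kept = [v for i, v in enumerate(values) if i not in flagged]
    fill = mean(kept)
    cleaned = [fill if i in flagged else v for i, v in enumerate(values)]
    return cleaned, flagged


# Made-up coverage series with one implausible entry (10).
coverage = [95, 96, 94, 97, 95, 96, 94, 95, 96, 95, 94, 10]
cleaned, flagged = replace_outliers(coverage)
```

With a sample standard deviation, a single value among n observations cannot reach a Z-score above (n − 1)/√n, so a threshold of 2.68 can only flag anything in series of at least ten observations.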
Generalized Estimating Equations (GEE) analysis
We used GEE-based analysis in STATA to assess the post-polio-endemic period difference in the nine above-mentioned indicators between intervention and non-intervention areas. Similar to the previous studies of Weiss et al. (2011) and Choudhary et al., we performed a GEE analysis that accounted for the longitudinal/panel nature of the data, including the intra-cluster correlation (ICC) at the block/polio planning area level. We used the ‘Quasi-likelihood under the independence model criterion (QIC)’ as the model selection method. The GEE model with the lowest QIC was considered the most appropriate among the competing models with different correlation structures (e.g., exchangeable, auto-regressive, unstructured). In the GEE analysis, we assumed that the differences between the outcome indicators of intervention and non-intervention areas might vary by district, place of residence and time of year (quarter). We also assumed that there might be an interaction between intervention status (i.e., CLSM intervention/no intervention) and study district. That is, we allowed for the possibility that the effect of CMC activities in intervention areas, as compared to non-intervention areas, is modified by the district being analyzed. The multivariate statistical analysis included the following independent variables along with the intervention status: study district, place of residence of the block/planning unit and time of year (and interaction terms if significant). The bivariate analysis, in contrast, was limited to a comparison of indicators between intervention and non-intervention areas.
Further, we followed the recommendations of Bouttell et al. and performed the sensitivity analyses described below to assess the treatment effects of the CLSM intervention on the SIA outcomes.
Interrupted time-series analysis (ITSA)
ITSA was performed to assess the extent of change (per SIA) and the trend in the studied indicators. We used the ‘itsa’ command in STATA and followed the guidelines of Linden. The presence of autocorrelation in the data was checked through the ‘actest’ command. If no autocorrelation was present at more than one lag, the default model with the ‘Newey’ option was selected. Otherwise, the itsa model included the ‘Prais’ option, which adjusts for the autocorrelation in the data. We performed 56 independent tests to assess the baseline comparability between the intervention area and each of the non-intervention areas. The final itsa model included the selected non-intervention areas with a p-value greater than 0.10 on both the mean baseline difference (z) and the mean baseline slope (z_t).
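To illustrate what an interrupted time-series model estimates, the sketch below fits a single-group segmented regression by ordinary least squares in plain Python. It is not the ‘itsa’ command: it has no comparison group (so no z or z_t terms), no Newey-West standard errors and no Prais-type adjustment, and it returns point estimates only. The function names and the series are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest entry of this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def itsa_fit(y, start):
    """Fit y_t = b0 + b1*t + b2*post_t + b3*post_t*(t - start) by OLS.

    b0: baseline level, b1: baseline slope per SIA,
    b2: level change at `start`, b3: slope change after `start`.
    """
    rows = []
    for t in range(len(y)):
        post = 1.0 if t >= start else 0.0
        rows.append([1.0, float(t), post, post * (t - start)])
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Noise-free made-up series: level 80, slope 0.5, then +3 level and +0.2 slope.
series = [80 + 0.5 * t + (3 + 0.2 * (t - 5) if t >= 5 else 0) for t in range(10)]
b0, b1, b2, b3 = itsa_fit(series, start=5)
```

In the two-period design described here, `start` would correspond to the first SIA of the post-polio-endemic period.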
Difference-in-Differences (DID) analysis
Similar to our earlier analysis, we compared the differences between the outcomes of the polio-endemic and post-polio-endemic periods. We used the ‘diff’ command in STATA, developed by Villa, and estimated Difference-in-Differences (DID) treatment effects using unadjusted, adjusted and kernel PSM methods. Background characteristics that differed significantly between the intervention and non-intervention areas were considered as covariates in the adjusted and kernel PSM-based DID analyses. For these analyses, we followed the recommendation of Oakes et al. and included in the model the covariates (independent variables) that predict the exposure (to the intervention) and not the outcome variable. The covariates were not identified through step-wise regression procedures or related techniques. Our candidate covariates were selected characteristics of the intervention and non-intervention areas, i.e., level of urbanization, female literacy rate, percent Hindu/Muslim population, and average household size (total individuals in a household). A preliminary list of covariates to screen through further testing was identified through the t-test, using a significance level of 0.05. We also performed balancing tests, using the ‘pstest’ command in STATA, to check for covariate balance after matching (for the kernel PSM-based analysis).
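The unadjusted DID estimate reduces to a difference of four group means, as the following sketch shows. It covers only the unadjusted case; the covariate-adjusted and kernel PSM variants are not reproduced. The function name and all outcome values are hypothetical.

```python
from statistics import mean


def did_estimate(rows):
    """Unadjusted difference-in-differences from (treated, post, outcome) rows.

    treated: True for intervention areas, False for non-intervention areas.
    post:    True for the post-polio-endemic period, False for the endemic period.
    """
    cells = {(t, p): [] for t in (False, True) for p in (False, True)}
    for treated, post, outcome in rows:
        cells[(treated, post)].append(outcome)
    m = {key: mean(vals) for key, vals in cells.items()}
    change_treated = m[(True, True)] - m[(True, False)]
    change_control = m[(False, True)] - m[(False, False)]
    return change_treated - change_control


# Made-up coverage values (percent), two SIAs per cell.
rows = [
    (True, False, 90.0), (True, False, 92.0),   # intervention, endemic
    (True, True, 97.0), (True, True, 99.0),     # intervention, post-endemic
    (False, False, 88.0), (False, False, 90.0), # non-intervention, endemic
    (False, True, 91.0), (False, True, 93.0),   # non-intervention, post-endemic
]
effect = did_estimate(rows)
```

In this made-up example the intervention areas gain 7 percentage points and the non-intervention areas gain 3, so the DID estimate is 4 percentage points.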
Synthetic Control Method (SCM) based analysis
The SCM was applied to estimate the treatment effects based on an aggregated (weighted average) estimate of a combination of non-intervention areas that were similar to the intervention areas. Since the synth analysis in STATA allows only a single unit as the intervention unit, data from all the CGPP intervention areas (i.e., CMC areas of all 56 blocks) were merged into a single unit and treated as the ‘Intervention area’. In contrast, data from the non-intervention areas of the 56 blocks were treated as individual units of the donor pool. We followed the analysis approaches and steps recommended in the literature related to the SCM [20, 25–27]. We used the ‘synth’ and ‘synth_runner’ packages in STATA to construct synthetic CMC areas and perform placebo tests for evaluating the significance of the estimates. A synthetic intervention area was constructed based on the following three characteristics of both the intervention and non-intervention areas: percent urban population, female literacy rate, and percent Hindu population. We used the Root Mean Squared Prediction Error (RMSPE) of the intervention areas to assess the goodness of fit and selected the SCM model with the lowest polio-endemic period RMSPE value. Ratios between the post-polio-endemic period RMSPE and the polio-endemic period RMSPE were used to identify ill-fitting placebo runs. The non-intervention areas with an RMSPE ratio higher than that of the intervention areas were excluded from the final analysis.
Then we estimated the counterfactuals (i.e., the outcomes in the absence of the CLSM intervention) through the following formula recommended for assessing the causal effect of an intervention or a program.
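The RMSPE and the post/pre RMSPE ratio used above can be computed as in the sketch below. It assumes the actual and synthetic outcome series are already available (the construction of the synthetic unit is not shown), and all names and numbers are hypothetical.

```python
from math import sqrt


def rmspe(actual, synthetic):
    """Root mean squared prediction error between two equal-length series."""
    squared = [(a - s) ** 2 for a, s in zip(actual, synthetic)]
    return sqrt(sum(squared) / len(squared))


def rmspe_ratio(actual, synthetic, n_pre):
    """Post-period RMSPE divided by pre-period RMSPE.

    The first `n_pre` observations are the polio-endemic (pre) period.
    """
    pre = rmspe(actual[:n_pre], synthetic[:n_pre])
    post = rmspe(actual[n_pre:], synthetic[n_pre:])
    return post / pre


# Made-up series: two endemic-period SIAs followed by two post-endemic SIAs.
actual = [90.0, 92.0, 95.0, 97.0]
synthetic = [91.0, 91.0, 92.0, 93.0]
treated_ratio = rmspe_ratio(actual, synthetic, n_pre=2)

# Made-up placebo ratios for two donor units; following the rule stated above,
# units whose ratio exceeds that of the intervention area are dropped.
placebo_ratios = {"donor_a": 1.2, "donor_b": 5.0}
kept = [unit for unit, r in placebo_ratios.items() if r <= treated_ratio]
```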
Δ = (Y | P = 1) - (Y | P = 0)
Where Δ denotes the causal effect of the CLSM intervention (P) on an outcome (Y), (Y | P = 1) denotes the outcome with the CLSM intervention, and (Y | P = 0) denotes the outcome without the CLSM intervention (i.e., the counterfactual). Since the causal effect (Δ) and the outcomes from intervention areas (Y | P = 1) were already assessed through other methods (defined earlier), we rearranged the formula in the following manner to obtain the counterfactuals.
(Y | P = 0) = (Y | P = 1) - Δ
Lastly, we attempted to estimate the population (i.e., number of households or under-five children) influenced by the CLSM intervention by multiplying the treatment effects with the actual population.
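The rearranged formula and the population estimate amount to simple arithmetic, sketched below. The sketch assumes the treatment effect is expressed on the same scale as the indicator (percentage points for coverage, or per 10,000 households for refusal rates); the function names and all numbers are hypothetical.

```python
def counterfactual(outcome_with, effect):
    """(Y | P = 0) = (Y | P = 1) - delta."""
    return outcome_with - effect


def population_influenced(effect, population, scale=100.0):
    """Translate an effect into a count of children or households.

    `scale` is 100 for effects in percentage points and 10000 for
    effects expressed per 10,000 households.
    """
    return effect * population / scale


# Made-up values: 98% coverage with the intervention, effect of 4 points,
# and 100,000 eligible children in intervention areas.
coverage_without = counterfactual(98.0, 4.0)
children_influenced = population_influenced(4.0, 100000)
```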