Handling missing data in a composite outcome with partially observed components: Application in clustered paediatric routine data.

Background: In health care settings, composite measures are used to combine information from multiple quality of care measures into a single summary score. Composite scores provide global insights and trends about complex and multidimensional quality of care processes. However, missing data in subcomponents may hinder the overall reliability of the composite measures in subsequent analysis and inferences. In this study we demonstrate strategies for handling missing data in Paediatric Admission Quality of Care (PAQC) score, a composite outcome which summarizes quality of inpatient paediatric care in low income settings. Methods: We analysed routine data collected in a cluster randomized trial in 12 Kenyan hospitals. Multilevel multiple imputation (MI) within joint model framework was used to fill-in missing values in selected PAQC score subcomponents and partially observed covariates across two levels of hierarchy. We used proportional odds random intercepts and generalized estimating equations (GEE) models to analyse PAQC score before and after multiple imputation. Using a set of simulations scenario, that is, varied proportions of missingness in PAQC score subcomponents of interest under missing at random and missing completely at random mechanisms respectively, we compared the magnitude of bias in parameter estimates obtained under MI and the conventional method in addressing missing data in PAQC score components. Under the conventional method we scored all missing PAQC score components with value 0. Results: Results from observed data showed that multiple imputation of both PAQC score components and covariates yielded more accurate and precise estimates compared to complete case analysis. From the simulation study, the conventional missing data method led to significantly larger biases in estimated proportional log odds of the outcome compared to MI methods. The amount of bias increased with increase in rate of and therefore recommend multiple at the and composite


Introduction
Composite measures combine information from multiple measures into a single summary score [1][2][3][4][5][6]. In health care settings, composite measures have been used as scorecards to measure and benchmark performance and quality of care in neonates [7] and cardiovascular care among adults, [5,6,8,9] score; a 7-point composite score aimed at benchmarking processes of care among children admitted with common childhood illnesses in low income settings [10]. In the validation study, PAQC score was shown to be a good proxy for outcome of care [11]. Besides gain in statistical efficiency, composite scores reduce the amount of data processed thus providing global insights and trends about complex and multidimensional quality of care processes [2,3,9]. In addition, the issue of multiple testing is avoided [12,13]. Although composite outcomes complement single measures, weak theoretical and statistical assumptions may undermine the overall reliability [6,7]. For instance, use of inappropriate methods to deal with partially observed subcomponents may impede the validity and reliability of the composite measure in subsequent analyses and inferences [2,4,6,9,14]. In literature, multiple imputation (MI), proposed by Rubin [15], offers a good, often best practice, solution in dealing with partially observed outcomes and covariates [16][17][18][19]. In particular, handling missing data in single outcomes (with no subcomponents) is straight forward because the imputation model is usually equivalent to the analyst's model [20]. On the other hand, dealing with missing data in composite outcome context has not received the same level of attention with no consensus on whether to impute at the composite score level or at the missing components level [4,21].
In previous analysis of PAQC score, partially observed subcomponents were scored with value 0 representing suboptimal care [10]. This approach of dealing with missing PAQC score components is henceforth referred to as "conventional method",, which will be deemed equivalent to single imputation. A major limitation of the single imputation method is inability to capture uncertainty in the missing data values leading to under estimated standard errors [18,22]. Using routine paediatric pneumonia data from Kenyan hospitals, we first aim to explore the missing data mechanisms underlying the missing PAQC score components. Depending on the plausible mechanisms, we will explore appropriate strategies of dealing with missing data in the PAQC score components.
In addition, missing data in predictor variables will be addressed, properly accounting for the hierarchical structure of the data (i.e., patients clustered within clinicians who are then nested within hospitals). Through a range of simulation conditions, that is, three missing data rates under two missing data mechanisms, we will assess the implications of the missing data method (MI versus the conventional method) employed in addressing missing PAQC score subcomponents. Specifically, the amount of bias in regression coefficients and corresponding standard errors attributable to missing PAQC score components across the simulation scenarios will be obtained and compared between MI and the conventional method.

Methods
Case study design Data analysed here came from a cluster randomized trial conducted in 12 Kenyan countylevel hospitals participating in the Clinical Information Network (CIN), an ongoing observational study in routine inpatient paediatric care [23,24].
The trial objective was to investigate uptake of paediatric pneumonia treatment guidelines as recommended by the World Health Organization (WHO) in 2013 [25]. Detailed on the trial are contained in the trial report [26]. In brief, hospitals were randomly allocated to receive enhanced (6 hospitals) or standard (6 hospitals) audit and feedback. Enhanced audit and feedback constituted a monthly enhanced audit and feedback report on assessment, classification and treatment of pneumonia cases, a bi-monthly standard audit and feedback report on general inpatient paediatric routine care and network intervention strategies [26,27]. Standard audit and feedback on the other hand constituted a standard audit and feedback report alone and network intervention strategy [26,27]. Overall, 2299 children aged 2 to 59 months were admitted with childhood pneumonia in 12 study hospitals between March 2016 to November 2016. Data were abstracted by trained data clerks from individual patient medical records after discharge from hospital. The data were entered into an open source data capture tool (Research Electronic Data Capture, (REDcap)) [28] using standard operating procedure manual. For each case record, details of the admitting clinician including a unique clinician code, sex and cadre ("cadre" refers to clinician's qualification depending on the level of training, that is, clinical officers for a clinician with diploma-level training and medical officers for clinician with a bachelor's degree level training) were also abstracted into a separate database. Patients' and clinicians' databases were linked by unique clinician code. Amongst patients with an oral amoxicillin prescription, dose, frequency of administration and patient's weight necessary for calculation of dosage per kilo body weight were missing in 0.4%, 2.6% and 2.9% case records respectively (Table 1).
Insert Table 1 Missing data mechanism underlying pneumonia trial data Among pneumonia quality of care processes, undocumented (missing) primary and secondary pneumonia signs and symptoms (assessment domain) and missing amoxicillin prescription (treatment domain) were considered as inappropriate pneumonia care and were therefore not of interest in this analysis. However, missing data in amoxicillin dose, frequency of administration and weight of patient among oral amoxicillin recipients (treatment domain) (Table 1) were of interest hence explored further. This was in addition to missing level 1 covariate (patient's sex) and level 2 covariates (clinician's sex and cadre). To investigate plausible missing data mechanisms underlying each partially observed variable of interest, we created a binary missing data indicator and regressed it on fully observed variables. The predictor variables of interest included an interaction between intervention arm and follow up time in months, number of comorbid illnesses, age of patient, hospital malaria prevalence and paediatric admission workload. We also included observed assessment and diagnosis domain components as predictors. When the probability of missing values in a variable was independent of the variable itself or any other observed variable in the data set [32], then we assumed a Missing completely at random (MCAR) mechanism, although an MNAR mechanisms was theoretically still plausible. On the other hand, if the probability of missing values in a variable did not depend on the variable of interest but was conditionally dependent on other observed variables in the data set, we assumed a Missing at random (MAR) mechanism [32]. Table A1) suggested that for clinician's cadre and sex, missingness was dependent on some fully observed variables hence missing at random (MAR). Likewise, the probability of missing amoxicillin dose and frequency of administration and patients' weight were dependent on observed variables (supplementary Table A1).

Preliminary results (Supplementary
Multilevel multiple imputation of missing covariates and PAQC score components Multiple imputation (MI), proposed by Rubin [15] is often the recommended method for obtaining valid parameter estimates from partially observed data [16][17][18]. MI generally relies on the MAR assumption and it involves three sequential steps: imputation, analysis, and pooling of parameter estimates. In the first step, independent random samples are drawn from the posterior predictive distribution of the missing values given the observed data and a statistical imputation model, thus generating more than one filled-in data sets. Imputed datasets are then analysed using standard statistical methods in step 2. In step 3 final estimates are obtained by averaging over the parameter estimates from all multiply imputed data sets according to Rubin's Rule [15]. In this study, partially observed patients' (level 1) and clinicians' (level 2) variables were imputed within the joint model framework implemented in jomo [33] and mitml [34]  PAQC score after multiple imputation of missing treatment domain components After imputing missing amoxicillin dose, amoxicillin frequency and patient's weight, we constructed PAQC score in each imputed data set in a two-step procedure. First, we created binary indicators with 1 representing adherence to recommended childhood pneumonia guidelines and 0 representing inappropriate care. Specifically, the value zero in three assessment domain constituents corresponded to: -i) lack of documentation of at least of one of the primary signs and symptoms required for pneumonia identification; ii) lack of documentation of at least one of the 7 secondary signs and symptoms required for pneumonia severity classification; iii) incomplete documentation of all primary and secondary pneumonia signs and symptoms (Table 1). For the second PAQC score domain, we created a binary indicator with 1 representing pneumonia diagnosis and classification and 0 for any other classification (e.g. severe pneumonia). The third PAQC score domain (treatment) comprised 2 components; a binary indicator with 1 corresponding to oral amoxicillin prescription and 0 representing inappropriate care either due to missing prescription or documentation in the case record that oral amoxicillin was not prescribed.
Among patients prescribed oral amoxicillin, we created a new variable "recommended dose per kilo body" after MI of missing amoxicillin dose, amoxicillin frequency and patient's weight. We then transformed the new variable into binary form with 1 representing recommended oral amoxicillin dose (i.e., dose between 32 and 48 international units (IU) per Kilogram (Kg) every 12 hours) and 0 representing either wrong dosage (under dose for oral amoxicillin < 32 IU/Kg or over dose for oral amoxicillin >48IU/Kg), wrong frequency of drug administration (e.g. administration frequency of once every 24 hours instead of once every 12 hours) or both. In the second and final step of PAQC score construction, we summed all the 6 binary indicators spanning assessment (n = 3), clinical diagnosis (n = 1) and treatment (n = 2) domains to obtain PAQC score. After MI variation in PAQC score on the 7-point scale was attributed to inappropriate inpatient pneumonia care. That is, undocumented primary and secondary signs and symptoms (assessment domain), misclassification of disease severity, failure to prescribe the oral amoxicillin drug or prescription of the drug in the wrong dose or frequency [10].
PAQC score under conventional approach Under the conventional approach, construction of pneumonia PAQC score proceeded as above except that missing amoxicillin dose, amoxicillin frequency, patients weight among oral amoxicillin recipients in treatment domain were not imputed. Instead, we scored them with value 0 in the binary indicators. As a result, variation in PAQC score on the 7-point scale was due to missing data and/or inappropriate care across the three domains of care.
That is, missing data with reference to missing dose per kilo body weight and or frequency among oral amoxicillin recipients. Inappropriate care with reference to undocumented primary and secondary signs and symptoms in the assessment domain, incorrect severity classification, undocumented oral amoxicillin prescription or prescription of the drug in the wrong dose or frequency [10].

Statistical analysis
When the responses are ordered and the proportional odds assumptions are upheld (parallel logits) the cumulative logits (proportional odds) model is considered [36]. The proportional odds model expresses the k-category ordered outcome in terms of k-1 cumulative logits and estimated covariates effects are assumed common across all k-1 cumulative logits [16,36] When inference at population level is of interest, a marginal model such as the generalized estimating equation (GEE) model is used [37]. On the other hand, random effects models are useful when interest lies in drawing subject-specific inferences [16,38]. We used both proportional odds random effects and GEE model families in order to assess stability of parameter before and after MI. Letting i index patient, j clinician and l hospital, the random intercepts model implementedin R's Ordinal package [39] corresponded to [Due to technical limitations, this equation is only available as a download in the supplemental files section.] (2) where k , k = 1,2,3,4,5,6 are PAQC score specific intercepts, are estimated regression coefficients and b j,i are clinician's random intercept. In this analysis, we did not include hospital random effects due to few number of clusters. Similarly, letting i index patient, j clinician and l hospital, the GEE model of interest implemented in R's Multigee package [40]  where k , k = 1,2,3,4,5,6 are PAQC score intercepts. In this analysis, we adopted an independent working correlation. Both model families were used to analysis data under complete case analysis and after MI of missing covariates and PAQC score components in the treatment domain. Under complete case analysis, records with missing covariates were discarded while partially observed outcome subcomponents were handled using the conventional approach described in section above. A 5% level of significance was considered in all statistical analyses.

Simulation study
In further analysis, we conducted a simulation study with an objective of examining and comparing bias in parameter estimates (both regression coefficients and standard errors) under multiple imputation and the conventional approach of handling missing data in structure of the pneumonia trial data (i.e. mixed variable types of PAQC score components) and the multilevel structure, creating a standard data set based on model parameters while preserving the correlation structure was a challenge. As an alternative, we opted to generate missing data in a complete subset of the pneumonia trial data.
However, a limitation of this strategy is that only 16.7% (357/2127) case records in the pneumonia trial data were fully observed with the rest of the observations having missing data either in the covariates or PAQC score components. Using 16.7% of the observed data would affect precision with which parameter were estimated with in subsequent analyses [16,41]. Consequently, we decided to create a subset of the pneumonia data set complete in PAQC subcomponents of interest, that is, amoxicillin dose, frequency and patient's weight amongst oral amoxicillin recipients. To achieve this, we excluded 65/2127 (3.1%) case records with missing oral amoxicillin prescription. Of the remaining 2062 (96.9%) pneumonia case records, 1036 (50.2%) were prescribed oral amoxicillin while 1026 (49.8%) pneumonia cases were not. Amongst patients prescribed oral amoxicillin, we further excluded 61/1036 (5.9%) cases for whom weight (n = 30), amoxicillin dose (n = 4) or frequency of amoxicillin administration (n = 27) were missing. Therefore, the standard data set we used in the simulation study was a subset of pneumonia trial data and it consisted of 200194.1%) observations after exclusions above. Although our standard data set was complete in the target PAQC score components, we still had missing values in patient's sex, clinician's sex and cadre (covariates) as well as assessment domain components. Undocumented signs and symptoms in the assessment domain components were deemed as inappropriate care and were scored 0 in the binary indicators in the PAQC construction stage. To obtain the standard estimates (regression coefficients and standard errors), we multiply imputed missing covariates in the standard data set 10 times with joint model approach. We fitted a random clinician's intercepts model to each imputed data set and pooled the final parameter estimates according to Rubin's Rule. The pooled (standard) estimates were used as reference estimates against which results from different simulation scenarios were benchmarked.

Simulation scheme
First, we generated missing data in the treatment domain subcomponents (patient's weight, amoxicillin dose and frequency) of the standard data assuming missing completely at random (MCAR) and missing at random (MAR) mechanisms respectively. Binary missing data indicators were generated by sampling random numbers from a random binomial distribution with success probability rates of 3%, 10% and 40%. A 3% missing data rate was selected to mimic the rate of missingness observed in the pneumonia trial data before exclusions (Table 1), while 10% and 40% were chosen to assess the extent of bias in moderate to high rates of missingness. Under the MCAR mechanism, missing values were imposed on treatment domain subcomponents independent of missing and observed PAQC subcomponents (i.e., assessment and clinical diagnosis domains) and covariates (i.e., hospital, clinician and patient characteristics). For the MAR condition, probabilities of missing data were conditionally dependent on fully observed variables associated with probability of missingness in the three variables of interest (based on the real trial dataset) (Supplementary Table 1). In both MAR and MCAR, missing data in weight, oral amoxicillin dose, and frequency of administration were induced independently of each other, such that either one, two or all three variables were missing for any given patient.
Overall, we considered 6 simulation scenarios (i.e., two missing data mechanisms by 3 missingness rates). Each scenario was simulated 1,000 times. We chose and maintained random number generators (seeds) for different scenarios to ensure reproducibility of results.
For each scenario, we used two approaches to handle missing data generated in PAQC score subcomponents of interest. In one approach, proposed MI was used to fill-in missing amoxicillin dose and frequency of administration and patient's weight (pneumonia PAQC score subcomponents) in addition to patient's sex (level 1 covariate) and clinician's sex and cadre (level 2 covariates). In the second approach, we only imputed missing  6) Similarly, we assessed bias and accuracy of the corresponding standard errors. In this simulation study coverage probability of the 95% confidence intervals were not applicable because missing data were simulated on the same subset of the pneumonia trial dataset.
All simulations were conducted in R version 3.5.4 Results: Case Study Insert Table 2 In Table 2 we present random intercepts and GEE models parameter estimates (in log odds) and the corresponding standard errors obtained under complete case analysis (after deletion of case records with missing clinician's cadre, sex and patient's sex) combined with conventional approach of handling missing PAQC score elements. We also present estimates after multiple imputation missing covariates and missing PAQC score subcomponents in the treatment domain. In both model families, we observed change in the regression coefficients before and after imputing missing covariates and PAQC score subcomponents in the treatments domain. However, the magnitude of change varied across covariates of interest. The largest differences in proportional logs odds were observed in hospital workload regression coefficient with an approximate absolute difference of 0.25 (i.e., from -0.08 to -0.33) in the random effects model and an absolute difference of 0.15 (i.e., from -0.22 to -0.37) in the GEE (Table 2). We further observed model specific shifts in the direction of effect before and after MI. For example, under the random effects model, we observed negative patient's sex effect (log odds = -0.03) under complete case analysis which changed to a positive effect (log odds = 0.01) after MI (Table 1).
With regard to standard errors, MI led to more precise estimates across all variables compared to complete case records methods in both GEE and random intercepts models.
Moreover, we observed changes in statistical significance for some variables after multiple imputation. For instance, hospitals workload effect was not significant at 5% level of significance under complete cases analysis but turned out significant (p-value<0.05) after multiple imputation. These results were observed in both random effects and GEE models (Table 2). Considering the random effects model, deleting case records with missing data led to inflated variance between clinicians compared to variance estimated under MI (Table 1). This could be explained by the fact that clinicians with missing cadre and sex were discarded under complete case analysis resulting to fewer number of clinicians (clusters) hence increased clinicians' uncertainty. On the other hand, all clinicians were retained after MI, hence leading to more precise estimates.

Results Simulation study
Insert Table 3 Insert Table 4 Estimated bias in regression coefficients and standard errors across 6 simulation scenarios (missing data rates and missing data mechanisms) under the conventional and proposed multiple imputation methods are presented in Figures 1 and 2 respectively. In Tables 3 and 4, we present bias, empirical standard errors, model based standard errors and MSE for the regression coefficients and standard errors respectively estimated under the MAR mechanism across three missing data rates. Other simulation results under MAR and MCAR are provided in supplementary Tables A2-A7. From these results, the conventional approach of handling missing PAQC score subcomponents in treatment domain led to biased regression coefficient estimates and the magnitude of bias varied across variables (Table 4 and Supplementary Table A3 and Figure 1 and Figure 2). Additionally, bias in regression coefficients increased with an increase in the proportion of missingness and was somewhat larger when missingness in PAQC score components was generated under MAR mechanism than under MCAR mechanism.
With multiple imputation of missing treatment domain components, we observed smaller magnitude of bias in regression coefficients (Tables 3 and supplementary Table A2). This implied that the parameter estimates averaged over 1000 simulations were closer to the standard data estimates across the missing data rates and missing data mechanisms under consideration (MCAR and MAR).
With regard to standard error, the conventional approach of handling missing PAQC score components in the treatment domain led to larger bias across all variables of interest ( Figure 2 and supplementary tables A6 and A7) compared to proposed MI methods (supplementary tables A4 and A5). As expected, under the MAR mechanism all methods consistently exhibited larger bias compared to MCAR mechanism across the missing data rates.
With both conventional and MI approaches, the regression coefficients were either underestimated (negative bias) or overestimated (positive bias), the standard errors across simulation scenarios tended to overestimate the true standard errors thus resulting in positive bias, reflecting the loss of information due to missing data. However, the regression coefficients were more prone to bias across individual variables (Figure 1) compared to standard errors ( Figure 2). A possible explanation is that case records with missing PAQC score subcomponents were not discarded and analyses were based on all observations in the standard data set regardless of the approach used hence no major impact on the precision with which the standard errors were estimated across the simulation scenarios. Across simulation scenarios, the estimated empirical standard errors were close to the average of the estimated within simulation SE (model based standard errors). The magnitude of both estimates tended to increase with an increase in the proportion of missing data in PAQC score components. Across the simulation scenarios, MSEs were slightly larger under the conventional method compared to MI approach.
Additionally, the MSEs were somewhat larger under MAR mechanism compared with MCAR mechanism.

Discussion
In this study we sought to analyse clustered data with covariate missingness across two levels of a multilevel structure. This was in addition to proposing appropriate strategy for handling missing data in a composite outcome and assessing how the proposed method performs in a simulation study in comparison to simple conventional approach. The composite outcome of interest was PAQC score [10] adjusted to the new pneumonia guidelines, among children aged 2 to 59 months (pocket book) admitted in 12 Kenyan hospitals during a cluster randomized trial. Individual components in PAQC score corresponded to care processes spanning assessment, diagnosis and treatment of inpatient paediatric pneumonia cases. From preliminary analysis, missing data in pneumonia care processes varied within and between PAQC score domains. The rate of missingness could be explained by complexity underlying individual tasks [44]. That is, care processes (PAQC subcomponents) perceived to require more cognitive effort (e.g. assessment and documentation of respiratory rate or oxygen saturation measurement) generally recorded higher rates of missingness compared to tasks that required less effort to perform (e.g. assessment and documentation of difficulty in breathing or cough history from the care giver). In literature, handling missing data in composite components remains a major obstacle in epidemiological studies reporting composite measures [1,6,21]. Previously, a systematic review reported that only 2 in 40 (5%) clinical studies adequately and appropriately handled missing data in their composite outcomes [1]. In most studies, researchers avoid missing items in composite scores by conducting complete case analysis [21].
In the construction of PAQC score, missing components do not cause the whole scale score to be missing. Instead, subcomponents with missing values (e.g. undocumented signs and symptoms) and those corresponding to inappropriate care at patient level (e.g., overdose or under dose treatment prescription) are penalized with zero in the binary indicators [10,11]. This leads to low scores on the 7-point ordinal scale. Another limitation of the conventional approach is that it consists in drawing a single imputation, a missing data method which is known to underestimate variability [18].
In our proposed strategy for handling missing data in pneumonia PAQC score, we coupled MI (a statistical solution) and expert opinion to handle missing data in PAQC score adjusted to new pneumonia guidelines. Specifically, 3 partially observed treatment domain components (oral amoxicillin dose, frequency of administration and patient weight) were multiply imputed taking into account underlying mechanism while documents pneumonia sign and symptoms in the assessment domain and missing oral amoxicillin prescription were treated as inappropriate care hence score 0 in line with the conventional approach.
Our decision to treat some PAQC score subcomponents as inappropriate care was based by the study design in addition to expert opinion. By study design, the inclusion criterion was children admitted with pneumonia signs and symptoms (paediatric basic protocol).
Therefore, undocumented pneumonia signs and symptoms (primary and secondary) in the assessment domain were treated as inappropriate care and therefore scored 0 in subsequent PAQC score construction. On the other hand, according to expert's belief, a possible explanation for missing oral amoxicillin prescription in the treatment domain could be that patients were prescribed other antibiotics other than oral amoxicillin.
Considering this possibility, we scored missing prescription with value 0 because it was contrary to WHO paediatric pneumonia recommendations [25].
In addition to imputing missing PAQC score subcomponents in the treatment domain, we also imputed partially observed covariates, that is, patients' sex at level 1 clinician's sex and cadre at level 2. Multiple imputation of both PAQC score subcomponents (treatment domain) and missing covariates led to slight changes in regression coefficients estimates and standard errors. This was in comparison to results from previous analysis of the trial [45] where we only imputed missing covariates and handled all missing PAQC score subcomponents using the conventional method. Although the proportion of missingness in PAQC score subcomponents of interest (treatment domain) was small, the observed difference is an indication of MI superiority in handling missing PAQC score subcomponents over the conventional approach.
Through a range of simulation conditions, we examined bias in regression coefficients and standard errors associated with missing data in PAQC score treatment domain components. Although increasing the missing data rate had a predictable effect on bias, that is larger bias with increasing proportion of missingness [19] we included this scenario to examine the extent of variation in bias across the missing data mechanisms and between missing data techniques adopted in the study.
Across the missing data rates (3%, 10% and 40%) and the missing data mechanism (MAR and MCAR) underlying PAQC score components in the treatment domain, MI of partially observed treatment subcomponents within PAQC score led to estimates close to the standard estimates hence smaller biases in both regression coefficients and standard errors.
In contrast, using the conventional method to handle all missing PAQC score elements (including treatment domain subcomponents) and imputing missing covariates only led to larger biases across variables of interest. Moreover, the bias in regression coefficients were more pronounced than bias observed under conventional method and more so when missing PAQC score subcomponents were generated assuming a MAR mechanism.
Simulation study results further showed that the magnitude and direction of bias varied across variables. The bias estimated under MAR mechanism was consistently larger in magnitude than bias observed under MCAR and these differences were observed under conventional approach.

Strengths and implications of the study
Through this study we have demonstrated superiority of MI over the conventional method in handling missing data in PAQC score subcomponents. To our knowledge our study is the first study to investigate underlying missingness mechanisms and to propose MI, as a strategy for dealing with partially observed PAQC score domain components. Furthermore, our study is the first to estimate bias in parameter estimates associated with missing PAQC score components through a simulation study based on routine paediatric data.
However, we note that even after MI of missing PAQC components in the treatment domain, we still observed some level of bias in the regression coefficients. In consideration of our controlled study inclusion criteria (i.e., inclusion of patients with pneumonia signs and symptoms), we restricted application of multiple imputation in handling missing treatment domain components. We note that for routine paediatrics studies without restricted inclusion criteria, MI can be extended to handle missing PAQC score components in the assessment and diagnosis domains of paediatric care.

Limitations
This study had several limitations. First, due to the complex nature of the outcome and the data structure we simulated missing data on a complete subset of the pneumonia trial data. This approach has been used in previously simulation studies [21,48]. Secondly, we did not estimate the true parameters from a complete subset of observed data. This is because most missing covariate data occurred in second level variables (clinician's sex and cadre) and approximately 83.2% case records had missing information. Exclusion of these incomplete records would have led to severe loss of information (i.e., reduced sample size) thus under powering the simulation study [16,41].

Conclusion
Using pneumonia PAQC score we have demonstrated that missing data in a composite outcome subcomponent should be addressed carefully. In comparison with conventional method, MI produce minimally biased estimates regardless of amount of missing data rate and underlying mechanism. However, more research is needed to compare different ways of performing multiple imputation at the component and composite outcome level.

Availability of data and materials
The datasets analysed in this study are not publicly available because they are a property of the Ministry of Health and we do not have authority to share on their behalf.      Figure 1 Bias in regression coefficients under the conventional approach of handling missing PAQC score components and after multiple imputation of missing PAQC score subcomponents in the treatment domain and missing covariates imputed across missing data rates and missing data mechanisms.

Figure 2
Bias in standard errors under the conventional approach of handling missing PAQC score components and after multiple imputation of missing PAQC score subcomponents in the treatment domain and missing covariates imputed across missing data rates and missing data mechanisms.

Supplementary Files
This is a list of supplementary files associated with the primary manuscript. Click to download. Supplementary_file.docx