We used a data mining method to satisfy the sampling design requirements of the IQCAMP, a national pilot survey with a limited budget and sample size. The model-based clustering method divided districts into eight clusters, whereas the hierarchical clustering method divided districts into two clusters. Before conducting the validity assessment through statistical analysis, an expert group approved the face validity of the methods. The internal validity, as measured by the within-cluster sum of squares and the Dunn index, showed that the clusters of districts in MCM-8 had higher compactness and separation than those in HCM-2. Moreover, the majority of stability indices indicated that MCM with eight clusters is more stable than HCM with two clusters. Therefore, we selected MCM with eight clusters as the final model for the sampling design. These clusters were mainly characterized by the probability of death from stroke, COPD, and CKD, in-hospital mortality rate, patient’s exchange rate, the mortality rate attributed to adverse events of medical treatment, and all-cause mortality ratio. The simulation results showed that MCM-8 improved sampling efficiency by up to 1.7 times compared with simple random sampling. However, sampling efficiency may vary depending on the most important distinguishing indicators of the clusters’ features.
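The two internal validity measures named above can be illustrated with a minimal sketch. It is not the code used in this study. The toy points, the two labelings, and the function names `within_cluster_ss` and `dunn_index` are hypothetical, and the Dunn index is taken in its basic form (smallest distance between points of different clusters divided by the largest cluster diameter). A lower within-cluster sum of squares indicates more compact clusters, and a higher Dunn index indicates more compact and better separated clusters.

```python
from itertools import combinations
from math import dist


def _group(points, labels):
    """Group points by cluster label."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return clusters


def within_cluster_ss(points, labels):
    """Sum of squared Euclidean distances of points to their cluster centroid."""
    total = 0.0
    for members in _group(points, labels).values():
        centroid = tuple(sum(c) / len(members) for c in zip(*members))
        total += sum(dist(p, centroid) ** 2 for p in members)
    return total


def dunn_index(points, labels):
    """Minimum between-cluster distance over maximum cluster diameter.

    Assumes at least two clusters and at least one cluster with two
    or more distinct points.
    """
    clusters = list(_group(points, labels).values())
    min_separation = min(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    max_diameter = max(
        dist(p, q)
        for members in clusters
        for p, q in combinations(members, 2)
    )
    return min_separation / max_diameter


# Hypothetical toy data: two visibly separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
good_labels = [0, 0, 0, 1, 1, 1]   # matches the visible grouping
poor_labels = [0, 1, 0, 1, 0, 1]   # mixes the two groups

wss_good = within_cluster_ss(points, good_labels)
wss_poor = within_cluster_ss(points, poor_labels)
dunn_good = dunn_index(points, good_labels)
dunn_poor = dunn_index(points, poor_labels)
```

In this toy example the labeling that matches the visible grouping has the smaller within-cluster sum of squares and the larger Dunn index, which is the pattern reported for MCM-8 relative to HCM-2.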
In our use of clustering methods, we built on earlier studies [22, 23]. Although a large number of clustering methods exist in the literature, we used the MCM because it has several advantages over other clustering methods: it relies on statistical models and requires no pre-specified number of clusters [24]. HCM has also been extensively used in the literature [25, 26].
Relying on contemporary theories of quality of healthcare [15, 16], the clustering methods used an array of input indicators covering demands, structures, and health outcomes to define optimally homogeneous strata, which cover the spectrum of quality and cost in national studies, as aimed by IQCAMP. The more homogeneous the strata, the more efficient the stratified sampling design [23]. This innovative way of defining strata is an efficient alternative to conventional stratified sampling, which defines strata based on, for instance, geographical units. This property is particularly desirable in surveys with small sample sizes, which are prone to larger variability in sampling results. Besides, by using this sampling method, we select participants from only eight provinces (rather than 31 provinces), which makes the sampling more feasible and affordable. Therefore, our sampling method benefits studies with small sample sizes by decreasing the level of sampling variability and the overall cost and by improving the feasibility of the study.
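The link between stratum homogeneity and efficiency can be illustrated with a small worked sketch. It is not the simulation used in this study. It assumes proportional allocation, ignores the finite population correction, and uses hypothetical toy values and a hypothetical function name, `stratification_efficiency`. Under these assumptions, the variance of the simple random sampling mean is proportional to the total population variance, the variance of the stratified mean is proportional to the size-weighted average of the within-stratum variances, and their ratio gives the efficiency gain.

```python
from statistics import pvariance


def stratification_efficiency(strata):
    """Ratio of SRS variance to stratified variance of the sample mean.

    Assumes proportional allocation and no finite population correction,
    so the common factor 1/n cancels and the ratio reduces to
    (total variance) / (weighted mean of within-stratum variances).
    """
    population = [y for stratum in strata for y in stratum]
    n_total = len(population)
    total_var = pvariance(population)
    within_var = sum(
        len(stratum) / n_total * pvariance(stratum) for stratum in strata
    )
    return total_var / within_var


# Hypothetical outcome values for nine units, partitioned in two ways.
homogeneous = [[10, 11, 12], [20, 21, 22], [30, 31, 32]]  # similar units together
mixed = [[10, 21, 32], [11, 20, 31], [12, 22, 30]]        # dissimilar units together

eff_homogeneous = stratification_efficiency(homogeneous)
eff_mixed = stratification_efficiency(mixed)
```

With the same nine values, strata that group similar units give a far larger ratio than strata that mix dissimilar units, whose ratio stays close to 1, that is, close to no gain over simple random sampling.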
However, the efficiency of our final sampling method is measured by the quality and cost indicators of the targeted health conditions. These indicators specify the aspects of quality and cost only partially. Therefore, steps should be taken to include prior information on the quality and costs of health conditions that is as inclusive, relevant, and precise as possible.
Disease-specific surveys such as IQCAMP require large registries and health information systems that are rarely available in developing countries. Usually, the information on the resource use (cost and utilization) and quality of services for different health conditions is limited to small samples collected by non-representative sampling methods such as convenience sampling [27–29]. Thus, the proposed clustering method is very appealing for developing countries where healthcare data are limited. This strategy helps policymakers to conduct small-sample surveys even with a limited budget.
The present study is subject to limitations. The first limitation concerns the availability of district-level data: some of the input measures were available only at the provincial level. For these measures, we used provincial-level data for all districts of a province. The second limitation concerns the representativeness of the sampling results. The proposed method lies in the middle of a spectrum of sampling methods, with convenience sampling at one extreme and simple random sampling at the other. Although the sampling method is far from convenience sampling, the extent to which it approaches representative sampling is unclear and needs to be evaluated in future studies. It is worth noting that the validity of the method depends on the appropriateness of the selected prior information on the quality and costs of the health condition. To maintain validity, we selected HCQC indicators based on the accepted Donabedian SPO model [16] and the Design Science paradigm [30].
To keep the sampling design simple, we used a common definition of strata for all eight health conditions in this research. This was motivated by the fact that access to prior information for each condition was limited. Furthermore, this common definition facilitated the administrative arrangements for data collection. However, with sufficient information per health condition, defining strata based on condition-specific outcomes could increase the sampling efficiency. We therefore call for future research to address the efficiency gain, cost, and feasibility of using condition-specific health outcomes to define strata for the health conditions studied in the present research.