Synthesis of water, sanitation, and hygiene (WaSH) spatial pattern in rural India: an integrated interpretation of WaSH practices

Rural areas largely lack access to improved drinking water, sanitation, and hygiene (WaSH) facilities in India. This requires documentation of WaSH practices at the local level for better understanding and sustainable development. In this paper, a global positioning system (GPS)-based household survey was carried out in 67 villages of Phagi tehsil using individual questionnaires to evaluate the existing WaSH conditions spatially at the panchayat level. Three sub-indices were used for WaSH risk areas mapping and prediction with the integration of machine learning algorithms. Survey results indicate an improvement in the availability of toilet facilities; however, a gap was found between toilet ownership and its usage by villagers. Data show that only six of the 32 panchayats of Phagi tehsil have almost zero open defecation practices. The findings highlight that the presence of toilets in the house, water supply in toilets, and a high literacy rate lead to an increase in toilet usage by the population. WaSH index scores indicate that panchayats like Mandawari, Mendwas, Chandma Kalan, and Rotwara have the worst conditions and fall in the high-risk category. Moreover, support vector machine regression (SVMR) results reveal that WaSH scores are mainly affected by open defecation (r = 0.94), water supply in toilets (r = 0.92), and female members’ participation in sanitation facilities decision-making (r = 0.53), followed by literacy rate (r = 0.33). Findings demonstrate the association between gender inequalities and WaSH conditions, and the potential of the WaSH index as a monitoring tool by local policymakers to shrink the WaSH gaps.


Introduction
Accessibility to safe drinking water, adequate sanitation, and appropriate hygiene services are the necessities of people and essential to protecting them from infectious disease outbreaks. Latest estimates reveal that about 1.7 million deaths in the world are due to diarrhea which reflects the lack of safely managed drinking water, sanitation, and hygiene (WaSH) facilities (Gupta and Obani 2016). In India, 0.8 million deaths were reported due to diarrheal and intestinal infections mainly due to unsafe WaSH practices (GBD 2017). According to UNICEF and WHO (2019), about 2.2 billion people around the world do not have access to safe drinking water, 4.2 billion people lack proper sanitation facilities, 673 million people practice open defecation, and 3 billion do not have basic hand-washing facilities in the socio-economically challenged countries. Improved WaSH facilities are vital to prevent the transmission of diseases such as diarrhea, cholera, dysentery, hepatitis A, typhoid, and COVID-19, and this will help to create resilient communities. Hence, the international community designed sustainable development goals (SDG 6.1 and SDG 6.2) to provide safe and affordable WaSH facilities for the entire population and to end open defecation.
The practice of open defecation results in environmental (water) contamination that leads to waterborne diseases, which are also responsible for high child mortality (Chan et al. 2021). India has achieved significant success in this area because of the Government of India's flagship programs such as the Swachh Bharat Mission (SBM), Total Sanitation Campaign, and the National Rural Drinking Water Programme (NRDWP) at district levels in each state. SBM has led to wide-scale construction of toilets to end open defecation across the country by providing financial support to below-poverty-line households, landless laborers, small and marginal farmers, women-headed households, and differently abled people. However, 34 percent of Indian states face high water contamination levels, and about 718 districts face extreme water exhaustion (WaSH Tatatrusts 2020). It is estimated that 82 percent of rural households in India have no access to safe water (WHO 2017).
The situation is worse in rural areas of Rajasthan state, which is characterized by limited surface water bodies, scanty rainfall, and frequent droughts. Here, women/girls have to travel a minimum of 3-5 km daily to fetch water for their household needs. Furthermore, women face more health risks than men from unsanitary conditions and sometimes become the victims of violence while defecating in the open (World Bank 2011; Lee 2017). Gender inequality is also another critical issue in rural households of Rajasthan, as women are not decision-makers regarding sanitation and water facilities (Doron and Jeffrey 2014). These attitudes and practices in rural Rajasthan require a detailed household survey to monitor and evaluate the issues related to WaSH such as open defecation, toilet usage, drinking water quality, gender inequality, and the progress toward the Swachh Bharat Mission.
In areas with scarce surface water resources and inadequate sanitation and hygiene services, a geographical information system (GIS) can be an efficient and powerful tool to visualize the existing situation of WaSH and identify the risk areas (Maina 2015; De Moura and Procopiuck 2020). The detailed spatial patterns of drinking water, sanitation, and hygiene conditions at the local level could help to provide sustainable measures at the least cost. Several indexes have been defined to evaluate the WaSH conditions (Cronk et al. 2015; Hashemi 2020; Dickin et al. 2021); however, local policymakers require a simple and easy-to-use index for quick assessment. Hence, in this research, a simple index was developed to monitor and understand the WaSH practices. Nowadays, with the development of computational intelligence techniques, several linear and nonlinear methods such as discriminant analysis, partial least squares, artificial neural networks (ANN), and support vector machines are available for categorization and regression (Noori et al. 2013; Leong et al. 2019). Kernel-based techniques like support vector machine regression (SVMR) offer advantages over their ANN equivalents, as they can model nonlinear systems well to produce more accurate results, can handle small samples, and allow interpretation of the calibration models (Yoon et al. 2017; Haghiabi et al. 2018). Machine learning algorithms have the potential to detect patterns in the collected data and predict unknown variables (Froemelt et al. 2019; Shah et al. 2020). Hence, the issues impacting safe water availability, usage of toilets, defecation practices, and hygiene behavior prevailing in rural areas could be better evaluated using machine learning techniques. The objectives of the present study are as follows.
• Documentation and evaluation of the behavior of existing water, sanitation, and defecation practices in rural areas of Rajasthan using a global positioning system (GPS)-based survey.
• Classification and development of the WaSH index using water, sanitation, and hygiene indicators for the assessment of the current WaSH practices of the rural population.
• Spatial risk mapping to ascertain the WaSH practices of rural people at the panchayat level.
• WaSH index prediction by integrating machine learning algorithms to understand the attitude of the villagers.

Material and methods
The study area for this work is the Phagi tehsil (Fig. 1). Figure 1b,c shows the mean population density, mean household size, and mean literacy rate of Phagi tehsil at the panchayat level. High population density was observed in Chandma Kalan (383.7 per sq km), Renwal (351.9 per sq km), and Pahadia (316.9 per sq km) panchayats; however, the highest household size was recorded in Mandawari (7.7) and Harsulia (7.2) panchayats. The total literacy rate is highest in the Phagi panchayat (69.02%) and lowest in Chandma Kalan (54.53%). The male literacy rate is high in Phagi (82.73%), followed by Gohindi (82.23%), Renwal (81.58%), and Mohabbatpura (81%), and lowest in Kishorepura (69.4%). However, the female literacy rate is highest in Renwal (55%) and lowest in Mandor panchayat (36.9%).

Data collection and analysis
The household survey data collection took place from December 2019 to March 2020. Since the study area is vast, a random sampling technique was used to select a representative number of respondents from 67 villages in the Phagi tehsil. A total of 319 respondents were taken for the proposed study from 32 panchayats of Phagi tehsil. The basic information collected about the respondents included age, gender, number of family members, education level, economic status, and female participation in sanitation facilities decision-making. Information related to WaSH indicators such as the source of water, drinking water quality, toilet facility availability at home, toilet facility usage, water supply in the toilet, toilet cleanliness, ventilation, and soap hand-wash practices was also collected to assess the existing WaSH conditions and defecation practices in different villages of Phagi tehsil. Actual toilet functionality, ventilation, and cleanliness were physically observed during the survey. Groundwater level and groundwater quality data such as total dissolved solids (TDS), fluoride, total hardness (TH), and chloride were collected from the State Ground Water Department (SGWD), Jaipur, for the year 2019. The spatial analyst tool of ArcGIS software (ESRI 2011) was used to interpolate the groundwater level and groundwater quality data using the ordinary kriging technique, followed by zonal statistics calculation for the 32 panchayats of the study area. Thematic layers such as the tehsil boundary and panchayat boundary were also generated in the ArcGIS environment. The detailed methodological design of the research is described in Fig. 2.

WaSH index estimation and risk mapping
The performance of water, sanitation, and hygiene indices is sensitive to the set of indicators used for their calculation. Hence, the WaSH indicators were classified into three sub-indices, namely the water sub-index, sanitation sub-index, and hygiene sub-index, for WaSH index estimation and evaluation of risk areas. Different categories of qualitative and quantitative data (Table 1) were standardized using the grade-weighted method (Yu et al. 2019) based on expert opinion and the actual characteristics. For this purpose, every individual indicator is categorized into different classes and assigned weights ranging from 0 (worst) to 1 (best), with 0.2 intervals, for scaling the data and assessment of sub-indices (Tsesmelis et al. 2020). This is followed by (1). This process was also adopted for the estimation of the WaSH index, as it incorporates all three sub-indices mentioned above.
The WaSH index scores were further categorized into four risk categories ranging from 'no risk' to 'high risk' and mapped at the panchayat level.
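The scoring and categorization steps described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' procedure: the aggregation rule (a plain arithmetic mean of indicator weights and of the three sub-indices), the function names, and the intermediate labels and cutoffs in `RISK_BANDS` are all hypothetical. The text only fixes the 0 to 1 weight scale and the end categories 'no risk' and 'high risk'.

```python
from statistics import mean

# ASSUMPTION: equal-interval cutoffs and the two middle labels are hypothetical;
# the paper only names four categories from 'no risk' to 'high risk'.
RISK_BANDS = [
    (0.8, "no risk"),
    (0.6, "low risk"),
    (0.4, "moderate risk"),
]


def sub_index(indicator_weights):
    """Aggregate grade weights (0 = worst, 1 = best) into one sub-index.

    ASSUMPTION: an unweighted arithmetic mean is used for illustration.
    """
    for w in indicator_weights:
        if not 0.0 <= w <= 1.0:
            raise ValueError(f"weight {w} is outside the 0-1 scale")
    return mean(indicator_weights)


def wash_index(water, sanitation, hygiene):
    """Combine the three sub-indices (ASSUMPTION: unweighted mean)."""
    return mean([water, sanitation, hygiene])


def risk_category(score):
    """Map a WaSH index score to one of four risk categories."""
    for lower_bound, label in RISK_BANDS:
        if score > lower_bound:
            return label
    return "high risk"


if __name__ == "__main__":
    # Hypothetical indicator weights for a single panchayat.
    water = sub_index([0.6, 0.4, 0.8])
    sanitation = sub_index([0.2, 0.2, 0.4, 0.0, 0.2, 0.2])
    hygiene = sub_index([0.2, 0.2, 0.4, 0.4])
    score = wash_index(water, sanitation, hygiene)
    print(round(score, 2), risk_category(score))
```

Keeping the cutoffs in a single table makes it easy for a local planner to replace the assumed bands with the ones actually adopted for mapping.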

WaSH index analysis using support vector machine regression
In this research, the WaSH index was analyzed using SVMR. The basic principle of the SVMR is the mapping of the inputs, either linearly or non-linearly, into a higher-dimensional feature space. It constructs a model using the available samples and avoids misclassification in future predictions (Kurniawan et al. 2021). The SVMR's effectiveness for classification and regression depends on the type of kernel function (linear, polynomial, or radial), the ε-insensitive loss function, and the capacity parameter C (Singh et al. 2011). The SVMR follows the structural risk minimization (SRM) principle, which is considered better than the empirical risk minimization (ERM) principle conventionally used in neural networks (Talesh et al. 2019). For SVMR, the dataset is divided into three sets for training, validation, and testing. The training data was further used to prepare the model automatically.
The model network was designed with eight input parameters (Table 2), namely age, literacy, level of education, economic status, participation of females in sanitation facilities decision-making, open defecation practice, water supply in the toilet, and water level, and with the WaSH index as the output parameter. Three different models, i.e., partial least squares (PLS), standard support vector regression (S-SVR), and least squares support vector regression (LS-SVR), were used in this research for the prediction of the WaSH index and to understand the relation of the variables with the WaSH index. The PLS algorithm projects the original X space onto a new one and determines the linear correlation between the new variables and the Y values. The latent variables play a significant role in the parameters of the algorithm, as they carry the covariance structure between the new X space and the Y values. After the observation of the data samples from each block of variables, PLS decomposes the (n × N) matrix of zero-mean variables X and the (n × M) matrix of zero-mean variables Y into the form given in Eq. (2). In Eq. (5), the weight vector and the bias term are represented by w and b, respectively. SVMR models were applied with the RBF kernel function to predict the target variable, the WaSH index, using a set of independent variables. The performance of the SVMR models was measured by computing three validation statistics, i.e., the coefficient of determination (R²), the coefficient of correlation (CC), and the root mean square error (RMSE).
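For orientation, the standard textbook forms of the models named above are given here. They are not a recovery of the paper's own Eqs. (2) to (5), which are not available in this text. The symbols T and U (score matrices), P and Q (loading matrices), E and F (residual matrices), φ (feature map), e_i (fitting errors), γ (regularization parameter), and σ (kernel width) are assumed notation; only X, Y, w, and b appear in the text.

```latex
% PLS: decomposition of the zero-mean blocks X (n x N) and Y (n x M)
X = T P^{\top} + E, \qquad Y = U Q^{\top} + F

% SVR regression function with weight vector w and bias term b
f(x) = w^{\top} \varphi(x) + b

% LS-SVR training problem (equality constraints, squared errors)
\min_{w,\, b,\, e} \; \frac{1}{2} w^{\top} w + \frac{\gamma}{2} \sum_{i=1}^{n} e_i^{2}
\quad \text{subject to} \quad y_i = w^{\top} \varphi(x_i) + b + e_i, \quad i = 1, \dots, n

% RBF (Gaussian) kernel
K(x_i, x_j) = \exp\!\left( -\frac{\lVert x_i - x_j \rVert^{2}}{2 \sigma^{2}} \right)
```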

Evaluation of water, sanitation, and defecation practices
From 67 villages of Phagi tehsil, a total of 319 questionnaire-based surveys were conducted with one individual from each household. Taking into account that women in rural areas suffer more from health-related issues due to unsafe sanitation and hygiene practices, women were the preferred interviewees in the study. The number of female respondents was 195 (61.1%), and the study had 124 (38.9%) male respondents (Table 3). Age distribution indicates greater participation of individuals in the 21-40 age group in the survey. Literacy data reveal that male respondents have higher literacy (68.6%) than female respondents (47.7%). Education level data show a higher share of male respondents with primary (31%) and secondary (38%) education compared with female respondents (29% primary and 19% secondary). The survey results (Table 4) show that the main source of water in the Phagi tehsil is piped water and groundwater (60%). The government water supply in most of the villages is through public taps located at varying intervals on the street. However, the villagers also have to depend on groundwater because the government water supply is irregular (supplied every 2-3 days, not daily). In the absence of borewell and handpump water sources, 6% of respondents use tanker water (water supplied through private water tankers). Overall, groundwater is the main source of water in the study area. About 57.7% of respondents are satisfied with water quality; however, 42.3% of respondents stated that the water is hard and not suitable for drinking. The groundwater quality data analysis reveals that the TDS level is more than 1000 mg/L in all the panchayats of the study area, which reflects the unsuitability of water for drinking purposes (Adimalla and Wu 2019). Total hardness (TH) as calcium carbonate (CaCO3) ranged from 178.3 to 567.3 mg/L, with a mean of 301.8 mg/L.
As per Sawyer and McCarty's (1967) classification, groundwater is considered soft with TH < 75 mg/L as CaCO3, moderately hard with 75-150 mg/L, hard with 150-300 mg/L, and very hard with > 300 mg/L.
Results indicate that 19 panchayats have hard groundwater, while 13 panchayats fall in the category of very hard groundwater, with 567.31 mg/L as CaCO3 in Lasadia, 554.42 mg/L as CaCO3 in Nimeda, 404.2 mg/L as CaCO3 in Mohabbatpura, and 385.57 mg/L as CaCO3 in Mendwas. High hardness in groundwater may be due to carbonate sources (Koffi et al. 2017). The concentration of chloride ranged from 371 to 1409 mg/L, while the maximum allowable limit for chloride is 600 mg/L (WHO 2017). A very high concentration of chloride was observed in all the panchayats except Kishorepura (445.65 mg/L), Chittora (409.8 mg/L), Mohanpura (371.03 mg/L), Renwal (539.05 mg/L), Phagi (572.77 mg/L), Chakwara (525 mg/L), and Choru (594.77 mg/L). The excess chloride content in groundwater is considered an index of pollution and is known to have adverse impacts on human health (Li et al. 2018). In the study area, a very high concentration of fluoride (1.5 to 4.4 mg/L) was also observed in all the panchayats. Exposure to high fluoride content in drinking water usually results in dental and skeletal fluorosis (Sharma et al. 2015).
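The Sawyer and McCarty hardness classes quoted above can be expressed as a small lookup function. This is a minimal sketch, assuming the function name `classify_hardness` and a particular treatment of the shared class boundaries (150 and 300 mg/L are assigned to the lower class), which the text does not specify.

```python
def classify_hardness(th_mg_per_l):
    """Classify total hardness (mg/L as CaCO3) after Sawyer and McCarty (1967).

    ASSUMPTION: boundary values 150 and 300 mg/L go to the lower class,
    because the quoted ranges (75-150, 150-300) share their end points.
    """
    if th_mg_per_l < 0:
        raise ValueError("total hardness cannot be negative")
    if th_mg_per_l < 75:
        return "soft"
    if th_mg_per_l <= 150:
        return "moderately hard"
    if th_mg_per_l <= 300:
        return "hard"
    return "very hard"


if __name__ == "__main__":
    # Values reported in the text for the study area.
    reported = {
        "minimum": 178.3,
        "Mendwas": 385.57,
        "Mohabbatpura": 404.2,
        "Nimeda": 554.42,
        "Lasadia": 567.31,
    }
    for name, th in reported.items():
        print(name, th, classify_hardness(th))
```

Applied to panchayat-level zonal means, such a function would yield the counts of hard and very hard panchayats directly.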
The present study results highlight the gap between toilet ownership and its usage by household members. About 75.5% of households interviewed during the survey had toilet facilities in the house due to the government flagship programs; however, only 62.4% of those households actually use them, and the remainder have converted these toilets into storage areas. Other studies (Coffey et al. 2014; Barnard et al. 2013; Lee 2017) have also shown that latrine ownership has increased, but more than a third of those latrines were not being used by the households. It is found that out of 32 panchayats, only three panchayats, namely Mandawari (19%), Mendwas (42%), and Rotwara (33%), have less than 50% toilet ownership (Fig. 3a), which indicates the improvement in sanitation practices due to government efforts. Toilet usage is very low in Chandma Kalan (36%), Mendwas (17%), Mandawari (19%), Rotwara (33%), and Gohindi (38%) panchayats (Fig. 3b). However, 56.7% of respondents stated that the toilet facility is used by all family members. The reasons for not using toilets were water scarcity (27.5%), tradition (47.5%), especially among the elders of the village, bad odor (15%), and costly maintenance (10%). The results reveal that only six of the 32 panchayats in Phagi tehsil have almost zero open defecation practices, and 12 panchayats fall in the category of more than 40% open defecation. Predominantly, open areas such as barren land and agricultural fields outside the village were used for defecation (Geetha and Srinivasan 2014). Respondents considered open defecation a social outing that also eliminates the need to maintain a toilet. During the survey, it was observed that open defecation practice is least in the areas with fewer open spaces and with fenced agricultural fields. Since a large share of the rural population in India is still defecating in open areas, proper awareness is required to alleviate the problems associated with open defecation (Anuradha et al. 2017).
It is clear from Fig. 3b that water supply in toilets affects toilet usage in households, as both follow almost the same trend. The survey results demonstrate that nine panchayats have less than 50% literacy and that the literacy rate is higher in males than in females. The high literacy level of women helps to improve children's health and gain access to safe WaSH facilities (Bisung and Dickin 2019). However, the WaSH condition is worsening in Phagi tehsil, as most of the panchayats have less than 30% female participation in sanitation facilities decision-making (Fig. 3c). It can be understood from this survey that the existence of toilets, water supply in toilets, and a high literacy rate could lead to an increase in toilet usage. Women in rural areas of India are usually less involved than men in decisions on spending for water and sanitation facilities (Routray et al. 2017). Therefore, household decision-making has a great influence on the outcomes of WaSH interventions (Dery et al. 2020).
Several studies have highlighted the importance of handwashing in the reduction of fecal-oral disease transmission paths (Cairncross 2003; Fewtrell et al. 2005; Herbst et al. 2009). The hygiene conditions have a huge impact on health; hence, soap hand-wash data was collected through the survey to understand the scenario in the villages of Phagi tehsil. It is found that age plays a significant role in soap usage for hand-washing after defecation and before meals. The results indicate that 69.2% of respondents below 20 years washed their hands with soap after defecation, whereas people above 60 years are the least likely to use soap for hand-washing (32.6%). People in the surveyed villages have the poor habit of not washing hands before meals (Table 5). It was found that only 59.2% of respondents in the age group of 15-40 years washed their hands before meals. The survey results highlight that hand-washing with soap after defecation and before meals is more common among people under 40 years of age due to education and awareness about good hygiene practices (Banda et al. 2007).

WaSH risk areas
The sub-indices were calculated and the panchayats were categorized into risk areas based on the WaSH index to guide further improvement in the study area. Water sub-index results reveal that only Renwal and Mohanpura panchayats were in the good category, with scores of 0.81 based on primary and secondary data; however, twelve panchayats scored 0.6 to 0.8 (Fig. 4a). For the sanitation sub-index calculation, six indicators were used to assess the existing sanitation practices in the study area using the household survey data. The results reveal that the sanitation condition is worst in the Mandawari panchayat, which scored only 0.18 (Fig. 4b), followed by Mendwas (0.20), Rotwara (0.28), Chandma Kalan (0.33), and Gohindi (0.36). In Mandawari panchayat, out of sixteen respondents, only three had a toilet facility at home and used it. The low scores show that open defecation practices are still prevalent in the panchayats.
Eight panchayats of the study area, including Nimeda, Madhorajpura, Phagi, Dabich, Chittora, and Renwal, scored more than 0.8, indicating very good sanitation practices adopted by villagers with no or very little open defecation. The hygiene sub-index was assessed based on four indicators, and results reveal that in Chandma Kalan panchayat, hand-washing using soap after defecation (score 0.18) and before meals (score 0.09) is not much practiced by villagers. Out of thirty-two panchayats, only five panchayats, namely Phagi, Renwal, Dabich, Mohabbatpura, and Mohanpura, scored 0.6 to 0.8 (Fig. 4c), reflecting the awareness among villagers related to cleanliness and hygiene.
The aggregation of three sub-indices was used for the estimation of the WaSH index for all the 32 panchayats of the Phagi tehsil (Fig. 4d). The results indicate that four panchayats viz. Mandawari, Mendwas, Chandma Kalan, and Rotwara have the worst conditions with WaSH scores of 0.34, 0.35, 0.39, and 0.38, respectively. However, four panchayats viz. Renwal, Phagi, Dabich, and Mohanpura show no risk conditions with WaSH scores higher than 0.8.

WaSH index prediction using SVMR models
The household survey findings show that WaSH conditions in rural areas are determined by different factors like gender inequality, education, and water supply in toilets. In this research, a machine learning technique, SVMR, was used to find the correlations and to predict the WaSH index. In SVMR, the WaSH index was used as the dependent variable, whereas the eight variables (age, literacy, level of education, economic status, participation of females in sanitation facilities decision-making, open defecation practice, water supply in the toilet, and groundwater level) constituted the set of independent variables. Three different models, PLS, S-SVR, and LS-SVR, were used to predict the WaSH index and to understand the variables associated with the WaSH index. The data were divided into a learning set (70% of the dataset) and a testing set (the remaining 30%). The data for both learning and testing were selected randomly from a total of 32 panchayats to avoid estimation biases. Among the available kernels (linear, sigmoid, polynomial, and RBF), the RBF kernel was used because it provides good results under the assumption of general smoothness (Wang et al. 2018; Yahya et al. 2019). The tenfold cross-validation process was repeated twenty times to derive the SVMR model parameters. The actual and expected values for the WaSH index were calculated using different models, as shown in Fig. 5. The appropriate model for assessment of the WaSH index was identified based on the minimum RMSE and high R² and CC values. Table 6 displays a comparative analysis of the predictive performances of the SVMR models.
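To make the LS-SVR step more concrete, the following is a minimal from-scratch sketch of the standard LS-SVR dual formulation with an RBF kernel, written in plain Python. It is not the authors' implementation or software: the function names, the parameter names `gamma` (regularization) and `sigma` (kernel width), and the toy data are assumptions for illustration only. The tuning by repeated tenfold cross-validation described above is not reproduced here.

```python
import math


def rbf_kernel(a, b, sigma):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def solve_linear_system(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_ls_svr(X, y, gamma=100.0, sigma=1.0):
    """Fit LS-SVR by solving the dual system
    [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y].
    Returns the bias b and the dual weights alpha."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0]
        for j in range(n):
            k = rbf_kernel(X[i], X[j], sigma)
            row.append(k + (1.0 / gamma if i == j else 0.0))
        A.append(row)
    solution = solve_linear_system(A, [0.0] + list(y))
    return solution[0], solution[1:]


def predict_ls_svr(X_train, b, alpha, x_new, sigma=1.0):
    """Predict the target for one new feature vector."""
    return b + sum(a * rbf_kernel(xi, x_new, sigma)
                   for a, xi in zip(alpha, X_train))


if __name__ == "__main__":
    # Hypothetical toy data: one scaled predictor and an index-like target.
    X = [[0.0], [0.5], [1.0], [1.5]]
    y = [0.2, 0.4, 0.6, 0.8]
    b, alpha = fit_ls_svr(X, y, gamma=1e6, sigma=1.0)
    print(predict_ls_svr(X, b, alpha, [0.75], sigma=1.0))
```

Because LS-SVR replaces the inequality constraints of standard SVR with equality constraints, training reduces to one linear system, which suits a small sample such as 32 panchayats.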
LS-SVR produced the most accurate estimates of the WaSH index, with a coefficient of determination of 0.902 and 0.877 and a root mean square error of 0.041 and 0.05 in the training and testing stages, respectively. It can be seen that both models (PLS and LS-SVR) are capable of predicting the WaSH index accurately, but LS-SVR outperformed PLS. Figure 6 shows the actual and predicted values of the WaSH index at the panchayat level. It is found from the SVMR analysis that the WaSH index shows a positive correlation with open defecation (r = 0.94), water supply in toilets (r = 0.92), and participation of females in sanitation facilities decision-making (r = 0.53), followed by literacy rate (r = 0.33) and economic status (r = 0.27). The findings of this study emphasize the importance of water supply in toilets, literacy level, and participation of females in decision-making to WaSH index scores (Hirai et al. 2016; WaterAid 2017).
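The three validation statistics used to compare the models can be computed as sketched below. This is a minimal illustration with assumed function names. The text does not state which definition of R² was used, so the common form 1 - SS_res/SS_tot is assumed here (some studies instead report the square of CC). The sample values are hypothetical, not study data.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r_squared(actual, predicted):
    """Coefficient of determination (ASSUMPTION: 1 - SS_res / SS_tot)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def correlation_coefficient(actual, predicted):
    """Pearson coefficient of correlation (CC)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


if __name__ == "__main__":
    # Hypothetical actual and predicted index values for five units.
    actual = [0.35, 0.40, 0.55, 0.70, 0.85]
    predicted = [0.38, 0.42, 0.50, 0.72, 0.80]
    print(rmse(actual, predicted))
    print(r_squared(actual, predicted))
    print(correlation_coefficient(actual, predicted))
```

A lower RMSE together with R² and CC values closer to 1 indicates a better fit, which is the selection rule applied to the models compared in Table 6.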

Conclusion
This study assessed and quantified the domestic WaSH conditions in the rural areas of Rajasthan state in India at the panchayat level using GPS-based household survey data. In order to understand and evaluate the existing water, sanitation, and defecation practices, a suitable index was developed for the spatial assessment of WaSH conditions. The WaSH risk areas were also identified for further improvement and ease of management by planners at the local level to reduce the gaps between toilet ownership and usage. The integration of GIS and soft computing methods permitted a more in-depth examination of WaSH and behavioral determinants. Three different models, viz. PLS, S-SVR, and LS-SVR, were used to predict the WaSH index and to understand the variables associated with the WaSH index. SVMR results reveal a strong correlation of the WaSH index with open defecation and water supply in toilets. The survey data also show that most of the respondents consider open field defecation very economical, as it eliminates the need to maintain toilets, and prefer it because of the scarcity of water. Therefore, education and awareness campaigns on health and hygiene are essential to improve the WaSH condition in rural areas.
Fig. 6 Comparison of actual versus predicted behavior of WaSH index at the panchayat level using the LS-SVR model