Evaluation of rock mass permeability along a borehole by integrating the techniques of geological features and logistic regression: a case study in Taiwan

When the budget for in situ hydraulic tests is constrained, cost-effective approaches for determining the variation in hydraulic conductivity along a borehole are attractive aids to the design and planning of groundwater-related engineering systems. This study proposes a practical method with probabilistic outputs to address such engineering concerns. First, 474 sets of hydrogeological investigation and hydraulic test data for fractured rock masses in most of the watersheds of the mountainous areas of Taiwan were collected. Second, seven geological indices [rock quality designation (RQD), depth index (DI), gouge content designation (GCD), lithology permeability index (LPI), fracture density (FD), fracture width (FW), and groundwater velocity (GV)] significantly correlated with rock mass permeability were identified. Third, using logistic regression analysis, the seven indices were used as explanatory variables and the hydraulic conductivity as the outcome variable (with a threshold value of 1 × 10⁻⁶ m/s) to develop prediction models of high groundwater potential zones along a borehole. All indices passed the collinearity test, indicating no critical collinearity among the indices. To make the proposed prediction models more flexible in practical applications, all 127 possible combinations of the seven explanatory variables were explored to develop prediction models. After validation with the Hosmer–Lemeshow test, Omnibus test, and Wald test, only 77 models were statistically significant. These 77 models were evaluated by three measures, Nagelkerke R², success rate (SR), and area under the curve (AUC), to further understand the prediction performance of each model. The results show that the accuracy of the prediction model is positively correlated with the number of geological indices used. When all seven geological indices are used, Nagelkerke R², SR, and AUC reach their best values of 0.83, 91.6%, and 0.976, respectively. In conclusion, the prediction model developed by combining geological indices with logistic regression analysis can provide a more efficient way of understanding the hydrogeological conditions of a borehole site.


Introduction
Investigating hydraulic conductivity data of boreholes is a prerequisite to characterizing the hydraulic properties of a site in various groundwater-related engineering practices (Marliva 2016). Engineers are continuously looking for approaches that can serve as cost-effective options for determining the in situ values of hydraulic conductivity.
Detailed and continuous hydraulic conductivity data along a borehole are more valuable than discrete or single-value data as input for developing hydrogeological groundwater models or identifying the hydraulic heterogeneity of fractured rock aquifers (Hsu 2021). However, the budget of most projects is too limited to allow a detailed survey of the hydraulic properties of boreholes. Therefore, the question of how to identify detailed rock mass hydraulic conductivity along a borehole with high efficiency and low cost remains to be solved.
In practice, double packer hydraulic tests with a fixed-interval technique have been developed for cases involving complicated hydraulic properties of rock masses (Sara 2003). Although the results obtained with this technique are reliable, the associated field investigation is fairly expensive and labor-intensive. To obtain detailed hydraulic conductivity data under a constrained project budget, hydrogeologists have attempted to solve this problem using various methods, including empirical equations, geophysical techniques, analytical solutions, and numerical modeling. Shahbazi et al. (2020) and Hsu (2021) reviewed the advantages and disadvantages of these four methods. The primary practical drawbacks of these methods are associated with simplified assumptions, limited field data, deterministic concepts, and indirect measurements. Among these methods, Hsu (2021) confirmed that empirical equations, if the statistical data are well collected, can simultaneously satisfy diverse engineering needs, including direct measurements from rock core samples, workforce, completion time, and project budget. Additionally, the empirical equations of Hsu (2021), which are applicable to diverse geological environments, are superior to earlier empirical equations that can only be applied to a single geological environment. As a result, these empirical equations extend the breadth of practical applications.
Although these empirical formulas can produce continuous hydraulic conductivity data faster and at lower cost, the accuracy of the existing formulas depends on the value of the coefficient of determination (R²). If the R² value is low, the estimates may be poor. In addition, the hydraulic conductivity estimated by these empirical equations is a deterministic (real number) output rather than a probabilistic output, so it does not account for the uncertainty of the predicted parameter. In some practical scenarios, however, it is more appropriate to estimate the probability that an event occurs. A well-known example is a weather forecast that states whether tomorrow will be rainy or sunny together with a probability. Logistic regression can be used to perform such binary classifications and to obtain a probabilistic outcome.
Since 1976, the logistic regression analysis has been used in a wide variety of disciplines (Boateng and Abaye 2019;Lian 2018;Hasan 2019;Hosmer et al. 2001;Woolley et al. 2012). Applications related to environmental engineering are commonly seen in landslide potential assessment, groundwater contamination risk analysis, and earthquake hazard assessment (Anthony et al. 2019;Rizeei et al. 2018;Oh et al. 2018;Sujatha and Sridhar 2021;Santos-Reyes 2020;Xie et al. 2018). As for the application of logistic regression on the estimation of hydraulic conductivity, little research has been conducted. After analyzing the possible reason, it is found that the logistic regression analysis relies on having sufficient data samples derived from hydraulic tests and borehole logging. In the case of insufficient datasets, the logistic regression method is rarely applied. Once the datasets required for analysis are available, studies on the estimation of hydraulic conductivity performed by the logistic regression method become possible.
In Taiwan, such data mentioned above have been collected in a long-term project titled "Groundwater Resources Investigation Program for Mountainous Region of Taiwan" since 2010 until now (Central Geological Survey of Taiwan 2010). This project aims to explore potential groundwater yields in mountainous areas of Taiwan by collecting and investigating hydrogeologic data in regolith-bedrock aquifers. The appearance of the massive hydrogeologic data gives an excellent opportunity to conduct this type of research. Therefore, in this study, the logistic regression method integrated with the technique of geological indices was proposed to develop a cost-effective method for helping predictions of rock mass hydraulic conductivity along a borehole. First, the existing hydrogeological data, including rock core data, double packer hydraulic test data, borehole televiewer image data, and groundwater flow velocity measurement data, were collected. Based on the collected data, seven possible geological factors that affect the hydraulic properties of fractured bedrocks and their rating rules for these factors were identified. Second, collinearity analysis was utilized to evaluate the occurrence of high intercorrelations among seven variables. Third, the logistic regression technique was employed to develop possible models for predicting the probability of hydraulic conductivity exceeding a specified threshold by using explanatory variables from various combinations of seven factors. Finally, the performances of various logistic regression models were assessed and compared.

Study area and data sources
The study area covers 27 major river basins in Taiwan, including Zhuoshui River Basin, Dajia River Basin, Wuxi River Basin, Hualien River Basin, and Liwu River Basin in the mountainous area of Central Taiwan; Donggang River Basin, Linbian River Basin, Nanpingtung River Basin, South Taitung River Basin, Gaoping River Basin, Zengwen River Basin, Puzixi Basin, Bazhang River Basin, Xiuguluan River Basin, Beinan River Basin, Fengbin Coastal River Basin, and the river basin on the east side of the coastal mountain range in the mountainous area of Southern Taiwan; and Danshui River Basin, the coastal river basin of the north coast, the coastal river basin of Taoyuan, Fengshanxi River Basin, Touchien River Basin, Zhonggang River Basin, Emei River Basin, Houlong River Basin, Nanhu River Basin, and Xihu River Basin in the mountainous area of Northern Taiwan. Various hydrogeological investigations, including borehole drilling, borehole televiewer, electrical well logging, groundwater velocity measurements, and double packer hydraulic tests, were completed at each of the 111 sites (Fig. 1) in these river basin areas.
To establish a model for estimating the hydraulic properties of fractured rocks, this study collected samples from the large body of hydrogeological survey data mentioned above. Data samples were selected on the basis of double packer hydraulic test sections (test interval of 1.5 m) whose lithology belongs to consolidated rocks. For the selected samples, this study then collected the associated geological feature data, including drilling core data, borehole image data, and groundwater flow velocity data. A total of 474 double packer hydraulic test samples were collected. The lithological environment covered by the data samples includes sandstone, shale, sandy shale, sandstone interbedded with shale, mudstone, siltstone, silty sandstone, argillaceous sandstone, argillaceous siltstone, siltstone interbedded with shale, sandstone interbedded with mudstone, argillaceous sandstone interbedded with shale, clay, quartzite, argillite, phyllite, marble, slate, andesite, and schist. Table 1 shows the number of samples collected for each sub-lithology. The top three lithologies in the analyzed samples are sandstone, sandstone interbedded with shale, and slate. The collected data were used to construct possible factors influencing hydraulic conductivity, and logistic regression analysis was then applied to establish reliable models for predicting rock mass permeability. Because the data sources cover a wide range of geological environments, the developed models are expected to be more widely applicable in practice.

Methods
To overcome the limitations of existing methods for estimating rock mass permeability, this study develops an estimation model based on the concept of probability. The model is developed mainly with logistic regression analysis, combined with the geological characteristics that may affect the permeability of the rock mass.
In addition, the model is designed to identify whether the hydraulic property of an engineering site meets the conditions of high groundwater potential. The following subsections describe the methodology for developing the estimation model.

Logistic regression
Logistic regression allows the chance of an event to be explored based on the values of the controlling variables. In this study, the outcome variable indicates whether the hydraulic conductivity in a given double packer test interval reaches the threshold of high groundwater potential. Struckmeier and Margat (1995) reported a classification relating hydraulic conductivity to the potential of water supply for groundwater resources exploration. According to this classification, the potential of water supply can be attributed to regional supply when the hydraulic conductivity is greater than 4 × 10⁻⁵ m/s; to local supply when the hydraulic conductivity is between 2 × 10⁻⁶ and 4 × 10⁻⁵ m/s; to partly local supply when the hydraulic conductivity is between 2 × 10⁻⁸ and 2 × 10⁻⁶ m/s; and to a lack of groundwater resources when the hydraulic conductivity is less than 2 × 10⁻⁸ m/s. Based on this classification and the goal of local water supply, this study used a threshold of 1 × 10⁻⁶ m/s for identifying high groundwater potential. Therefore, the outcome variable was 0 when the hydraulic conductivity was less than the threshold and 1 when the hydraulic conductivity was greater than the threshold.
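The coding of the outcome variable and the supply classification described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the treatment of values that fall exactly on a class boundary (or exactly on the threshold) is an assumption, because the text does not specify it.

```python
# Threshold for "high groundwater potential" used in this study (m/s).
K_THRESHOLD = 1e-6


def outcome(k):
    """Binary outcome: 1 if K is greater than the threshold, else 0.

    Assumption: a value exactly equal to the threshold is coded as 0.
    """
    return 1 if k > K_THRESHOLD else 0


def supply_class(k):
    """Water supply potential class for a hydraulic conductivity k (m/s).

    Class limits follow the classification quoted in the text; assigning
    boundary values to the higher class is an assumption.
    """
    if k > 4e-5:
        return "regional supply"
    if k >= 2e-6:
        return "local supply"
    if k >= 2e-8:
        return "partly local supply"
    return "lack of groundwater resources"


if __name__ == "__main__":
    for k in (3.66e-4, 9.26e-6, 5.40e-11):
        print(k, outcome(k), supply_class(k))
```

The three values in the usage example are the maximum, mean, and minimum K reported for the data set.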
The relationship between the probability of high groundwater potential and the controlling variables can be expressed by the logistic regression model below.

$$P = \frac{1}{1 + e^{-z}} \qquad (1)$$

where $P$ stands for the estimated probability of high groundwater potential occurrence. $z$ can be defined as follows:

$$z = a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_n x_n \qquad (2)$$

where $z$ is the weighted linear combination of the controlling variables; $a_0$ is the intercept of the logistic regression model; $a_i$ ($i = 1, 2, 3, \ldots, n$) are the coefficients of the controlling variables; $n$ is the number of controlling variables; and $x_i$ ($i = 1, 2, 3, \ldots, n$) are the controlling variables of high groundwater potential. The coefficients in the logistic regression model are estimated from field sample data with the maximum likelihood estimation approach.
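As a small illustration of how a fitted model turns predictor values into a probability, the sketch below assumes the standard logistic form, P = 1 / (1 + exp(−z)), with z the intercept plus the weighted sum of the controlling variables. The function names are hypothetical, and the coefficients in the usage example are placeholders, not values from this study.

```python
import math


def linear_predictor(intercept, coefficients, values):
    """z = a0 + a1*x1 + ... + an*xn."""
    if len(coefficients) != len(values):
        raise ValueError("one coefficient is needed per controlling variable")
    return intercept + sum(a * x for a, x in zip(coefficients, values))


def probability(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


if __name__ == "__main__":
    # Placeholder intercept and coefficients for two predictors
    # (NOT fitted values from the paper).
    z = linear_predictor(-1.0, [0.5, 2.0], [2.0, 0.5])
    print(z, probability(z))
```

A probability above the chosen cut-off would then be read as predicting high groundwater potential for that test interval.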

Selection of potential controlling variables
When constructing a logistic regression model, a principal issue is deciding which controlling variables or predictors to include. The logistic regression analysis should select potential predictors that may control rock mass permeability. These predictors can be found from geological features related to hydraulic properties, and such features can be determined as geological indices from in situ hydrogeological drilling and tests. Many previous studies have derived relations between geological features and rock mass permeability (Adedokun and Abubakar 2016;El-Naqa 2001;Hamm et al. 2007;Hsu 2021;Lee and Farmer 1993;Shahbazi et al. 2020). Based on the available onsite data from this research and recommendations from the studies mentioned above, a total of seven potential predictors were considered in the logistic regression analysis, and rating values for these factors are summarized in Table 2. The seven predictors are RQD (Rock Quality Designation), DI (Depth Index), GCD (Gouge Content Designation), LPI (Lithology Permeability Index), FW (Fracture Width), FD (Fracture Density), and GV (Groundwater Velocity). In addition, collinearity among the selected predictors may affect the performance of a logistic regression model (Hosmer et al. 2001). The variance inflation factor (VIF) and condition index (CI) were utilized to test which predictors are affected by collinearity and how strongly they are correlated. The statistical software SPSS was used to compute the two diagnostic values for each predictor. VIFs greater than ten and CIs greater than 30 may represent critical levels of collinearity (Chen 2009). Thus, predictors with a collinearity problem need to be removed from the developed logistic regression model.

Table 2 Description of the seven predictors
RQD: The RQD predictor represents the degree of rock fracturing that can be used to evaluate rock mass permeability. The value of RQD is between 0 and 100%. R_S: the cumulative length of core pieces greater than 100 mm; R_T: the total length of the core.
DI: The DI predictor is used to evaluate the geostatic effect on the permeability of rock masses. The value of DI is between 0 and 1. L_T: the total length of the borehole; L_c: the depth at the middle of a double packer test interval in the borehole.
GCD: The GCD predictor is an index used to evaluate the permeability of clay-rich gouges in the infillings of fractures. R_G: the total length of gouge content.
LPI: The LPI predictor is utilized to rate the hydraulic properties of the lithology of the rock matrix. Detailed rating guidelines for LPI are given in the rating table for various lithologies provided by Hsu et al. (2019).
FD: The FD predictor is used to compute the fracture density per 10 cm in a hydraulic test interval. F_N: the number of fractures in a specified hydraulic test interval.
FW: The FW predictor is utilized as a permeability index, and its numerical value can be measured using borehole images scanned with either optical or acoustic borehole televiewers. F_j: the width of each fracture in a specified hydraulic test interval; n: the number of fractures in a specified hydraulic test interval.
GV: The GV predictor is the difference between V_down and V_up and is utilized to determine the hydraulic property of a specified zone of a rock mass. The velocity data can be measured using a heat-pulse flowmeter. If the measured velocity data are not close to the upper and lower boundaries of a test interval, a suitable modification of the velocity data is needed. The detailed rating rule for the modified GV is given in Hsu (2021). V_down: the measured velocity at the lower boundary of a fixed hydraulic test interval; V_up: the measured velocity at the upper boundary of a fixed hydraulic test interval.
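As a rough illustration of how three of the indices described for Table 2 could be computed from field measurements, the Python sketch below covers RQD, FW, and GV only. The function names and input layout are hypothetical. The RQD expression assumes the conventional definition (cumulative length of core pieces longer than 100 mm divided by the total core length, in percent), FW follows the description of FW as the sum of fracture apertures in a test section, and the sign convention for GV (V_down minus V_up) is an assumption, since the text only states "the difference between V_down and V_up". The rating rules for DI, GCD, LPI, FD, and the modified GV are not reproduced here.

```python
def rqd(piece_lengths_mm, total_length_mm):
    """RQD in percent: pieces longer than 100 mm over total core length."""
    if total_length_mm <= 0:
        raise ValueError("total core length must be positive")
    counted = sum(p for p in piece_lengths_mm if p > 100)
    return 100.0 * counted / total_length_mm


def fracture_width(widths):
    """FW: sum of the widths F_j of all fractures in the test interval."""
    return sum(widths)


def groundwater_velocity(v_down, v_up):
    """GV: velocity at the lower boundary minus velocity at the upper one.

    The order of subtraction is an assumption.
    """
    return v_down - v_up


if __name__ == "__main__":
    # Hypothetical 1.5 m (1500 mm) test interval.
    print(rqd([300, 80, 450, 120, 50], 1500))
    print(fracture_width([0.5, 1.5, 2.0]))
    print(groundwater_velocity(3.0, 1.0))
```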

Establishment and evaluation of logistic regression models
In the logistic regression analysis, the controlling variables can be selected in every possible combination, from a single predictor to all predictors. From the seven predictors selected in the "Selection of potential controlling variables" section, a total of 127 combinations were tested to develop various logistic regression models. Analyzing the various combinations helps in understanding which predictor combinations lead to a better estimation model. A further benefit is that models developed from fewer predictors can be used when not all seven predictors are available from in situ investigations.
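The count of 127 follows from taking every non-empty subset of seven predictors (2⁷ − 1 = 127). A short Python sketch of this enumeration, with hypothetical function names, is:

```python
from itertools import combinations

PREDICTORS = ("RQD", "DI", "GCD", "LPI", "FD", "FW", "GV")


def predictor_subsets(predictors=PREDICTORS):
    """All non-empty combinations of the predictors, smallest first."""
    subsets = []
    for size in range(1, len(predictors) + 1):
        subsets.extend(combinations(predictors, size))
    return subsets


def counts_by_size(subsets):
    """Number of combinations for each number of predictors."""
    counts = {}
    for subset in subsets:
        counts[len(subset)] = counts.get(len(subset), 0) + 1
    return counts


if __name__ == "__main__":
    subsets = predictor_subsets()
    print(len(subsets))
    print(counts_by_size(subsets))
```

Each subset would then be passed to the logistic regression fit as the candidate set of explanatory variables.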
In addition, the forward selection procedure for specifying how explanatory variables are entered into the logistic regression analysis was utilized. This procedure is one of the stepwise selection methods, with entry testing relying on the significance of the score statistic and removal testing relying on the probability of a likelihood ratio statistic. The analysis strategy automatically eliminates explanatory variables that do not contribute to the model so that the final explanatory variables used in the model can be significant and finally produce a statistically more explanatory model.
Finally, the performance of the logistic regression models fitted from the various combinations needs to be evaluated in several ways. First, the significance of each controlling variable is evaluated with the Wald statistic. Second, the overall goodness of fit of a logistic regression model is assessed with the likelihood ratio test (Omnibus test) and the Hosmer-Lemeshow (H-L) test, respectively (Boateng and Abaye 2019). A likelihood ratio test with a p-value less than 0.05 indicates that at least one of the controlling variables contributes to the prediction outcome. An H-L test with P < 0.05 indicates a poor fit to the data. Third, the pseudo R² measures (Cox and Snell R² and Nagelkerke R²) are used to determine how useful the controlling variables are in predicting the outcome variable (Tjur 2009). The greater the pseudo R² value, the better the fit of the logistic regression model. Fourth, the classification table, also known as the error matrix, is utilized to assess the predictive accuracy of a fitted model. Table 3 shows the sample classification table. This study selected a cut-off value of 0.5. Thus, all predicted values greater than 0.5 are classified as predicting high groundwater potential (event), and all predicted values less than 0.5 as not predicting this event. The predicted and observed values for the dependent outcome are cross-classified, as shown in Table 3. If a logistic regression model has high accuracy, many counts are expected in the A and D cells. The success rate (SR) can be computed by (A + D)/(A + B + C + D). Fifth, this study assessed the discrimination of a logistic regression model with a Receiver Operating Characteristic (ROC) curve, which is obtained by plotting the pairs of one minus specificity and sensitivity on a scatter plot (Bewick et al. 2004). The ROC curve is assessed by computing the area under the curve (AUC), which usually varies between 0.5 and 1.0. An AUC value of 1.0 indicates that the fitted model has perfect predictive ability, whereas an AUC value of 0.5 shows no predictive ability. The greater the AUC value, the better the predictive model. This study used the standard of Swets (1988) to evaluate the discrimination ability of the model: an AUC value between 0.9 and 1.0 represents excellent discrimination; a value between 0.8 and 0.9 stands for very good discrimination; a value between 0.7 and 0.8 represents good discrimination; a value between 0.6 and 0.7 represents average discrimination; a value between 0.5 and 0.6 stands for fair discrimination; and a value equal to or less than 0.5 represents poor discrimination.
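The success rate, AUC, and discrimination classes described above can be sketched in Python as follows. Several points are assumptions rather than statements from the paper: the cell layout (A = observed 0 and predicted 0, B = observed 0 and predicted 1, C = observed 1 and predicted 0, D = observed 1 and predicted 1), the treatment of a probability exactly equal to the cut-off as a non-event, and the assignment of class-boundary AUC values to the higher class. The AUC is computed with the pairwise (Mann-Whitney) formulation, which equals the area under the ROC curve. The sample data in the usage example are made up for illustration.

```python
def classification_counts(observed, probabilities, cutoff=0.5):
    """Cross-classify observed outcomes (0/1) against predictions.

    Returns (A, B, C, D) under the assumed cell layout described above.
    """
    a = b = c = d = 0
    for y, p in zip(observed, probabilities):
        predicted = 1 if p > cutoff else 0
        if y == 0 and predicted == 0:
            a += 1
        elif y == 0 and predicted == 1:
            b += 1
        elif y == 1 and predicted == 0:
            c += 1
        else:
            d += 1
    return a, b, c, d


def success_rate(a, b, c, d):
    """SR = (A + D) / (A + B + C + D)."""
    return (a + d) / (a + b + c + d)


def auc(observed, probabilities):
    """Share of (event, non-event) pairs ranked correctly; ties count 0.5."""
    pos = [p for y, p in zip(observed, probabilities) if y == 1]
    neg = [p for y, p in zip(observed, probabilities) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def discrimination(auc_value):
    """Discrimination class following the ranges quoted in the text."""
    if auc_value <= 0.5:
        return "poor"
    if auc_value < 0.6:
        return "fair"
    if auc_value < 0.7:
        return "average"
    if auc_value < 0.8:
        return "good"
    if auc_value < 0.9:
        return "very good"
    return "excellent"


if __name__ == "__main__":
    observed = [0, 0, 1, 1, 0, 1]                    # made-up outcomes
    probabilities = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]  # made-up predictions
    counts = classification_counts(observed, probabilities)
    print(counts, success_rate(*counts))
    print(auc(observed, probabilities), discrimination(0.976))
```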

Statistical analysis for the outcome and controlling variables
In the logistic regression analysis, the dependent variable is the hydraulic conductivity (K) obtained from double packer tests, and the selected independent variables are RQD, DI, GCD, LPI, FW, FD, and GV. First, descriptive statistics, including the number of data sets, mean, minimum, maximum, and variance, were utilized to describe the basic features of the outcome and controlling variables, as listed in Table 4. The statistics show that each parameter covers a wide range of values.
For the outcome variable, K ranges between 5.40 × 10⁻¹¹ and 3.66 × 10⁻⁴ m/s, and the mean value of K is 9.26 × 10⁻⁶ m/s. With K equal to 1.00 × 10⁻⁶ m/s as the threshold value of the high groundwater potential zone, about 25 percent of the 474 data samples exceed the threshold. For the seven controlling variables, in addition to the basic statistics, each variable was divided into classes to examine the proportion of cases in each class that exceed the threshold value. The results confirm the validity of the variables chosen for this study. Figure 2 shows the data quantity and the proportion of cases above and below the threshold for the classes of each controlling variable. Figure 2a shows that the proportion of cases above the threshold K value increases as RQD decreases. The more fractured the rock mass (the smaller the RQD), the higher the probability of occurrence of high groundwater potential sections along a borehole. Figure 2b shows that the proportion of cases above the threshold K value increases as DI increases. This outcome indicates that DI has a positive relationship with hydraulic conductivity. The shallower the depth of the stratum, the higher the probability of occurrence of high groundwater potential sections along a borehole. Figure 2c shows that the proportion of cases above the threshold K value decreases as GCD increases. GCD is designed as an index to represent the gouge content in the fractures of the rock mass. The higher the gouge content, the lower the permeability of the rock mass. Thus, the larger the GCD value, the smaller the probability of occurrence of high groundwater potential sections along a borehole. Figure 2d shows that the proportion of cases above the threshold K value, as a whole, increases as LPI increases. LPI expresses the relationship between rock mass lithology and rock mass hydraulic conductivity. The larger the LPI value, the better the rock mass permeability. Thus, the larger the LPI value, the greater the probability of occurrence of high groundwater potential sections along a borehole. Figure 2e shows that the proportion of cases above the threshold K value increases as FD increases. FD is based on the number of fractures without shear mud in the double packer test interval, and this index has a positive correlation with the hydraulic conductivity. Thus, the larger the FD value, the greater the probability of occurrence of high groundwater potential sections along a borehole. Figure 2f shows that the proportion of cases above the threshold K value increases with FW. FW is the sum of the apertures of fractures without shear mud in a double packer test section, and this index is positively correlated with the hydraulic conductivity of the rock mass. Thus, the larger the FW value, the greater the probability of occurrence of high groundwater potential sections along a borehole. Figure 2g shows that the proportion of cases above the threshold K value, as a whole, increases as GV increases. GV reflects the vertical groundwater flow rate at different depths of the borehole, which positively correlates with the hydraulic conductivity. Thus, the larger the GV value, the greater the probability of occurrence of high groundwater potential sections along a borehole.
The above analysis results confirm that the seven factors selected in this study are all correlated with hydraulic conductivity and can therefore be used to identify the probability of occurrence of high groundwater potential sections along a borehole. Nevertheless, if the logistic regression model contains multiple controlling factors, the factors need to be tested for collinearity to identify any with collinearity problems. Two diagnostic indicators, VIF and CI, were used to measure collinearity in the chosen controlling variables. Table 5 shows the outcome of the collinearity analysis for the seven controlling variables. The VIF values of the FD and FW variables are larger than those of the other variables, and the CI value of the GV variable is the largest among all variables. For both indicators, a greater value indicates a stronger correlation with the other variables. However, all the selected controlling variables have a VIF less than ten and a CI less than 30. Both diagnostics indicate that the variables are not critically correlated with each other. Therefore, collinearity among these variables does not noticeably affect the model performance, and all the chosen variables can be used to develop logistic regression models.

Logistic regression models with single controlling variable
To investigate whether a model based on a single geological feature is sufficient to identify the potential for groundwater resources development at a specific site, one-factor logistic regression models were developed and then validated to assess their strengths, weaknesses, and representativeness. The results of the one-factor analysis are summarized in Table 6. The information available in Table 6 includes (1) the coefficients (β₀ and β₁) in each developed logistic regression model: when β₁ > 0, the factor is positively correlated with the model outcome; (2) the significance of the model: tested by the H-L test (passed when P > 0.05) and the Omnibus test (significant when P < 0.05); (3) pseudo R²: the Nagelkerke R² value used for measuring the goodness of fit of a logistic regression model; (4) success rate: the proportion of cases in which the predicted value of the model agrees with the actual value; (5) AUC: indicating the predictive ability of a fitted model. As shown in Table 6, all seven one-factor logistic regression models achieve model significance based on the Omnibus test. The FD and FW models perform better than the others, although their pseudo R² (less than 0.5), success rate (less than 0.8), and AUC (less than 0.9) remain limited. However, only three one-factor logistic regression models pass the H-L test (P > 0.05). Therefore, according to the statistical analysis above, the reliability of the one-factor models is questionable.

Logistic regression models with multiple controlling variables
To make the prediction models proposed in this study more flexible in practical applications, combinations of two, three, four, five, six, and seven of the seven factors were formed, and each combination was subjected to logistic regression analysis. According to the factor selection method mentioned above, a total of 120 models were developed, and the results are summarized in Table 7. However, not all models are statistically significant. Table 7 lists, for each number of factors, the number of models that pass all three tests (H-L test, Omnibus test, and Wald test). For example, the two-factor combination has 21 models in the series, 12 of which fail to pass all three tests together. In addition, the passed models were evaluated by three measures: Nagelkerke R², success rate (SR), and AUC. The range and average value of each measure are also shown in Table 7. Among the passed models, model evaluation statistics for the best and worst models in the different combinations are given in Table 8. The results indicate that the number of factors affects the overall model performance. The overall predictive model performed best when all seven factors were selected, as shown in Fig. 3. The other finding is that the predictive model improves its performance when controlling factors with fracture characteristics, such as FD, FW, and GV, are included. The corresponding logistic regression equations for all the best models shown in Table 8 are given as follows.

Application of logistic regression model
Based on the results in the "Logistic regression models with multiple controlling variables" section, the developed logistic regression equations can be used to predict the probability of hydraulic conductivity greater than the threshold K (1 × 10⁻⁶ m/s) at any specific interval along a borehole. However, these equations can only be applied in fractured rocks, not in the regolith zone. The selection of the logistic regression model for the prediction relies on which of the seven factors have available data. For example, Eq. (5) can be applied for prediction when RQD, DI, GCD, and LPI are available. Figure 4a, b shows the predicted continuous probability of high groundwater potential occurrence (every meter) along the Neihu and Neimoupu boreholes, respectively. The probability of high groundwater potential occurrence for the Neihu borehole was predicted between 10 and 100 m below the ground (the thickness of regolith is 9.4 m). The probability of high groundwater potential occurrence for the Neimoupu borehole was predicted between 20 and 100 m below the ground (the thickness of regolith is 20 m). A comparison of the predicted probability distributions along the two boreholes shows that the groundwater potential in the Neimoupu borehole is better than that in the Neihu borehole. In addition, the zones of groundwater potential in the two boreholes are located mainly in the upper half of each borehole. The above predicted outcomes can be confirmed by the well yield data obtained from the pumping test in each borehole. As shown in Fig. 4, the well yield at the pumping interval of 9-28 m of Borehole Neihu is 5.8 L/min; the well yield at the pumping interval of 20-39 m of Borehole Neimoupu is 88.3 L/min. Thus, the well yield data agree with the predicted data at the pumping interval of each borehole.
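The idea of choosing a model according to which factors are available at a site can be sketched as follows. This is only an illustration under stated assumptions: the catalogue below is hypothetical apart from the predictor set quoted for Eq. (5) and the existence of a seven-factor model, and preferring the usable model with the most predictors is an assumption motivated by the finding that performance improves with the number of factors.

```python
ALL_SEVEN = frozenset({"RQD", "DI", "GCD", "LPI", "FD", "FW", "GV"})

# Model label -> predictors it requires. Only the Eq. (5) entry and the
# existence of a seven-factor model come from the text; the two-factor
# entry is a placeholder for illustration.
MODELS = {
    "hypothetical two-factor model": frozenset({"RQD", "DI"}),
    "Eq. (5)": frozenset({"RQD", "DI", "GCD", "LPI"}),
    "seven-factor model": ALL_SEVEN,
}


def select_model(available, models=MODELS):
    """Return the usable model with the most predictors, or None.

    A model is usable when every predictor it needs is available.
    """
    available = frozenset(available)
    usable = [name for name, needed in models.items() if needed <= available]
    if not usable:
        return None
    return max(usable, key=lambda name: len(models[name]))


if __name__ == "__main__":
    print(select_model({"RQD", "DI", "GCD", "LPI"}))
    print(select_model(ALL_SEVEN))
    print(select_model({"FD"}))
```

The selected equation would then be evaluated interval by interval (for example, every meter below the regolith) to obtain a probability profile along the borehole.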
Predicting the probability of hydraulic conductivity greater than the threshold K through the proposed logistic regression models can greatly assist engineering practices related to hydrogeological investigations and groundwater resources development. The relevant applications include: (1) the probability distribution data in a borehole can give information concerning the major zone of groundwater resources or a candidate site for groundwater resources development; (2) the prediction for a specific interval in a borehole is beneficial to the design of double packer hydraulic tests when arranging testing locations. Finally, the probabilistic outcome derived from the logistic regression model provides a more unbiased estimation than the previous deterministic method (Hsu et al. 2020).

Conclusions
1. A total of 474 sets of hydrogeological investigation and hydraulic test data from the groundwater resources investigation project in the mountainous area of Taiwan were collected. Seven independent variables, including RQD, DI, GCD, LPI, FD, FW, and GV, were selected to develop a total of 127 logistic regression models that can be used to evaluate rock mass permeability. Of the 127 models, 77 were statistically significant when validated by the Hosmer-Lemeshow test, Omnibus test, and Wald test. The 77 models were evaluated by three measures, Nagelkerke R², SR, and AUC, to further understand the prediction performance of each model.
2. The logistic regression analysis was performed by exploring various models of factor combinations from the selected seven factors. For the single-factor analysis, the results showed that the seven models were not statistically meaningful, so it is difficult to rely on a single factor to determine the hydraulic property of a specific section along a borehole. The multi-factor combination analysis showed that the more factors were selected, the better the model prediction accuracy. This conclusion is consistent with the previous analysis results using the deterministic method (Hsu 2021). When all seven factors are selected, Nagelkerke R², SR, and AUC reach their best values of 0.83, 91.6%, and 0.976, respectively.
3. The model developed in this study can adapt to different geological environments, and the many factor combination models can be applied according to the factor data available in the field. These models can be applied to the investigation of hydrogeological parameters, including (1) assistance in the planning and design of the location of the double packer hydraulic test section; (2) the probability distribution results of high permeability characteristics along a borehole, which can be used as a tool for selecting high groundwater potential sites.