Oil and gas are non-renewable resources. As they continue to be exploited and consumed, the demands on the speed and accuracy of oil and gas exploration keep rising. In oil and gas exploration, lithology identification is a prerequisite for accurately determining rock porosity and oil saturation, and it also underpins the study of reservoir characteristics, reserve calculation, and geological modeling. Rapid and accurate lithology identification using machine learning methods has therefore become an active research topic.
Data-driven machine learning methods can effectively mine the complex nonlinear relationships among high-dimensional features. Machine learning has developed rapidly in recent years and has been widely applied in many fields, including the geosciences (Saporetti et al. 2018, Sun et al. 2019, Saporetti et al. 2019, Asante-Okyere et al. 2020). Many studies have also applied it to lithology identification, using methods such as AdaBoost (Han et al. 2021), random forest (RF) (Ao et al. 2019), support vector machine (SVM) (Bressan et al. 2020), and artificial neural networks (Asante-Okyere et al. 2020).
Stacking is an ensemble learning algorithm that trains several learners of different types (called "primary learners") in parallel to obtain initial predictions, and then uses a meta-learner to refine those initial predictions into the final result. Liu et al. (2020) proposed a load prediction method based on a multi-model fusion Stacking ensemble, using long short-term memory (LSTM), gradient decision tree, RF, and SVM as primary learners, whose outputs are then further optimized by a meta-learner. The method makes full use of the strengths of each model and gives good prediction results for conventional loads.
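The two-level structure of Stacking can be illustrated with a minimal sketch. It assumes scikit-learn and a synthetic data set; the choice of naive Bayes and a decision tree as primary learners, logistic regression as meta-learner, and five-fold out-of-fold predictions are all illustrative assumptions, not the configuration of Liu et al. (2020) or of this paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data standing in for a real data set (assumption).
X, y = make_classification(n_samples=600, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Primary learners of different types (illustrative choice).
primary_learners = [GaussianNB(),
                    DecisionTreeClassifier(max_depth=5, random_state=0)]

# Level 0: out-of-fold class probabilities on the training set, so the
# meta-learner never sees a prediction made on a sample the primary
# learner was fitted on.
train_meta = np.hstack([
    cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")
    for m in primary_learners
])

# Refit each primary learner on the full training set and build the
# corresponding meta-features for the test set.
test_meta = np.hstack([
    m.fit(X_train, y_train).predict_proba(X_test)
    for m in primary_learners
])

# Level 1: the meta-learner maps the initial predictions to the final result.
meta_learner = LogisticRegression(max_iter=1000).fit(train_meta, y_train)
stacked_accuracy = meta_learner.score(test_meta, y_test)
print(f"stacked test accuracy: {stacked_accuracy:.3f}")
```

Using out-of-fold predictions as meta-features is a common design choice: it keeps the meta-learner from simply trusting whichever primary learner overfits the training data most.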
At present, research on intelligent lithology identification focuses on improving model accuracy, while the models' predictions lack sufficient interpretation. More accurate machine learning algorithms tend to be less interpretable (Ibrahim et al., 2019), which limits the progress of identification models in lithology identification. To improve the interpretability of machine learning algorithms and increase the reliability of identification models, several interpretation algorithms have emerged in recent years. Examples include SHAP applied to coronary heart disease mortality prediction (Wang et al., 2021), interpretable machine learning applied to crop yield estimation (Mateo-Sanchis et al., 2021), and LIME applied to a traffic safety interpretability study (Das et al., 2021).
Previous research shows that ensemble learning models represented by random forest, support vector machine, and the Stacking algorithm are widely adopted in lithology identification. In this paper, to further verify the generalization ability of the ensemble learning model in lithology recognition and prediction evaluation, and to improve the interpretability of the model, we take two public logging data sets as examples: the Council Grove gas reservoir in Kansas, USA (Bohling and Dubois, 2003; Dubois et al., 2007), and the Daqing Oilfield in China. With SVM, RF, and naive Bayes (NB) as primary learners and SVM as the secondary learner, lithology is classified by Stacking. The model is evaluated by precision, recall, F1-score, and area under the curve (AUC), and its interpretability is studied with two explanation algorithms: PI and LIME.
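As a rough illustration of this setup, the following sketch builds a Stacking classifier with SVM, RF, and NB as primary learners and SVM as the secondary learner, then computes the four evaluation metrics. It assumes scikit-learn and synthetic data in place of the two logging data sets. The hyperparameters, the macro averaging, the one-vs-rest AUC computed from decision scores, and the reading of PI as permutation importance are assumptions made for illustration. LIME is omitted because it requires a separate package.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import precision_recall_fscore_support, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for well-log features and lithology labels (assumption).
X, y = make_classification(n_samples=600, n_features=6, n_informative=4,
                           n_redundant=1, n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Primary learners: SVM, RF, NB. Secondary learner: SVM.
stack = StackingClassifier(
    estimators=[
        ("svm", make_pipeline(StandardScaler(), SVC())),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("nb", GaussianNB()),
    ],
    final_estimator=SVC(),
    cv=5,  # out-of-fold predictions feed the secondary learner
)
stack.fit(X_train, y_train)

# Precision, recall, and F1-score, macro-averaged over classes.
y_pred = stack.predict(X_test)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="macro", zero_division=0)

# One-vs-rest AUC per class from the secondary learner's decision scores,
# then averaged over classes.
scores = stack.decision_function(X_test)
auc = np.mean([roc_auc_score(y_test == k, scores[:, i])
               for i, k in enumerate(stack.classes_)])

# Permutation importance: drop in accuracy when one feature is shuffled.
pi = permutation_importance(stack, X_test, y_test, n_repeats=5, random_state=0)
ranking = np.argsort(pi.importances_mean)[::-1]

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"f1={f1:.3f} auc={auc:.3f}")
print("features ranked by permutation importance:", ranking.tolist())
```

Permutation importance is model-agnostic, so it can be applied to the stacked model as a whole rather than to each primary learner separately.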