ML techniques are powerful tools for developing predictive models. ML applies pattern recognition to reservoir performance data generated on a computer (Mohaghegh, 2011). Furthermore, ML-based predictive models can generate rapid and accurate forecasts in place of a reservoir simulator. In this study, ML algorithms were adapted to develop predictive models that evaluate the trapping efficiency in deep saline aquifers. The workflow for creating the predictive models is shown in Fig. 1a.
First step: CO2 sequestration model construction. The 3D geological model was simulated with a compositional modeling package (CMG-GEM) to capture the CO2 trapping mechanisms. The governing equations of the simulation package are expressed as follows (Nghiem et al., 2004):

“Convection” represents the flow induced by the pressure difference, which is described by Darcy’s law. The second term is the rate of diffusion in the liquid phase (Kim et al., 2017). The reaction and injection terms account for precipitation and dissolution between the mineral components and the formation brine (Kim et al., 2017), which are key processes in CO2 sequestration (Nghiem et al., 2010).
Furthermore, the petrophysical properties and other parameters were taken from previous studies (Fang et al., 2010; House et al., 2003; Issautier et al., 2013; Jin et al., 2012; Lengler et al., 2010; Mediato et al., 2017; Shogenov et al., 2017; Singh et al., 2010). As illustrated in Fig. 1b, the simulation model included 2660 (19 × 28 × 5) grid cells. The porosity and permeability properties were adapted from the PUNQ-S3 project (Gu et al., 2005). The data set was obtained from the website of Imperial College London (PUNQ S3, 2021).
Moreover, many factors influence CO2 trapping performance in deep saline formations. These factors include saline formation, depth, reservoir thickness, petrophysical properties, residual gas saturation, compressibility, wettability, capillary pressure, injection methods, and anisotropy ratio (Al-Khdheeawi et al., 2018a; Dai et al., 2018; Silva and Ranjith, 2012). It is therefore necessary to determine which of these factors are most critical. Several scholars have conducted sensitivity analyses to rank the most influential factors in the CO2 sequestration process (Abbaszadeh and Shariatipour, 2018; Gibson-Poole et al., 2006; Lee et al., 2010; Liu and Zhang, 2011). Based on this literature, eight parameters were selected for this study: depth, porosity (Por), permeability (Perm), thickness (h), residual gas saturation (Sgr), salinity, injection rate, and the ratio of vertical to horizontal permeability (Kv/Kh). The geological and uncertainty parameters are presented in Table 1.
Table 1
The uncertainty factors considered for conducting the simulation jobs (Bachu, 2008; Beni et al., 2011; Bu Ali et al., 2011; Dai et al., 2017; Ezeanyim and Shariatipour, 2016; Fang et al., 2010; House et al., 2003; Jia et al., 2018; Singh et al., 2010; Song et al., 2020; Sung et al., 2014; Temitope et al., 2016b; Xiao et al., 2019)
| Uncertainty factors | Minimum | Base case | Maximum | Units |
|---|---|---|---|---|
| Porosity | 0.01 | - | 0.4 | - |
| Permeability | 0.01 | - | 2000 | mD |
| Thickness | 2 | 5 | 200 | m |
| Depth | 800 | 1000 | 3000 | m |
| Residual gas saturation | 0.1 | 0.4 | 0.5 | - |
| Salinity | 10000 | 120000 | 400000 | ppm |
| Kv/Kh | 100 | 1500 | 1500 | - |
| CO2 injection rate | 39 | 2740 | 5480 | ton/day |
In this study, the relative permeability properties were obtained from Vo Thanh et al. (2020a). Moreover, residual CO2 trapping was modeled mainly with the Land trapping model (Vo Thanh et al., 2020a). The drainage and imbibition processes can therefore be calculated during the simulation using Land’s residual model (Land, 1968). Figure 1c depicts the relative permeability properties and the Land trapping model used in this research.
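To make the trapping relation concrete, the following is a minimal Python sketch of one common form of Land's model, in which the trapped gas saturation depends on the gas saturation reached at flow reversal. The function names, the simplified form of the coefficient, and the value sg_max = 0.8 are illustrative assumptions; the exact formulation used inside CMG-GEM may differ.

```python
def land_coefficient(sgr_max: float, sg_max: float) -> float:
    """Land's coefficient C from the maximum residual and maximum gas saturations.

    Assumed simplified form: C = 1/Sgr_max - 1/Sg_max.
    """
    return 1.0 / sgr_max - 1.0 / sg_max


def land_residual_saturation(sgi: float, c: float) -> float:
    """Trapped gas saturation for a gas saturation sgi reached at flow reversal."""
    return sgi / (1.0 + c * sgi)


# Illustrative values only: sgr_max = 0.4 is the base case in Table 1,
# sg_max = 0.8 is a hypothetical maximum gas saturation.
C = land_coefficient(sgr_max=0.4, sg_max=0.8)
trapped = [land_residual_saturation(s, C) for s in (0.2, 0.4, 0.8)]
```

With this form, a cell that reaches the maximum gas saturation traps exactly the maximum residual saturation, while cells that see less gas trap proportionally less.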
To evaluate the efficiency of CO2 trapping performance, three kinds of trapping indices need to be calculated (Nghiem et al., 2009a):
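As a hedged reference, trapping indices of this kind are commonly defined as the fraction of the injected CO2 held by each mechanism. The symbols RTI, STI, and TEI below are assumed notation for the residual, solubility, and total trapping indices and may differ from the authors' own:

```latex
\begin{align}
\mathrm{RTI} &= \frac{\text{amount of CO}_2 \text{ trapped as residual gas}}{\text{total amount of CO}_2 \text{ injected}} \\
\mathrm{STI} &= \frac{\text{amount of CO}_2 \text{ dissolved in brine}}{\text{total amount of CO}_2 \text{ injected}} \\
\mathrm{TEI} &= \mathrm{RTI} + \mathrm{STI}
\end{align}
```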

To investigate the uncertainty arising from reservoir heterogeneity, 101 geological realizations were considered during the CO2 injection process. The geological models built in the Petrel package were automatically transferred to the compositional simulation module (CMG-GEM) to simulate the CO2 sequestration process. The simulation result was then evaluated by the optimizer (CMOST-AI) before the next geological model and set of uncertainty variables were created. New porosity and permeability models were generated for each new simulation job (Vo Thanh et al., 2020b). Figure 2 shows the process for integrating Petrel with CMOST to run simulations on the 101 different geological realizations. Table 2 presents the geological variables used to create the realizations, namely the azimuth and the global seed number in the Petrel package. Varying these variables changes the distribution of porosity and permeability, so that the effect of geological uncertainty is captured in the machine learning models.
Table 2
Uncertainty limits of the parameters used to generate petrophysical realizations
| Parameter | Lower bound | Base case | Upper bound |
|---|---|---|---|
| Azimuth porosity | 10 | 12 | 30 |
| Azimuth horizontal permeability (Kh) | 5 | 21 | 25 |
| Azimuth vertical permeability (Kv) | 30 | 36 | 60 |
| Global seed number porosity | 1000 | 1850 | 3000 |
| Global seed number Kh | 650 | 1000 | 3000 |
| Global seed number Kv | 20 | 10 | 50 |
Continuous CO2 injection was applied for 10 years, followed by 490 years post-injection for this study. Figure 3 depicts the CO2 saturation profiles for the 3D filter I-K directions after 10, 200, and 500 years. CO2 is injected into the saline formation and rises to the reservoir's top due to the buoyancy mechanism (Nghiem et al., 2010).
Figure 3 shows the CO2 saturation in the reservoir after the 10-year injection phase. The density of CO2-saturated water increases because CO2 dissolves in the aqueous phase (Kim et al., 2017). The denser aqueous phase then migrates toward the lower part of the saline aquifer, which promotes further dissolution of CO2 from the plume. Here, the spread of gaseous CO2 near the top of the saline aquifer is more significant than near its base because CO2 moves vertically (Emad A. Al-Khdheeawi et al., 2017; Al-Khdheeawi et al., 2017) and is held by the seal rock. These simulation results help explain the mechanism of CO2 migration in saline formations.
Second step: Define the uncertainty variables. Deep saline aquifers involve many uncertainties because limited funding for CO2 storage projects means that few observation data are available from these reservoirs. These uncertain parameters were therefore used as inputs to develop ML models that predict the CO2 trapping efficiency in saline formations. The uncertainty parameters are listed in Table 1.
Third step: Latin hypercube sampling. To generate the training dataset, 100 simulation experiments were designed using Latin hypercube sampling (LHS) in CMOST, an artificial intelligence and ML tool in the CMG package (CMG, 2019).
This module can perform sensitivity assessment, history matching, optimization, and uncertainty analysis for simulation projects (CMG, 2019). LHS is central to this step because the number of training simulation jobs it requires does not depend on the number of uncertainty parameters (Vo Thanh et al., 2020c).
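CMOST performs the sampling internally, so the following is only a minimal Python sketch of the idea behind LHS, not the CMOST implementation. It assumes uniform sampling between the minimum and maximum values of Table 1 (the distributions actually used are not stated), shows three of the eight parameters, and uses hypothetical names throughout.

```python
import random


def latin_hypercube(n_samples, bounds, seed=0):
    """Return n_samples design points as dicts.

    Each parameter range is split into n_samples equal strata and every
    stratum is sampled exactly once (uniform sampling is an assumption).
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds.values():
        # One uniform draw inside each stratum of the unit interval.
        unit = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so that strata are paired randomly across parameters.
        rng.shuffle(unit)
        columns.append([low + u * (high - low) for u in unit])
    names = list(bounds)
    return [dict(zip(names, row)) for row in zip(*columns)]


# Minimum and maximum values from Table 1 (three of the eight parameters).
BOUNDS = {
    "porosity": (0.01, 0.4),
    "depth_m": (800.0, 3000.0),
    "injection_rate_ton_per_day": (39.0, 5480.0),
}

design = latin_hypercube(100, BOUNDS, seed=42)  # one dict per simulation job
```

Because each parameter is stratified independently, adding a ninth or tenth parameter adds a column to the design but does not by itself require more simulation jobs.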
Fourth step: Perform simulation jobs to gather inputs/outputs for the machine learning model. This procedure is vital for the development of ML models. The CMG-GEM module was used to perform 101 simulation experiments. For every simulation experiment, the uncertainty parameters and the objectives of interest were gathered as the dataset; the outputs were residual trapping, solubility trapping, and cumulative CO2 injection in a deep saline aquifer.
Furthermore, 4900 samples with elapsed time (10, 20, 30, ..., 500 years) were prepared from 100 simulation experiments for training the ML models. In addition, 49 samples from one simulation job, with the same elapsed times as in the training stage, were used to blind-test the ML models. These procedures are explained in detail in the final section.
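The sample counts above follow from flattening each simulation job into one row per report time. The sketch below illustrates this bookkeeping in Python with placeholder data. The data layout and names are hypothetical, and the 49 report times (20 to 500 years in 10-year steps) are an assumption chosen only so that the counts match the 4900 and 49 samples quoted in the text.

```python
def build_samples(runs, report_times):
    """Flatten simulation runs into one (features, targets) row per report time."""
    rows = []
    for run in runs:
        for t in report_times:
            # Inputs are the uncertainty parameters plus the elapsed time.
            features = dict(run["inputs"], elapsed_time_years=t)
            rows.append((features, run["outputs"][t]))
    return rows


# Assumed report times: 49 values (the exact times are not confirmed by the text).
REPORT_TIMES = [10 * k for k in range(2, 51)]

# Placeholder runs standing in for the 101 CMG-GEM simulation jobs.
runs = [
    {"inputs": {"run_id": r}, "outputs": {t: 0.0 for t in REPORT_TIMES}}
    for r in range(101)
]

train = build_samples(runs[:100], REPORT_TIMES)  # 100 jobs for training
blind = build_samples(runs[100:], REPORT_TIMES)  # 1 held-out job for blind testing
```

Holding out a complete simulation job, rather than random rows, keeps every time step of the blind-test case unseen during training.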
Fifth step: Generate predictive models using machine learning techniques. The predictive models estimate the relationship between the input variables and the output functions from the training simulation results. A separate predictive model was built for each objective of interest. To demonstrate the robustness of the developed predictive models, three popular and powerful supervised ML algorithms (GP, SVM, and RF) were employed in this study. Details of the ML techniques are provided in the Supplementary Material. All the ML techniques were run in a MATLAB 2020b environment on an Intel® Core™ i7-8550U CPU with 16 GB RAM.
Regarding the performance criteria for each ML model, Stazio et al. (2019) suggested using the root mean square error (RMSE) and the coefficient of determination (R2) as two statistical indicators to assess predictive models developed with ML techniques. These statistical indicators were computed using the following equations:
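In their standard form, RMSE is the square root of the mean squared difference between simulated and predicted values, and R2 equals one minus the ratio of the residual sum of squares to the total sum of squares. A minimal Python sketch of both indicators follows; the function names and the sample values are illustrative and are not results from this study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error: sqrt(mean((y_true - y_pred)^2))."""
    n = len(y_true)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up trapping-index values standing in for simulator and ML outputs.
y_sim = [0.10, 0.20, 0.30, 0.40]
y_ml = [0.12, 0.18, 0.33, 0.37]

error = rmse(y_sim, y_ml)
fit = r_squared(y_sim, y_ml)
```

A lower RMSE and an R2 closer to 1 both indicate closer agreement between the ML prediction and the simulator.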

Final step: Validation of predictive models. The 100 simulation jobs were used for calibration (training) with 10-fold cross-validation (Geisser, 1993). One simulation job was used for blind testing to evaluate the stability of the predictive models. Subsequently, the selected predictive models were applied to field data from previous studies. This final step helps confirm that the predictive models can be applied to CO2 trapping at actual storage sites and in other scientific disciplines. The MATLAB function for making predictions on an existing reservoir with the trained model is expressed as:
Result = trainedModel.predictFcn(X)
where X is the data matrix.
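The 10-fold cross-validation used in this step can be sketched as follows. This is a minimal Python illustration of the splitting logic only, not the MATLAB routine used in the study; the function name is hypothetical, and the text does not state whether folds were drawn by sample or by simulation job, so the sketch assumes a split by sample.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val


# 4900 training samples split into 10 folds of 490 samples each.
splits = list(k_fold_indices(4900, k=10))
```

Each model is fitted ten times, every time on nine folds and scored on the remaining one, and the ten scores are averaged to estimate how well the model generalizes.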