A Visual Analytics Framework from Geological Modeling to Reservoir Simulation


 Immersive technologies such as virtual reality has shown great potential in to enhance many important workflows the oil and gas industry. Immersive technologies have many natural advantages. These include enhanced spatial perception of 3D data, spatial user interfaces and interactions, the ability to provide collaboration space and incorporate analysis techniques. To take advantage of these technologies in the reservoir engineering domain, a visual analytics framework is established to demonstrate how these technologies may be used effectively. This framework allows a user to progress from geological modeling to reservoir simulation in an interactive and highly effective application. First, geological uncertainty analysis is conducted to screen representative realizations as candidates for simulation. Visualization and analysis of these representative realizations may then be performed in virtual reality using reservoir connectivity analysis of a tight oil reservoir. This application supports natural interactions, improved working space, and effective perception of underground connectivity. For this reason, it is more convenient and natural to work between a reservoir scale model and a set of candidate local realizations. This platform provides a basis for future data analysis methods and ways of interacting and visualizing data that support this analysis.


Introduction
Immersive technologies come in a variety of forms. Although all immersive technologies hold much promise, due to the relative ease of implementation and ability to visualize large and computationally challenging datasets, virtual reality (VR) was selected as a starting point and is the technology used in this project [1] . This technology has been applied to various fields such as education, entertainment, and design [2,3,4] . In the oil and gas industry, VR technology has already shown great value by helping experts to make more accurate decisions [5] , improving the execution of onshore and offshore engineering projects [6,7] , delivering interactive engineering education and training (CEET) programs [8] , and visualizing enhanced oil recovery processes at nanoscale [9] . In this research, a visual analytics framework is designed for a VR implementation of reservoir simulation models. To conduct reservoir connectivity analysis (RCA) in VR, a user interface (UI) system has been designed. To obtain more reliable RCA results, geological uncertainty analysis is added to the workflow.
Subsurface geological models are typically built using well log and seismic data. High quality data from the subsurface is quite sparse, as seismic provides only indirect measurement that must be interpreted carefully, and surface outcrops only provide a partial picture of complex geology that can span a large area and vast depth. For these reasons, geological models, although crucial for decision making, are rife with uncertainty. A distribution of a petrophysical property in a geological formation can be regarded as only a probabilistic guess, since we never know an exact value of a petrophysical property at a given location (e.g., porosity, permeability, and initial water saturation). Visualizing one realization does not convey uncertainty [10] . Usually, geostatistical methods are used to construct geological models that reproduce the different interpretations and characteristics of a variable of interest throughout the spatial region considered [11] . Applying a probabilistic approach, this technique generates multiple realizations to cover the various possibilities. To quantify uncertainty and select representative models (RMs), a simple and widely used ranking measure is the original oil in place (OOIP) for each realization, corresponding to P90, P50 and P10 models (cumulative probabilities 90%, 50%, and 10%, respectively) [12]- [16] . Also, other static reservoir properties (e.g., porosity and facies proportions) and dynamic properties (e.g., connectivity and flow response) can be applied to order realizations [10] . However, due to the lack of physical geometrical information, selecting a few realizations based simply on OOIP or other properties will not provide a representative model for each scenario that is possible, due to the underground uncertainty [17,18,19] . For example, a realization with neither horizontally nor vertically connected pores could have the same pore volume as one with connected pores, though their petrophysical similarity is low [20] . In other words, selecting too few realizations may result in misleading forecasts [16] and extraction plans based on incorrect assumptions. This will result in large differences in the expected economics and environmental impact of a project and may lead to project failure.
Determining the likely outcome using each of the possible representative realizations is very important for planning a petroleum extraction project. Although these sophisticated simulations can be very time consuming, even very large and complex models may be effectively simulated in useful timeframes using parallel computing [21,22,23] . Reservoir simulation on massively parallel computers can reduce simulation time over three orders of magnitude. Even with fast computers and many processors and the best parallel simulators, in the best case it is wasteful of expensive computer systems and in the worst case completely impossible to perform reservoir simulation on all generated realizations, which may be very complex and may number in the thousands. Furthermore, many reservoir engineers may simply not have access to either large scale compute infrastructure or the software to make effective use of it. A far more effective approach is to find the RMs from the generated realizations that are important to the simulation outcomes [20] . RM selection methodologies can be divided into production-based strategies and clustering-based methods [24] . Since selecting RMs based on a specific production-based strategy may lead only to a limited set of results constrained by user defined wells and field objective functions, we select RMs based on a clustering method to create a universal workflow that can find all RMs that must be considered for a complete understanding of possible scenarios. Clustering analysis has been widely integrated into many different analysis workflows to develop a realization reduction methodology [10,20,24,25,26] . Virtually all these clustering methods are based on distance [27] . To describe the distance/similarity among realizations, two or more properties may be weighted and combined to evaluate the distance. The other method is to adapt the concept of a flow unit, which is a composite value dependent on porosity and permeability, allowing only this parameter to be considered during clustering while still capturing the effect of these properties. This concept has been used in the past in "conventional" carbonate and siliciclastic reservoirs, and more recently, in tight and shale gas reservoirs [28,29] . Dr. Aguilera [29] extended this same concept quantitatively to tight and shale oil reservoirs with the use of real data.
This paper proposes a visual analytic framework from reservoir modelling to simulation. This framework can be divided into two parts: (1) geological modelling and uncertainty exploration and (2) VR implementation of representative realizations.
To conduct uncertainty analysis, multiple realizations are generated using from Gaussian random function simulation. A clustering method is then used to calculate the distance among realizations. Each three-dimensional (3D) realization is transformed into a one-dimensional (1D) array. Thus, a matrix consisting of all arrays can be built by referring to all realizations. Each column represents a single realization, and each row represents the same grid cell in different realizations. Defining a spatial distance function, a distance matrix can be formed to show the differences calculated for all realizations. Then, a multidimensional scaling (MDS) approach is introduced to help interpret and visualize the results. Unsupervised learning algorithms are used to cluster realizations into several groups and representative realizations can be determined. The realizations closest to the centroids of each group are selected to be representative realizations. Then, P90, P50 and P10 models calculated based on reserve volumes are created.
A visual analytics tool has been developed to support the proposed workflow and provides visual and interactive tasks for uncertainty exploration and RCA. To visualize reservoir datasets, the Visualization Toolkit (VTK), an open-source software system for 3D computer graphics, image processing and scientific visualization, is used to render the data. Unity 3D is a crossplatform game engine and is used as the primary development platform to provide the framework for the VTK-based reservoir visualization system and the UI. When converting reservoir models to vertex-edge graphs, various graph theory algorithms can be applied to support data analysis within this visual analysis application.
In the following sections, we will first review methodologies used in this research: geostatistical simulation, distance of realizations, cluster analysis, data visualization and graph theory. Then we will discuss the application of the new workflow. Finally, conclusions are presented, and future work is outlined.

Geostatistical simulation
Since knowledge of the geologic structure of a reservoir is limited at the decision-making phase of a reservoir's development planning, geostatistical modelling offers an opportunity to model a distribution of geologic properties across the reservoir [30] . Multiple equally probable realizations are generated using geostatistical simulation techniques such as Gaussian random function simulation to characterize the geological heterogeneity and uncertainty [16] .

Flow units
According to Dr. Aguilera [31] , a flow unit is defined as a stratigraphically continuous reservoir subdivision characterized by a similar pore throat type [32] . Use of a flow unit provides the ability to simultaneously consider permeability and porosity in one parameter, and thus is a useful concept for linking geology, petrophysics and reservoir engineering [33] .
A process speed (i.e., the ratio of permeability to porosity) provides a relative indication of storage and how quickly fluids can move through porous media, which is related directly to flow units [33] . Based on mercury intrusion experiments, Winland found that there is a strong correlation between a process speed and a size of pore throats at 35% nonwetting phase saturation as determined from mercury injection capillary pressure tests ( 35 ) [34] . The other 65% contributes primarily to storage capacity rather than deliverability [35] . Incorporating capillary pressure, pore throat aperture radii, height above free-water table, and Winland 35 values into Pickett plots, Dr. Aguilera [36] developed 35 , which allows calculation of the pore throat aperture at any water saturation of interest: where is permeability, mD and is porosity, %.

Distance of realizations
A 3D realization consists of cells. Using the Euclidean coordinate system, each cell is indexed by [ , , ] and has flow unit information and geometrical location of data [36] . To calculate the distance between realizations, each 3D realization is reshaped as a 1D array. Setting a realization scale as × × and the cell index in the 1D array starting from 0, the corresponding index of a cell is In petroleum applications, the realizations can be ordered according to static and dynamical reservoir properties [10] . Two or more properties may be combined to evaluate the distance between two realizations, but each property must be weighted [10] : and ′ denote the cell values of realizations and ′ computed for each grid cell m; is the distance of a property; 1 , 2 ,⋯, are the respective weights for each property. In our case, since flow units are used to characterize pore throat types, at this step only equation (2) is needed. and ′ are the 35 of realizations and ′ computed for each grid cell in each 1D array. A distance matrix consisting of all realization arrays can be built by applying this method to all realizations.

Clustering analysis
Different methods such as K-means, K-medoids, hierarchical clustering and model-based clustering have been used to select RMs [30,37] . In this project, K-means and Gaussian Mixture Model (GMM) clustering algorithms are tested. The realizations closest to the centroids obtained from each clustering algorithm are considered as RMs.
The K-means algorithm groups data points by using the distance from a cluster centroid [38] . It is widely used in scientific and industrial applications due to its simplicity and speed [38] . However, K-means uses Euclidean distance as the similarity measure which limits identification of nonlinear usage structures. For this reason, GMM is used for complementation.
GMM is a continuous probabilistic model that represents a dataset with a weighted combination of several normal distributions called mixture components [39] , given by [38] Ν where is a D-dimensional mean vector, ∑ is a × covariance matrix, which describes the shape of the Gaussian, and |∑| denotes the determinant of ∑.
GMM performs clustering of unlabeled data in much the same way as K-means, however, use of the mean and the covariance, rather than only the mean as in K-means, gives GMMs the ability to provide a better quantitative measure of fitness per number of clusters [38] .

Visualization of reservoir data
As shown in Figure 1, the data visualization process may be considered a pipeline consisting of four stages: data importing, data filtering and enrichment, data mapping, and data rendering [38] . [40] VTK is a widely used software system for data processing and visualization. It supports multiple data objects, including linear primitives, structured data, unstructured data, and images. Numerical reservoir models are often stored as unstructured data.

Figure 1. Functional view of the VTK visualization pipeline
To develop a VR application, Unity was selected as the development platform, utilizing the C# programming language. Although VTK is not available natively in C#, Kitware Europe provides ActiViz, a C# wrapper for the native VTK libraries. To easily integrate 3D visualization using ActiViz in Unity, the official VTK ActiViz Unity plugin was used.

Graph theory
Graphs are mathematical structures used to model pairwise relationships between objects. When represented as a weighted graph, a variety of relationships within a reservoir model can be analyzed by using graph theory.

Weighted graph conversion
A numerical reservoir model can be converted from a cell-based model to a vertex-edge graph representation. In a graph, nodes represent a cell, and attributes of each cell can be transferred to nodes. The cost of edges can be defined as needed.
Our goal is to find the path of least resistance in a reservoir, which can be called the "shortest path" with an edge weighting comprised of flow rates and resistance to flow for each fluid phase [41,42] . The flow rate for a specific phase in an edge weighting may include transmissibility, phase mobility and phase potential difference [42,43] .
Transmissibility in reservoir simulation is a property which characterizes the ease of flow in connections between two adjacent grid cells [44] . It depends on permeability and grid geometry [44] . In our case, the reservoir connectivity (weight ) is defined as: where and represent adjacent cells; is the volume of a cell; represents water saturation; and , refers to transmissibility between cells and . The inverse of weight w is also a quantity of resistance called "delta time" and is also associated with each connected node [44] .

Shortest path algorithm
A shortest path algorithm can be adapted to utilize connectivity from perforations to formations. Some works have adopted a single source shortest path algorithm [44,45,46] such as Dijkstra's algorithm and Bellman-Ford's algorithm. For primary recovery scenarios, the single source algorithm works perfectly. In these cases a one-way connectivity problem from an injection well to a production well is considered, and recovery optimization is determined using injection rates [45] . For secondary recovery scenarios, additional injection wells may be introduced, and graph algorithms should consider each injector perforation as well. Graph algorithms can show the possibility that whether injecting and producing wells are connected, and characterize the connection by identifying the minimum cost among all producer perforations that are connected.

Graph filtering
After applying Dijkstra's shortest path algorithm, both injection wells and production wells have a connection relationship within a formation. Users may restrict a connectivity network to a single perforation. By comparing connectivity networks of different perforations, users can intuitively judge the relative reservoir contact area of different perforations. The connectivity among well pairs or well groups can also be easily displayed.

VR visualization system
A VR visualization system is one which provides a real-time viewer-centered headtracking perspective with a large angle of view, interactive control, and binocular display [47] . In this VR application, data analysis can be conducted in an efficient and interactive manner. Specifically, this platform provides the following benefits:

Enhanced data analysis
A reservoir model contains multiscale, multidisciplinary data, such as geological data and reservoir engineering data from the core scale to the field scale, and the viewing of this data in an immersive environment can facilitate more effective analysis of the 3D data [48,49,50] . By providing clear analysis of depth and the ability to clearly discern spatial features, workflows involving complex spatial data are made more efficient.

Multidisciplinary collaboration
Enabled with enhanced data analysis, users from different professions can also collaborate and discuss the data in a variety of ways using a variety of different or similar views as needed and collaborate over abstract datasets as if they were working on it together in the same room, except they are able to apply any number of analysis tools using the virtual aspect of VR. Largescale visualization centres may support co-located collaboration, and may be combined with head mounted displays to allow an integrated team consisting of professionals such as geophysicist, geologist, reservoir engineer and drilling engineer to view and work with the same data at the same time no matter where they are physically located [51,52] .

Large Working Space
The large working space provided by immersive technologies provides what Dr. Doug Bowman has described as "immersive space to think" [53] . Practically speaking, the larger working space in VR allows for many analysis tools, realizations, and datasets to be considered simultaneously, rather than in a sequence dictated by the limitations of screen space when using 2D screens [54,55] . This enables data "workshops" where many different views and analysis steps can be utilized relatively effortlessly.

Improved interaction fidelity
Improved human-computer interaction is a major benefit of virtual reality. In addition to improved visual fidelity, interaction fidelity is improved using spatial user interfaces such as controllers, gestures, or a variety of other interaction methods that would not be usable on conventional 2D mouse-and-keyboard system. The equipment can also capture data about the user in a virtual environment [56,57] , providing insight into their actions, and optimizing their workflow in future sessions.
The UI in this project supports basic manipulation of reservoir models to assist with RCA. UI elements such as a property selection dropdown menu, a transparency adjustment slider to change the transparency of the cell rendering, display mode adjustment and a weight threshold slider, help the user with analysis. By adjusting the transparency slider, the user can get a clear view of the connectivity by changing transparency of occluding cells. By changing the weight threshold slider, the user can analyze realizations get connectivity networks from defined perforations.

Overview of a tight oil reservoir
As shown in Figure 2, the pilot area of the tight oil reservoir named Y284 is circled in blue boundaries, including directional wells and horizontal wells. The Y284 tight oil reservoir is located in the Changqing oilfield in Ordos Basin China and is used as an example data set in this project. By the end of 2017, the Y284 Unit [58] had 756 producers and their main producing layers lie in the third sub-member of the sixth member of the upper Triassic Yanchang formation also known as Chang 63 (C63), which contains green-greyish siltstones with dark grey mud intercalation [59] . The average air permeability without confining pressure is estimated as 0.38 mD. The main producing interval is C63 1 , which is a continuous sand body at the top of C63 with the average thickness as 21.2 m, porosity as 12.1% and initial oil saturation as 55% [60] .
In this study, the pilot area of directional wells in the C63 1 formation is selected as the study case (Figure 2), which is located at the edge of the reservoir. Due to its uneconomic production and location, it is selected to be a pilot area to perform refracturing experiments after 3-5 years production. The geological model consists of C63 1-1 , C63 1-2 and C63 1-3 , and each sub-layer is further divided into three sub-layers.

Figure 2. Boundaries of the pilot area within the Y284 Unit (study case is in the black boundary) Geostatistical model generation
Based on two well logs within the pilot area, a thousand realizations and the corresponding sets of properties (porosity, permeability and water saturation) are generated using Gaussian random function simulation. The distribution of stock-tank oil initially in place (STOIIP) across the realizations are shown in Figure 3. Thus, P90, P50 and P10 can be determined. a b c Figure 3. A thousand realization datasets generated by Gaussian random function simulation

Geostatistical model clustering
According to work done by Dr. Aguilera [36] , flow units can describe pore throat apertures in a tight oil reservoir by 35 . The thousand sets of 3D realizations are transformed to 1D arrays, in which each flow unit value is placed into a grid cell that represents the spatial location information. Euclidean distance (equation (2)) is used to obtain similarity distances. A distance matrix with 1,000 rows × 1,000 columns can then be created. Using MDS, a distribution of realizations in the Euclidean space is shown in Figure 4 and 5. P90, P50 and P10 models are labeled in the chart.
Hierarchical clustering is used to group the data and generate nested clusters in a hierarchical format, allowing users to create unique and deterministic clusters with different shapes and size for a specified linkage function [61] . It does not provide cluster centroids, however, which are needed to select representative realizations for visualization. Therefore, K-means is used alongside GMM to find the centroids.
To determine an optimal number of clusters, the elbow method is used by minimizing a Sum of Squared Errors (SSE) for K-means. The Akaike Information Criterion (AIC) and the Bayesian Information criterion (BIC) are analytic methods that estimate the goodness-of-fit of statistical models relative to each other for a given data set [38] , which is used to determine an optimal number of components for GMM. As seen in Figures 4 and 5, the optimal number of clusters are 3 for K-means and 2 for GMM clustering. One of cluster centroids is similar for both clustering methods, and is also close to the P90 model. Therefore, besides P90, P50 and P10, the centroids of K-means clusters 2 and 3 (blue and pink spots in Figure 4) and GMM cluster 1 (yellow spots in Figure 5) are selected as representative realizations.
a Clustering with K-means b Elbow Method for K-means

VR implementation of reservoir models Rendering VTK files using Activiz
VTK is a powerful visualization software. For VTK and VTU files, this pipeline is straightforward. When using other formats such as output format from a CMG (Computing Modelling Group) simulator. A proprietary reader is used to extract data from these files and transfer to vtkDataSetMapper. The vtkUnstructredGridReader is skipped and data is sent on the fly to a mapper. Inactive cells are supported in some formats and must be considered to properly represent grid arrangement. All physical properties (such as permeability, porosity and saturation) are only valid for active cells.

RCA using graph theory
The classical Dijkstra's algorithm is to find shortest paths between nodes in a 2D graph. To simulate connectivity networks from perforations in reservoir models, the algorithm is extended to 3D shortest path tree ( Figure 6). Weight calculations for adjacent cells are as follows: Figure 6. VR scene of a 3D connectivity network generated inside a reservoir model (the red ball represents a selected perforation; by adjusting weight threshold, the connectivity network expands to a larger horizontal and vertical range) By defining multiple source nodes at different perforations of the same well or different wells, connectivity networks can be generated for as needed. For example, to evaluate the reliability of representative realizations, multiple graphs are generated in these realizations. Five wells, Y307-54, Y308-51, Y308-53, Y309-53, and Y310-52 are selected to test the transmissibility distribution of representative realizations. Y308-51 and Y308-53 are the injectors of the selected wells. Figure 7 highlights the difference among realizations, since the generated connectivity graphs of the five wells show different connectivity at each local region. The difference between regions is bigger in realizations (b)-(f), while in realization (a), the graphs show connectivity regions that are more similar. Modifications of local permeability and saturation distributions can be used as adjustments to examine how these changes affect outcomes, and these changes can be reflected in the graphs. An ideal local distribution in a certain realization can be copied to others. The P50 model (Figure 7 (f)) is typically favoured realization in previous analysis, but in our case using this tool, it shows that very heterogeneous, local connectivity networks of Y308-51 and Y308-52 quickly spread while others spread very slowly. So far, a realization which is more balanced in local connectivity among wells would be a better choice. In our case, to have a superior model for further analysis and modification, realization (a) is chosen as the main model, while local connectivity distributions of others act as a reference for consideration when planning. We already have a basic understanding of Y308-53 and Y309-53 from analysis of sub-layer connectivity. These two wells were put into production around 2010. As shown in Figure 8, after two years injector Y308-53 showed an imbalanced water injection profile, most injection water went from the second perforation in C63 1-2 and C63 1-3 layers, and the interpretation of its tracer log showed a connection between Y308-53 and Y309-53. Thus, there was a great likelihood of inter-well connection in C63 1-3 . By generating connectivity networks from perforations in C63 1-3 , the inter-well connectivity in realization (a) can be evaluated. Figure 9 shows connectivity networks of the wells. Injector Y308-53 had a better connectivity with formations and showed a possibility to connect with Y309-53 at the same weight threshold, which also matches the large tracer distribution in Figure 9. While there are a lot of realizations with a high degree of heterogeneity, it may lead to bad results if these realizations are directly put into simulation.   VR provides a clear perception of fluid flow connectivity at a glance due to the improved perception of both the reservoir and the graphs within the context of the reservoir. It is also more convenient and natural to move between a reservoir-scale model and a local realization group due to the higher interaction fidelity. When scenarios reflected by realizations do not match our understanding from well tests, it is easy to correct our model. Modifications can happen on distributions of permeability and water saturation, helping reservoir engineers build more reliable models for further simulation.

Conclusions
A visual analytics framework was built to support analysis from geological modeling to reservoir simulation. In this framework, geostatistical simulation and distance-based cluster analysis are used to screen representative realizations from one thousand realizations. For data visualization, VTK was utilized to render the static reservoir data visualization. The Unity game engine together with VTK enabled an immersive application with a high degree of visual and interaction fidelity for data manipulation, data analysis and efficient collaboration between multidisciplinary experts. Using this immersive application, RCA is performed on the data to assist with the analysis. The cell-based reservoir model was converted to a vertex-edge graph, and Dijkstra's shortest path algorithm is applied and modified to simulate 3D connectivity networks.
The pilot area in the Y284 tight oil reservoir was studied as a field case to test the framework. A large amount of heterogeneity in the connectivity of realizations was found. When considering realizations selected by cluster analysis, a P50 model does not show a strong representation in a distribution of realizations in Euclidean space. Thus, it was necessary to propose a solution to screen realizations. Cluster analysis was used with flow units as variables to identify homogenous groups of cases. A relatively balanced realization is sought as the main model for which to perform planning. With the help of well test analysis and RCA, the inter-well connectivity of Y308-53 and Y309-53 was studied, and a representative model was selected as the primary model. Most of the other representative realizations showed a great amount of heterogeneity in connectivity, they are regarded as a complimentary scenarios to provide an understanding of possible local connectivity distributions.
Immersive applications utilizing technologies with a higher degree of interaction and visual fidelity than conventional 2D screens, mouse and keyboard help us obtain an intuitive understanding about underground connectivity, and provides a the ability to move between reservoir-scale views and local views of the realization. Following the visual analytics framework exemplified in this work, not only RCA but also many other data analysis techniques may be incorporated, such as various types of cluster analysis, which can be used to help capture spatial and spatial-temporal features of reservoir data and discover valuable insights.
In the future, this application will be extended to support simultaneous collaboration between multiple users. This will enable real-time collaboration with geologists and other team members. A human-centered collaboration design will be developed to provide natural interactions that leverage the benefits of immersive platforms. Ultimately, this application is expected to evolve into an advanced platform for highly integrated, interactive, and collaborative data analysis and planning for reservoir engineering.