Supervised Learning for Image Enhancement of Electron Microscopy Inversion Results

In nondestructive testing there is a variety of applications in materials science where a specimen is imaged by an electron microscope and information about the material interior is then extracted by image inversion. This information may contain noise arising from the imaging procedure or from the numerical part of the inversion. We present a method that can improve the interior density results of an inverted material obtained from a series of Scanning Electron Microscope (SEM) images. The method allows the material density to contain discontinuities, such as dense regions alongside voids. It stands directly on the Bayesian learning framework, adopting Gaussian Stochastic Processes (GSPs). We test two sample cases whose densities contain discontinuities. We also compare two GSP modelling approaches: a typical GSP, and one that accounts for discontinuity by treating the hyperparameters as random. The GSP method gives reconstructed data in reasonable agreement with the known original density distribution, giving confidence that the method can be applied to experimentally obtained SEM images.


I. INTRODUCTION
Scanning Electron Microscopy (SEM) is a standard tool for studying the nanoscale surface structure of materials. An SEM image results from the interactions of an electron beam with atoms at various depths within a sample. Once the electron beam penetrates the sample surface, it produces signals from its interactions with the atoms in a three-dimensional region beneath the incident point, namely the interaction volume. An SEM image produced from the detected signals is thus a projection of the three-dimensional (3D) interaction volume onto a two-dimensional (2D) image. If this projection can be inverted, taking the 2D image back into 3D space, the process is called image inversion [3], [8], [17], [19], and it has the potential to reconstruct the interior material density of the sample. Image inversion techniques are of great use for nondestructive inspection, a crucial task in materials science; for example, device engineers need such nanostructure information for electronic devices [9].
Recent research [6] proposed such an image inversion technique, adopting a Bayesian approach, that produces a discretised form of the interior material densities from SEM images. The technique dissects a sample into voxels, each of which corresponds to a pixel of an SEM image, and then obtains a density value for each voxel. Although voxel-based discretisation offers a practical approach, it has shortcomings, such as limited resolution. For example, if a voxel happens to contain both dense regions and voids, the material density of the voxel as a whole is averaged out. In other words, the calculated voxel-based material density depends largely on where the voxels are placed. Since material densities are, in general, discontinuous, with voids appearing randomly in the interior of the material, reconstructing an unknown material structure in a non-discretised manner becomes a crucial task. However, to our knowledge, such an approach has been little exploited in practice.
In this paper, we provide for the first time a methodological framework to construct non-discretised unknown material structures from the information conveyed by image inversion. Our approach builds on the image inversion technique [6], which calculates from SEM images the material density value at the centre of each voxel. The key idea is to use Gaussian Stochastic Processes (GSPs) to predict the density value at an arbitrary location within the material sample, given the density values from the image inversion. In addition, we incorporate the inversion variance into the GSP modelling so that the GSP can correct the density where the inversion is affected by noise.
To test our framework, we first model a material with a known discontinuous density function and simulate the corresponding SEM images (the projection of the density). We then invert the SEM images back to 3D space (the image inversion) and reconstruct the material density function from the result using GSPs. We demonstrate the methodology through two examples: a simple case containing one void region, and an extreme case of a highly discontinuous material. We compare the results with the true densities to evaluate the efficiency of the proposed method.
The paper is laid out as follows. In the next section, we describe how we simulate the two example materials and generate their SEM image data. This projection step is not necessary in real-world situations, where the SEM images are available first, but it is needed for evaluating the method. The third section describes how our GSP modelling approach builds on the image inversion method to predict the interior material density at arbitrary locations, with particular focus on two different parameterisation approaches. In the fourth section, we evaluate the performance of our methodology by comparing the predicted material density values with the true ones, followed by discussion and conclusions.

II. MATERIALS
As illustrative examples, we here present two simulated material samples, whose true interior density is thus known, and create their SEM images. In practice, this projection process is unnecessary as we usually have the SEM images of an actual material sample first. However, to evaluate the performance of the methodology, we numerically mimic the projection process and create SEM images, adopting a convolution technique with a microscopy correction function [10], [16].
Here, the interior of a material sample is specified in Cartesian coordinates $s = (x, y, z) \in \mathbb{R}^3$. The x-y surface is orthogonal to the incoming electron beam, and the z-axis represents the depth in the interior of the sample (Fig. 1). Note that we treat z as an absolute depth from the topmost surface throughout the paper. The material density is then specified as a function of location s as d(s).
In practice, an SEM image consists of pixels whose positions correspond to the coordinates of the voxels dissecting the sample material; the voxel size on the x-y surface is thus the same as the pixel size of the image. The discretised material density of voxel $i = 1, 2, \dots, n_p$ at depth $z^{(m)}$ is then specified as

$$d_i^{(m)} := d(x_i, y_i, z^{(m)}), \quad (1)$$

where the depth $z^{(m)}$ is defined as the depth reached by the $m$-th lowest energy beam ($z^{(0)} := 0$ and $m = 1, 2, \dots, n_d$).
In other words, the voxel length along the z-axis should equal the difference in depth reached by two electron beams of successive energies. For example, the first x-y layer of voxels extends from z = 0 (the topmost surface of the sample) to the depth reached by the electron beam in the SEM image produced with the lowest energy; the second layer extends from that depth to the depth reached by the beam with the second-lowest energy. The actual depth along the z-axis can be approximated by a formula given in [13], in terms of the material mass density d, the atomic number Z of the material, the energy E of the electron beam, and the atomic weight A. An electron beam with higher energy penetrates deeper and covers a wider hemispherical region inside the material (Fig. 1). In other words, the image created by a higher-energy beam (kV) covers a larger region that encompasses those created by the lower-energy beams. An SEM image can be created as the projection of the convolution of the density d with a kernel k [6], namely the microscopy correction function [10], [16]. A pixel $(x_i, y_i, z^{(m)})$ of an image is then given by this projection (3), where the microscopy function takes the form of a folded normal distribution [15] (4), with parameters Q, c and v. Reference [6] suggests three different numerical methods for the projection; the choice among them depends largely on the resolution of the imaging technique. In this paper, the simplest option is chosen.
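The range formula of [13] is not reproduced here. Purely as an illustration, and assuming it is the widely used Kanaya-Okayama expression $R = 0.0276\,A E^{1.67} / (Z^{0.89}\rho)$ (R in µm, E in keV, ρ the mass density in g cm^-3), the layer depths $z^{(m)}$ and the voxel lengths along z described above could be computed as follows. The function names and the example beam energies are hypothetical; `rho` plays the role of the mass density d in the text.

```python
import numpy as np

def kanaya_okayama_range_um(A, Z, rho, E_keV):
    """Electron range in micrometres (Kanaya-Okayama form, assumed here).

    A: atomic weight (g/mol), Z: atomic number,
    rho: mass density (g/cm^3), E_keV: beam energy (keV).
    """
    E = np.asarray(E_keV, dtype=float)
    return 0.0276 * A * E**1.67 / (Z**0.89 * rho)

def layer_boundaries(A, Z, rho, energies_keV):
    """Return depths z^(0..n_d) and voxel lengths along z.

    Energies are sorted ascending so that z^(m) is the depth reached by
    the m-th lowest energy beam, with z^(0) = 0.
    """
    depths = kanaya_okayama_range_um(A, Z, rho, np.sort(energies_keV))
    z = np.concatenate(([0.0], depths))
    return z, np.diff(z)

# Hypothetical example: a silicon sample imaged at four beam energies.
z, dz = layer_boundaries(A=28.09, Z=14, rho=2.33, energies_keV=[5, 10, 15, 20])
print(np.round(z, 3))
print(np.round(dz, 3))
```

Because the range grows monotonically with E, sorting the energies guarantees positive voxel lengths, one per layer.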
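In practice, each observed pixel also contains some noise [23], so that it becomes

$$\tilde{I}(x_i, y_i, z^{(m)}) = I(x_i, y_i, z^{(m)}) + \varepsilon_i^{(m)}, \quad \varepsilon_i^{(m)} \sim \mathcal{N}\big(0, (p\, I(x_i, y_i, z^{(m)}))^2\big). \quad (5)$$

In the following examples, we thus assume Gaussian noise whose standard deviation equals pI, where p is a percentage associated with the acquisition time between images [23]. For our applications, p is 5%, which is a reasonable (upper-bound) choice when there is a gap of at least 50 seconds between the acquisition of sequential images.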
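The exact forms of the projection (3) and the kernel (4) are those of [6] and [15] and are not reproduced here. The sketch below only illustrates the structure of the forward simulation under stated assumptions: a hypothetical folded-normal-shaped kernel of the lateral distance, in which Q acts as an amplitude, c as a location and v as a scale; image m accumulating the voxels of layers 1 to m, since higher-energy beams see deeper; and Gaussian noise with standard deviation pI as in (5).

```python
import numpy as np

def folded_normal_kernel(r, Q, c, v):
    """Hypothetical folded-normal-shaped weight of a lateral distance r >= 0.

    Q is treated as an amplitude, c as a location and v as a scale.
    """
    norm = Q / (v * np.sqrt(2.0 * np.pi))
    return norm * (np.exp(-((r - c) ** 2) / (2.0 * v**2))
                   + np.exp(-((r + c) ** 2) / (2.0 * v**2)))

def simulate_images(density, xy, Q, c, v, p=0.05, rng=None):
    """Noise-free and noisy images, one per depth layer.

    density: (n_d, n_p) voxel densities, row 0 = top layer.
    xy: (n_p, 2) voxel/pixel centres on the x-y surface.
    """
    rng = np.random.default_rng() if rng is None else rng
    r = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)  # (n_p, n_p)
    K = folded_normal_kernel(r, Q, c, v)
    cumulative = np.cumsum(density, axis=0)   # image m sees layers 0..m
    clean = cumulative @ K.T                  # clean[m, i] = sum_j K[i, j] * cumulative[m, j]
    noise_sd = p * np.abs(clean)              # standard deviation p * I, as in (5)
    noisy = clean + noise_sd * rng.standard_normal(clean.shape)
    return clean, noisy

# Usage on a hypothetical 10 x 10 pixel grid with 10 layers of uniform density.
g = np.arange(10) * 0.1
xx, yy = np.meshgrid(g, g)
xy = np.column_stack([xx.ravel(), yy.ravel()])
clean, noisy = simulate_images(np.ones((10, 100)), xy, Q=0.2, c=0.5, v=0.1,
                               rng=np.random.default_rng(0))
```

Setting p = 0 recovers the noise-free images, which is a convenient check when testing an inversion routine.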

A. Sample 1: simple density function
We consider a material slab that contains a single void, so that its material density is

$$d(s) = \begin{cases} 0, & s \in V, \\ 1, & \text{otherwise}, \end{cases}$$

where V is an ellipsoid within which the material density is zero (void).
We assume that the sample slab consists of 10 × 10 × 10 voxels, with the voxel length along z depending on m, the index of each x-y layer of voxels counted from top to bottom, to keep the situation close to a real application. We then produce an SEM image by the projection (3) with the microscopy correction function k (4), whose parameters are Q = 0.2, c = 0.5 and v = 0.1. This choice of k resembles SEM correction functions often used in practice.
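A sketch of the Sample 1 density, 0 inside the ellipsoidal void V and 1 elsewhere, evaluated at voxel centres. The slab extent, ellipsoid centre and semi-axes are not given here, so the values below (a unit cube, centre (0.5, 0.5, 0.5), semi-axes (0.3, 0.2, 0.2)) and the equal voxel lengths along z are hypothetical.

```python
import numpy as np

def sample1_density(s, centre, semi_axes):
    """Binary density: 0 inside the ellipsoid V, 1 elsewhere. s has shape (..., 3)."""
    u = (np.asarray(s, float) - np.asarray(centre, float)) / np.asarray(semi_axes, float)
    return np.where(np.sum(u**2, axis=-1) <= 1.0, 0.0, 1.0)

# Hypothetical geometry: unit cube, 10 x 10 x 10 voxels of equal size.
# (In the text the voxel length along z depends on the layer index m.)
edges = np.linspace(0.0, 1.0, 11)
centres_1d = 0.5 * (edges[:-1] + edges[1:])
X, Y, Z = np.meshgrid(centres_1d, centres_1d, centres_1d, indexing="ij")
S = np.stack([X, Y, Z], axis=-1)                  # (10, 10, 10, 3) voxel centres
d = sample1_density(S, centre=(0.5, 0.5, 0.5), semi_axes=(0.3, 0.2, 0.2))
print(d.shape, d.mean())                          # fraction of non-void voxels
```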
We will evaluate the performance of our methodology by learning the density at 6 × 6 different coordinates.

B. Sample 2: highly discontinuous density function
We define a material whose extent represents a size often used in actual applications. The material density function d, given in (8), is a positive discontinuous function studied by [6], where the values R, A, D and B are location-dependent random components sampled from uniform distributions, Uni[·, ·]. The multiplier 18 inside the floor function of R is associated with the constant 20 in the denominator of the fraction in (8). The combination of these two values ensures that the density value is lower than 1 but not too close to 0, while U is not close to 0. Note that the random variables A(x, y), B(x, y) and D(x, y) depend only on the location (x, y), e.g. A(x, y, z) = A(x, y, z′) for z ≠ z′. Regarding the range of B, 2.66 = 2 × 1.33 is associated with the distance over which the fixed values change.
The density function (8) is continuous over specific intervals but jumps between voxels. Such discontinuities describe voids and arise mainly from the randomness in R, which makes the density at some locations very close to 0, independently of the coordinates. The formulation chosen here generates a different discontinuous density each time, which in principle allows a wide range of different materials to be simulated.
We assume that the material consists of 15 × 5 × 18 voxels, where the voxel length along z depends on m, the order number of each x-y layer from the top to the bottom surface, as for the sampled values in (8). In total, we sampled the random values of function (8) at 15 × 5 × 18 different locations. An SEM image is then obtained by the projection (3) with the microscopy correction function k (4), whose parameters are Q = 0.2, c = 0.4 and v = 0.15.
We will evaluate the performance of our methodology by learning the density of the x-z slice at y = −7.2 as an illustrative example.

III. METHODS
Once we obtain the material density estimates at the centre of each voxel from the SEM images using the image inversion technique, we can estimate the interior material density at arbitrary locations through GSP modelling.

A. Image inversion
Given the observed SEM images $\tilde{I} = \{\tilde{I}(x_i, y_i, z^{(m)})\}$ with noise (5), we can estimate the discretised material density $d = \{d_i^{(m)}\}$ of the voxels through Bayes' theorem,

$$P(d \mid \tilde{I}) \propto P(\tilde{I} \mid d)\, \pi(d), \quad (9)$$

where the likelihood $P(\tilde{I} \mid d)$ follows from the noise model (5), with the known microscopy correction function k entering through the projection (3). The prior distribution π(d) is assumed to be an independent half-normal distribution to ensure that the material density is positive. Although uniform priors could be an option here, they may cause a misspecification problem when substantial numbers of voxels (1,000 or more) are involved in the numerical computation.
The parameters of the independent half-normal priors were assigned after experimenting with uniform priors, and they were the same for all variables: the mean was set to a highly probable value observed in the samples of every variable, and the variance to a value for which the 99% interval exceeded the maximum interval observed under uniform priors.
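Applying the Metropolis-Hastings algorithm to (9), we can numerically obtain the posterior distribution $P(d \mid \tilde{I})$, from which we can calculate the expected discretised material density at the centre of each voxel, $\hat{d}(x_i, y_i, z^{(m)})$. By marginalising the posterior distribution, we obtain the posterior mean for each voxel,

$$\hat{d}_i^{(m)} = \int d_i^{(m)} \left[ \int P(d \mid \tilde{I})\, \mathrm{d}d_{-i} \right] \mathrm{d}d_i^{(m)}, \quad (11)$$

where $d_{-i}$ is the vector d omitting the element $d_i^{(m)}$. The posterior variance, $\mathrm{Var}(d_i^{(m)} \mid \tilde{I}) = \mathrm{E}\big[(d_i^{(m)} - \hat{d}_i^{(m)})^2 \mid \tilde{I}\big]$, can also be calculated from the posterior distribution. Eq. (11) is the output of the image inversion technique and is used in the following subsection to estimate the material density at in-between locations.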
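A minimal random-walk Metropolis-Hastings sketch of the inversion step (9) and the posterior summaries (11). It assumes a hypothetical linear forward operator `G` standing in for the projection (3), a Gaussian likelihood whose standard deviation p|Ĩ| is taken from the observed pixels (a simplification of (5), where it scales with the noise-free image), and an independent half-normal prior with a common scale. The step size and chain lengths are illustrative, not the settings used in this work.

```python
import numpy as np

def log_posterior(d, y, G, p, prior_scale):
    """Unnormalised log P(d | y): Gaussian likelihood + half-normal prior."""
    if np.any(d < 0):
        return -np.inf                      # half-normal prior: no mass below zero
    sigma = p * np.abs(y) + 1e-12           # noise sd proportional to the pixel value
    resid = y - G @ d
    log_lik = -0.5 * np.sum((resid / sigma) ** 2)
    log_prior = -0.5 * np.sum((d / prior_scale) ** 2)
    return log_lik + log_prior

def invert_mh(y, G, p=0.05, prior_scale=1.0, step=0.05,
              n_iter=20000, burn=5000, seed=0):
    """Return the posterior mean (11) and posterior variance of each voxel."""
    rng = np.random.default_rng(seed)
    d = np.full(G.shape[1], 0.5)            # feasible starting point
    lp = log_posterior(d, y, G, p, prior_scale)
    samples = []
    for it in range(n_iter):
        prop = d + step * rng.standard_normal(d.size)
        lp_prop = log_posterior(prop, y, G, p, prior_scale)
        if np.log(rng.uniform()) < lp_prop - lp:   # symmetric proposal
            d, lp = prop, lp_prop
        if it >= burn:
            samples.append(d.copy())
    samples = np.array(samples)
    return samples.mean(axis=0), samples.var(axis=0)

# Usage with a trivial (identity) forward operator and three voxels.
d_true = np.array([0.2, 0.8, 0.5])
G = np.eye(3)
post_mean, post_var = invert_mh(G @ d_true, G, step=0.02)
```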

B. Gaussian stochastic process (GSP) modelling
Once the expected discretised material density values $\hat{d} = \{\hat{d}_i^{(m)}\}$ are obtained via the image inversion, we can estimate the interior material density at arbitrary in-between locations, say $d^* = \{d_i^{*(m)}\}$, using GSPs. Without loss of generality, we hereafter assume that the material density is standardised by the mean and standard deviation of $\hat{d}$.
With locations $\{s_i^{(m)}\}_{i,m}$ in 3D space and length scale parameters (lsp) $\boldsymbol{l} = (l_1, l_2, l_3)$, the GSP modelling framework assumes that $\hat{d}$ and $d^*$ each follow a multivariate normal distribution,

$$P(\hat{d} \mid \boldsymbol{l}) = \mathcal{N}\big(\boldsymbol{0}_{\tilde{n}}, C(\boldsymbol{l})\big), \quad (12)$$

$$P(d^* \mid \boldsymbol{l}) = \mathcal{N}\big(\boldsymbol{0}_{n^*}, C^*(\boldsymbol{l})\big), \quad (13)$$

with dimensions $\tilde{n} = \dim(\hat{d})$ and $n^* = \dim(d^*)$. Their covariance matrices $C(\boldsymbol{l})$ and $C^*(\boldsymbol{l})$ are specified by the locations and the lsp through a squared-exponential covariance kernel (14); the specification is exactly the same for $C^*(\boldsymbol{l})$ as for $C(\boldsymbol{l})$.
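A sketch of the standardisation described above and of the covariance matrix $C(\boldsymbol{l})$ built from a unit-amplitude squared-exponential kernel with one length scale per axis. The exact kernel (14) is not reproduced here; placing the standardised inversion variances on the diagonal is one plausible way of using the inversion variance mentioned in the introduction, and is an assumption. Function names are hypothetical.

```python
import numpy as np

def standardise(d_hat, post_var):
    """Standardise inversion means by their mean and std; rescale variances to match."""
    mu, sd = d_hat.mean(), d_hat.std()
    return (d_hat - mu) / sd, post_var / sd**2, (mu, sd)

def covariance_matrix(S, lsp, inv_var=None):
    """C(l) from a squared-exponential kernel with per-axis length scales.

    S: (n, 3) locations; lsp: (l1, l2, l3).
    inv_var (optional): standardised inversion variances added on the diagonal.
    """
    diff = (S[:, None, :] - S[None, :, :]) / np.asarray(lsp, float)
    C = np.exp(-0.5 * np.sum(diff**2, axis=-1))
    if inv_var is not None:
        C = C + np.diag(inv_var)
    return C

# Usage with hypothetical inversion output at four voxel centres.
S = np.random.default_rng(0).uniform(size=(4, 3))
z, v, (mu, sd) = standardise(np.array([1.0, 2.0, 3.0, 4.0]), np.full(4, 0.1))
C = covariance_matrix(S, (1.0, 2.0, 3.0), v)
```

Keeping the amplitude fixed at one is consistent with working on standardised densities.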
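For making inferences, it is convenient to treat (12) and (13) simultaneously. Consider the combined vector of material density values $(\hat{d}, d^*)$. It also follows a multivariate normal distribution, of dimension $n = \tilde{n} + n^*$, with the covariance matrix in block form

$$\begin{pmatrix} C(\boldsymbol{l}) & K(\boldsymbol{l}) \\ K(\boldsymbol{l})^{\top} & C^*(\boldsymbol{l}) \end{pmatrix},$$

where $K(\boldsymbol{l})$ is the cross-covariance between $\hat{d}$ and $d^*$. We now develop two different approaches for estimating the material density at arbitrary locations, $d^*$. The inference stands on the Bayesian framework,

$$P(d^*, \boldsymbol{l} \mid \hat{d}) = P(d^* \mid \hat{d}, \boldsymbol{l})\, P(\boldsymbol{l} \mid \hat{d}) \propto P(d^* \mid \hat{d}, \boldsymbol{l})\, P(\hat{d} \mid \boldsymbol{l})\, \pi(\boldsymbol{l}) = P(d^*, \hat{d} \mid \boldsymbol{l})\, \pi(\boldsymbol{l}). \quad (18)$$

Note that the factor $P(d^* \mid \hat{d}, \boldsymbol{l})$ is the posterior predictive distribution given the lsp.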

1) Fixed length scale parameters:
This approach estimates $d^*$ using a point estimate of the lsp $\boldsymbol{l}$, namely the maximum a posteriori (MAP) estimate, which equals the mode of the posterior probability obtained from (18), $P(\boldsymbol{l} \mid \hat{d}) \propto P(\hat{d} \mid \boldsymbol{l})\, \pi(\boldsymbol{l})$.
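Up to a constant, the log posterior of the lsp is the log marginal likelihood of $\hat{d}$ under (12) plus the log of the uniform prior. A sketch, assuming the squared-exponential kernel with inversion variances on the diagonal and hypothetical prior bounds; a single Cholesky factorisation gives both $C^{-1}\hat{d}$ and $\log|C|$.

```python
import numpy as np

def log_post_lsp(lsp, S, d_hat, inv_var, bounds=(1e-3, 10.0)):
    """log P(l | d_hat) up to an additive constant (uniform prior on bounds)."""
    lsp = np.asarray(lsp, dtype=float)
    if np.any(lsp < bounds[0]) or np.any(lsp > bounds[1]):
        return -np.inf                           # outside the uniform prior's support
    diff = (S[:, None, :] - S[None, :, :]) / lsp
    C = np.exp(-0.5 * np.sum(diff**2, axis=-1)) + np.diag(inv_var)
    L = np.linalg.cholesky(C)                    # C = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, d_hat))   # C^{-1} d_hat
    n = d_hat.size
    return (-0.5 * d_hat @ alpha
            - np.sum(np.log(np.diag(L)))         # equals -0.5 * log|C|
            - 0.5 * n * np.log(2.0 * np.pi))

# Usage with hypothetical standardised inversion output at 20 locations.
rng = np.random.default_rng(1)
S = rng.uniform(size=(20, 3))
d_hat = rng.standard_normal(20)
inv_var = np.full(20, 0.1)
print(log_post_lsp((0.5, 0.5, 0.5), S, d_hat, inv_var))
```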
Here, we assume an independent uniform distribution for the prior $\pi(\boldsymbol{l})$. The MAP estimates are obtained by the Metropolis-Hastings algorithm with a half-normal proposal distribution. The mean of the proposal distributions was set equal to the last accepted value, and the variance was set after experimentation. The Higham algorithm [11] was implemented to ensure that the covariance matrix $C(\boldsymbol{l})$ is positive definite.
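A sketch of the two numerical ingredients above. `mh_map` runs Metropolis-Hastings on positive parameters with a normal proposal centred at the last accepted value and truncated to positive values (our reading of the half-normal proposal), includes the Hastings correction for the truncation, and returns the highest-posterior state visited as an approximate MAP. `nearest_psd` symmetrises a matrix and clips its eigenvalues, which gives the nearest symmetric positive semidefinite matrix in the Frobenius norm; it is a simpler stand-in and not necessarily the specific Higham algorithm of [11]. Function names and settings are hypothetical.

```python
import math
import numpy as np

def _log_phi(x):
    """log of the standard normal CDF (arguments here are positive)."""
    return math.log(0.5 * (1.0 + math.erf(x / math.sqrt(2.0))))

def mh_map(log_target, x0, step, n_iter=5000, seed=0):
    """Metropolis-Hastings with normal proposals truncated to (0, inf)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lp = log_target(x)
    best_x, best_lp = x.copy(), lp
    for _ in range(n_iter):
        prop = x.copy()
        for j in range(x.size):
            cand = -1.0
            while cand <= 0.0:                   # rejection-sample the truncated normal
                cand = x[j] + step * rng.standard_normal()
            prop[j] = cand
        lp_prop = log_target(prop)
        # Hastings term: q(x | x') / q(x' | x) = prod Phi(x / step) / Phi(x' / step)
        log_hastings = sum(_log_phi(a / step) - _log_phi(b / step) for a, b in zip(x, prop))
        if np.log(rng.uniform()) < lp_prop - lp + log_hastings:
            x, lp = prop, lp_prop
            if lp > best_lp:
                best_x, best_lp = x.copy(), lp
    return best_x, best_lp

def nearest_psd(C, eps=1e-10):
    """Symmetrise and clip eigenvalues at eps (nearest PSD in Frobenius norm)."""
    B = 0.5 * (C + C.T)
    w, V = np.linalg.eigh(B)
    return (V * np.maximum(w, eps)) @ V.T        # V diag(w_clipped) V^T

# Usage on a toy log target with its mode at (2, 2, 2).
best, best_lp = mh_map(lambda x: -np.sum((x - 2.0) ** 2), [1.0, 1.0, 1.0], step=0.3)
```

In practice `log_target` would be the lsp log posterior $\log P(\boldsymbol{l} \mid \hat{d})$, with `nearest_psd` applied to $C(\boldsymbol{l})$ whenever its Cholesky factorisation fails.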
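With the MAP estimates, say $\check{\boldsymbol{l}}$, the posterior predictive density function is given in closed form as a multivariate normal distribution [4], so that the predicted material density at an arbitrary location, $d_i^{*(m)}$, is given by the corresponding element of the mean vector

$$\mathrm{E}\big[d^* \mid \hat{d}, \check{\boldsymbol{l}}\big] = K(\check{\boldsymbol{l}})^{\top} C(\check{\boldsymbol{l}})^{-1} \hat{d},$$

where $K(\check{\boldsymbol{l}})$ is the cross-covariance matrix between $\hat{d}$ and $d^*$.

2) Random length scale parameters:
This approach allows some randomness in the lsp, accounting for the joint posterior probability (18), $P(d^*, \boldsymbol{l} \mid \hat{d}) \propto P(\hat{d}, d^* \mid \boldsymbol{l})\, \pi(\boldsymbol{l})$.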
The prior $\pi(\boldsymbol{l})$ is assumed to be uniform. The proposal distributions for the Metropolis-Hastings algorithm were set as normal distributions with mean equal to the last accepted value and variance assigned after experimentation; the proposals for the lsp were truncated to positive values. The posterior predictive distribution is given as

$$P(d^* \mid \hat{d}) = \int P(d^*, \boldsymbol{l} \mid \hat{d})\, \mathrm{d}\boldsymbol{l}, \quad (21)$$

and the predicted material density is then given as the mean of its marginal,

$$\hat{d}_i^{*(m)} = \int d_i^{*(m)} \left[ \int P(d^* \mid \hat{d})\, \mathrm{d}d^*_{-i} \right] \mathrm{d}d_i^{*(m)}, \quad (22)$$

where $d^*_{-i}$ is the vector $d^*$ omitting the element $d_i^{*(m)}$. This approach is computationally more expensive than the fixed lsp approach, as the posterior predictive distribution is not given in closed form. Because of this cost, the elements of $d^*$ are learned through parallel chains, each aiming to learn 30 to 40 different density values.
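A sketch of prediction under both parameterisations, assuming the same unit-amplitude squared-exponential kernel with inversion variances on the diagonal. `gp_predict` gives the closed-form predictive mean $K^{\top} C^{-1} \hat{d}$ and marginal variances for a fixed lsp, as in the fixed lsp approach. `predictive_mixture` approximates (21)-(22) by averaging the fixed-lsp predictions over posterior lsp samples (for example from a Metropolis-Hastings chain on $P(\boldsymbol{l} \mid \hat{d})$); this Rao-Blackwellised average is a swapped-in alternative to sampling $d^*$ jointly with the lsp, and all names are hypothetical.

```python
import numpy as np

def se_cov(S1, S2, lsp):
    """Squared-exponential covariance (unit amplitude), one length scale per axis."""
    diff = (S1[:, None, :] - S2[None, :, :]) / np.asarray(lsp, float)
    return np.exp(-0.5 * np.sum(diff**2, axis=-1))

def gp_predict(S_train, d_hat, inv_var, S_new, lsp):
    """Predictive mean K^T C^{-1} d_hat and variances diag(C* - K^T C^{-1} K)."""
    C = se_cov(S_train, S_train, lsp) + np.diag(inv_var)
    K = se_cov(S_train, S_new, lsp)              # cross-covariance (n_tilde, n_star)
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, d_hat))
    V = np.linalg.solve(L, K)                    # V^T V = K^T C^{-1} K
    mean = K.T @ alpha
    var = 1.0 - np.sum(V**2, axis=0)             # C* has a unit diagonal
    return mean, var

def predictive_mixture(S_train, d_hat, inv_var, S_new, lsp_samples):
    """Average fixed-lsp predictions over posterior lsp samples.

    Mean: average of conditional means.
    Variance: law of total variance (mean of variances + variance of means).
    """
    preds = [gp_predict(S_train, d_hat, inv_var, S_new, l) for l in lsp_samples]
    means = np.array([m for m, _ in preds])
    variances = np.array([v for _, v in preds])
    return means.mean(axis=0), variances.mean(axis=0) + means.var(axis=0)

# Usage: three voxel centres on a line, predicting back at those centres.
S = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
d_hat = np.array([0.1, -0.5, 0.3])
inv_var = np.full(3, 1e-8)
m, v = gp_predict(S, d_hat, inv_var, S, (1.0, 1.0, 1.0))
```

With nearly zero inversion variance, the fixed-lsp prediction interpolates the inversion output; larger inversion variances let the GSP smooth noisy voxels instead.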

C. Evaluation of learning efficiency
We evaluate the efficiency of the two methods described above in three respects. First, we quantify the discrepancy between the posterior mean of the material density and its true value using the mean squared error (MSE),

$$\mathrm{MSE} = \frac{1}{n^*} \sum_{j=1}^{n^*} \big(d(s_j) - \hat{d}^*_j\big)^2,$$

where the sum runs over the $n^*$ predicted locations. Second, we investigate the extent to which each method allows discontinuity in the predicted material density. Since a GSP tends to smooth out discontinuities and predicts values close to each other for locations a short distance apart, we calculate the difference (Df) of density between sequential locations for the true density, d(s), and for the predictions d* (posterior means of the GSP) of each method. For Sample 1, Df is calculated for the voxels aligned with the y-axis ($\mathrm{Df} = d_i^{(m)} - d_{i+1}^{(m)}$); for Sample 2, it is calculated along the z-axis in ascending order of depth ($\mathrm{Df} = d_i^{(m)} - d_i^{(m+1)}$). Methods that capture such discontinuity should yield Df values closer to those of the true density.
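A sketch of the evaluation quantities: the MSE and the Df between consecutive locations along a chosen axis, with Df defined as the value at location i minus the value at i + 1. `df_discrepancy`, the mean absolute difference between true and predicted Df values, is a hypothetical one-number summary; the comparison in the text uses the Df values themselves.

```python
import numpy as np

def mse(d_true, d_pred):
    """Mean squared error between true and predicted densities."""
    return float(np.mean((np.asarray(d_true, float) - np.asarray(d_pred, float)) ** 2))

def df_along(values, axis):
    """Df between consecutive locations: v_i - v_{i+1} along the given axis."""
    return -np.diff(np.asarray(values, float), axis=axis)

def df_discrepancy(d_true, d_pred, axis):
    """Mean absolute difference between the true and predicted Df values."""
    return float(np.mean(np.abs(df_along(d_true, axis) - df_along(d_pred, axis))))

# Usage: a void boundary (1 -> 0 -> 0 -> 1) against an over-smoothed prediction.
truth = np.array([1.0, 0.0, 0.0, 1.0])
smooth = np.array([0.9, 0.6, 0.6, 0.9])
print(mse(truth, smooth), df_along(truth, 0), df_discrepancy(truth, smooth, 0))
```

For Sample 2, with densities stored as arrays indexed by depth along the last axis, `axis=-1` gives the Df along z.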
Third, we compare the posterior variances of the predicted density values for each method, through which we can judge how well the model adapts to the data.

IV. RESULTS
A. Sample 1
Fig. 2 presents contour plots for the x-y surface at depth z = 0.4. Panel (2a) shows the true density, and Panel (2b) shows the output of the image inversion, $\hat{d}^{(m)}$, the posterior mean (11) of the material density for each voxel. The colour gradient indicates lower material density in the region where the void is set (7). There are misestimations (deeper or lighter blue areas), which could be associated with the loss of information deeper in the interior of the material, or with posterior uncertainty: the posterior variances show that the true values retain high posterior probability, even where they lie a short distance from the posterior means. Panels (2c) and (2d) show the predicted material density, $d^{*(m)}$, by the fixed and random lsp approaches, respectively. The predicted material density is close to 1 almost everywhere in Panel (2c); the quadrilateral shapes that appear are due only to the discrete colour scheme of the plot, and the actual numerical difference between the contour levels is subtle. The fixed lsp method has clearly failed to identify the void region. Given the prior variance, the GSP has treated the lower density within the void area as a product of noise and has smoothed over these "trends", bringing them close to the most common value.
In contrast, the random lsp approach (Panel 2d) captures the void with two contour levels. The approximate material density of the centre circle is 0.2, and that of the outer circle is 0.9, close to 1. Since the density of the outer circle was calculated from a mix of voxels with non-zero or void density, the predicted value reflects this mixed proportion.
As to learning efficiency, the random lsp method shows a smaller MSE (Table I), whereas the posterior variances are lower for the fixed lsp method (Table II). For both approaches, the Df values are close to the truth (Table III) at locations where the density does not change between consecutive locations (so that the true Df is zero). However, at locations that contain a sudden switch in density from zero to one, or the other way around, the Df values tend to be far from the true ones for both approaches.

B. Sample 2
The learning results of each GSP, d*, are presented in Panel (5c) for the fixed lsp approach and Panel (5d) for the random lsp approach. The fixed lsp result (5c) closely resembles the inversion image $\hat{d}$ (5b). The most noticeable difference is that the void regions in (5c) have shrunk, with some separated from one another, for example at location (−5, −7.2, 0.7). This result implies that, for a highly discontinuous case like this, the learning improvement is minuscule when the lsp is fixed. The results of the random lsp approach (Panel 5d) diverge more from the inversion image (5b). The learning has successfully toned down the red areas at the centres of the darker blue quadrilaterals. However, it has significantly underestimated the void regions compared with the true image (5a), owing to the information from the neighbouring x-z slice, in which the corresponding coordinates have non-void density.

As to learning efficiency, the fixed lsp approach shows a smaller MSE (Table IV). The difference in MSE between the two approaches is due to the misspecification of the void region. Table V shows the posterior variances for the same arbitrary locations. Under the fixed lsp approach, the variance is the same for all coordinates and at least 100 times greater than under the random lsp approach.
Since the fixed lsp approach keeps the density close to the inversion image, its higher variance compensates by covering a wider range of possible values. The random lsp approach exhibits a lower posterior variance, showing better adaptation to the function. In terms of Df, Table VI indicates better performance for the random lsp approach, which adapts well to the random variation of the material density. The cases in which the fixed lsp approach gives better Df values are considerably fewer and are probably due to the randomness in the definition of the true density function.

V. DISCUSSION
The interior material density often contains discontinuities due to random voids within the material. We have, for the first time, proposed a methodological framework and demonstrated, using two simulation examples, how it can estimate the non-discretised material density structure from SEM images. Although GSPs perform better for smooth functions [4], our comparison between the two parameterisation approaches has highlighted that the random lsp approach, an extension of the non-nested model [14], copes better with discontinuities in the material structure.
As we have shown, the level of discontinuity within the material plays a pivotal role in the success of GSP modelling. Sample 1 gave better results, particularly for the random lsp approach, as its interior material density is continuous everywhere except at the boundary between the dense region and the ellipsoidal void. In contrast, Sample 2 is discontinuous almost everywhere among its voxels, which is a rather extreme scenario. In normal circumstances, materials exhibit more locally homogeneous structures at the voxel scale, meaning that neighbouring voxels share the same or similar densities. Despite the unusual scenario of Sample 2, our modelling approaches have partially captured the key features of the material structure. Under the fixed lsp approach, the posterior mean values stayed more or less the same as the inversion, with high variance, allowing the values to vary independently of the coordinates. These results are not surprising, since GSPs with the squared-exponential covariance kernel are usually recommended in machine learning for learning smooth continuous functions. Importantly, by allowing randomness in the lsp, the GSP has managed to delineate the interior material density efficiently.
A key aspect of a GSP is the modelling of its covariance kernel, as the kernel specifies the process. For example, [4] suggests as a typical choice the radial basis function (RBF) kernel, which simply multiplies our kernel (14) by a coefficient A, namely the amplitude parameter, and also recommends that the prior variance be used only for $\hat{d}_i^{(m)}$. Here, the amplitude parameter A regulates the variations. However, we have found that, in our case, the amplitude parameter induces possible model misspecification due to the discontinuity in the samples, and our modelling approach (14) works better.
Further, the posterior probability distribution (20) can be multimodal, suggesting the need for different lsp values for different coordinates rather than globally fixed values. Our results underline that the non-nested model [14] is sufficient to capture the key features of the interior material density. However, the random lsp approach incurs a computational cost that requires some form of parallelisation; for example, we recommend learning the density values in parallel groups of at most 30 variables.