Wavefront sensing of interference fringe based on generative adversarial network

To increase the measurement accuracy of optical systems, which are implemented in various applications, an improvement of optical measurement techniques is required. This paper proposes an image-to-image wavefront sensing approach using a deep neural network that directly predicts the phase image from the corresponding interference fringe image, instead of reconstructing it from Zernike coefficients. The model is based on a conditional generative adversarial network (CGAN). To train the model, we used formula-based ideal interference fringe images as the inputs of the CGAN, to conditionally predict the corresponding phase images as the output. We numerically investigated the performance by calculating the similarity between the ideal phase image and the model output. In addition, with reference to a previous study, we determined whether the model can extract more features from the interferogram for the prediction of Zernike coefficients. Moreover, optical simulation software was introduced to provide an increased number of realistic interferograms, to verify the proposed method. Based on the results, the proposed system can obtain the phase image directly and reduce the error, thus improving the measurement accuracy of interference fringe analysis.


Introduction
In designing optical systems and analyzing their quality, aberrations are among the most critical evaluation indicators, as they are directly related to the imaging quality of the entire optical system; spherical aberration, coma, and other aberrations deform and blur images. Therefore, the measurement of aberrations is of great significance (Fischer et al. 2008).
In optical measurement techniques, the desired physical quantities, such as displacement and deformation, are expressed in wavefront form as phases (Hariharan 2010). Traditionally, there are two main methods to measure the wavefront: wavefront sensing and interferometry (Nikitin et al. 2015). Direct wavefront sensing involves a wavefront sensor such as the Shack-Hartmann sensor, which calculates the offset of the light spots after focusing by a microlens array, to reconstruct the wavefront and aberrations (Goodwin and Wyant 2006). Interferometry exploits the superposition of a reference beam and a test beam. Interference fringes can be recorded by a detector, and all deviations between the two beams are reflected in the density and shape of the fringes. Therefore, the wavefront and aberrations can be determined from the fringe form (Goodwin and Wyant 2006).
Recently, deep neural networks have been implemented in various applications for optical problems (Barbastathis et al. 2019; Campbell et al. 2019). Complex functions can be learned without defining specific physical equations and rules by using deep neural networks for image recognition (LeCun et al. 2015). In previous studies, robust deep neural networks were employed for image recognition to predict aberrations. Several researchers obtained results from focused light-spot arrays measured by Shack-Hartmann sensors (Guo et al. 2006; Hu et al. 2019). In other studies, the distributions of point spread functions (PSFs) were used to predict the magnitude of aberrations (Saha et al. 2020; Nishizaki et al. 2019; Jin et al. 2018, 2020; Paine and Fienup 2018), and have been extensively applied. In a previous study, aberration measurements were involved in the investigation of interference fringes (Whang et al. 2020). In these examples, deep neural networks were mainly used to predict the Zernike coefficients from the PSF, interference fringes, or other image characteristics. However, the predicted Zernike coefficients are then required for the reconstruction of the wavefront or phase image. In addition, in a previous study, the phase images were estimated using a perceptron by minimizing the mean-square error (MSE). However, a drawback of this method is that blurriness may occur in the output of the model, because minimizing the MSE between images is equivalent to averaging, which yields blurry outputs (Lotter et al. 2015; Ledig et al. 2017).
This study is an extension of a previous study (Whang et al. 2020), wherein a convolutional neural network (CNN) was used to perform pixel-wise forecasting, to predict the Zernike coefficients from interference fringe images or phase images. In particular, this method can reduce the mathematical operations in the process of acquiring Zernike coefficients from interference images. However, the Zernike coefficients predicted by the interferogram model were not as robust as those of the phase model; the error of the interferogram model was higher than that of the phase model. Therefore, this paper proposes a well-trained deep neural network to generate phase images from interference images. These phase images are then inputted to the phase model to predict the Zernike coefficients. This reduces the overall error from interference images to the coefficients, and the entire prediction process becomes an artificial intelligence (AI)-based method. Accordingly, this paper proposes a novel wavefront reconstruction approach using a deep neural network based on the conditional generative adversarial network (CGAN) (Mirza and Osindero 2014), to reconstruct the wavefront directly from the interference fringe image. We numerically evaluated the reconstructed wavefront from the CGAN by calculating the similarity between the actual and synthetic wavefront phase images, and the phase value. Based on this method, the wavefront can be generated without calculating the Zernike coefficients. We then inputted these phase images to the phase model developed in the previous study (Whang et al. 2020) to predict the Zernike coefficients. We used the GAN model with the phase model to directly predict the Zernike coefficients from interference fringe images. The prediction accuracy was higher than that of the interferogram model in the previous study.
Furthermore, we used optical simulation software (VirtualLab Fusion) to verify the proposed method. The software simulated a small dataset to fine-tune the CGAN model, which demonstrated a lower error than that in the previous study both before and after fine-tuning. The CGAN was found to be more adaptable to actual optical problems.

Optical detail
Interference is a phenomenon in which two waves are superimposed to form a resultant wave based on their phase difference. In general, the reference and test wavefronts in an interferometer are expressed by Eqs. (1) and (2):

W_r(x, y, t) = A_r(x, y) e^{i[φ_r(x, y) − δ(t)]}   (1)

W_t(x, y, t) = A_t(x, y) e^{iφ_t(x, y)}   (2)

where A_r(x, y) and A_t(x, y) are the wavefront amplitudes of the reference and test beams, respectively; φ_r and φ_t are the wavefront phases; and δ(t) is a time-varying phase shift introduced into the reference beam. Typically, the interference fringe pattern can be expressed as Eq. (3):

I(x, y, t) = I′ + I″ cos[φ_t(x, y) − φ_r(x, y) + δ(t)]   (3)

where I′ = A_r²(x, y) + A_t²(x, y) is the average intensity, and I″ = 2A_r(x, y)A_t(x, y) is the fringe modulation. When the reference and test light amplitudes are equal (A_r = A_t = A), Eq. (3) simplifies to Eq. (4):

I(x, y, t) = 2A²(x, y){1 + cos[φ₀ + δ(t)]}   (4)

where the phase information is abbreviated as φ₀ = φ_t(x, y) − φ_r(x, y). Generally, the measurement of an optical system uses the phase-shift method: the interferometer is designed to capture a series of interference fringe images with known phase differences. The N-step phase-shift method captures N interference fringe images with varying phase shifts, which are then used to obtain the phase distribution (Hariharan 2010). For example, substituting four discrete phase-shift values δ(t) spaced by π/2 into Eq. (3) yields the four interference intensity patterns of Eqs. (5)-(8):

I₁ = I′ + I″ cos φ₀   (5)
I₂ = I′ − I″ sin φ₀   (6)
I₃ = I′ − I″ cos φ₀   (7)
I₄ = I′ + I″ sin φ₀   (8)

Based on the properties of trigonometric identities, the unknown phase follows from the four-step phase-shift method as Eq. (9):

φ₀ = arctan[(I₄ − I₂)/(I₁ − I₃)]   (9)

More phase shifts can be used to achieve higher accuracy. The phase values in Eq. (9) are wrapped in a particular range; thus, conversion into an unwrapped phase (Ghiglia and Pritt 1998) and expression by a specific aberration function are required. The aberration function can be expressed by mathematical polynomials such as Seidel polynomials and Zernike polynomials (Wyant and Creath 1992). In this study, Zernike polynomials were used for the mathematical expression of the wavefront. Optical imaging systems generally have an axis of rotational symmetry, and their pupils are generally circular or annular; Zernike polynomials are orthogonal over such pupils and correspond to optical aberrations (Lakshminarayanan and Fleck 2011).
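As a concrete check of Eqs. (5)-(9), the four-step retrieval can be sketched in a few lines of NumPy; the test phase here is an arbitrary synthetic surface, not data from the paper:

```python
import numpy as np

# Four-step phase-shift retrieval (Eqs. 5-9) on a synthetic phase surface.
H = W = 64
y, x = np.mgrid[-1:1:H * 1j, -1:1:W * 1j]
phi0 = 2 * np.pi * (0.3 * x + 0.2 * y**2)   # synthetic unknown phase
Ip, Ipp = 1.0, 0.8                          # average intensity I', modulation I''

# Four interferograms with pi/2 steps: I_n = I' + I'' cos(phi0 + n*pi/2)
I1 = Ip + Ipp * np.cos(phi0)                #  I' + I'' cos(phi0)
I2 = Ip + Ipp * np.cos(phi0 + np.pi / 2)    #  I' - I'' sin(phi0)
I3 = Ip + Ipp * np.cos(phi0 + np.pi)        #  I' - I'' cos(phi0)
I4 = Ip + Ipp * np.cos(phi0 + 3 * np.pi / 2)  # I' + I'' sin(phi0)

# Eq. (9): wrapped phase from the four frames
phi_wrapped = np.arctan2(I4 - I2, I1 - I3)

# Residual against the true phase, modulo 2*pi (numerically zero)
err = np.angle(np.exp(1j * (phi_wrapped - phi0)))
print(float(np.max(np.abs(err))))
```

`np.arctan2` resolves the quadrant automatically, which is why the retrieved phase is wrapped to (−π, π] and must subsequently be unwrapped, as noted above.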
Polynomial terms do not influence other polynomial terms, owing to their orthogonality in wavefront fitting. The wavefront is represented by Zernike polynomials, as expressed by Eq. (10):

W(ρ, θ) = Σ_n a_n Z_n(ρ, θ)   (10)

where a_n is the Zernike coefficient, Z_n is the nth term of the polynomials, ρ is the radial distance, and θ is the azimuth angle, ranging from 0 to 2π. According to the definition of the Zernike polynomials, each term represents a corresponding optical aberration. During the training process, the model attempts to learn the same behavior as the mathematical deduction process described above.
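Eq. (10) can be sketched numerically as follows; only a few low-order terms are written out explicitly (Noll-normalized), and the coefficient values are illustrative, not from the paper:

```python
import numpy as np

# Sketch of Eq. (10): W(rho, theta) = sum_n a_n Z_n(rho, theta),
# evaluated on a unit circular pupil.
def zernike_wavefront(coeffs, size=256):
    y, x = np.mgrid[-1:1:size * 1j, -1:1:size * 1j]
    rho = np.hypot(x, y)
    theta = np.arctan2(y, x)
    mask = rho <= 1.0
    # First few Zernike terms: piston, tilts, defocus, astigmatisms
    Z = [
        np.ones_like(rho),                        # piston
        2 * rho * np.cos(theta),                  # tilt x
        2 * rho * np.sin(theta),                  # tilt y
        np.sqrt(3) * (2 * rho**2 - 1),            # defocus
        np.sqrt(6) * rho**2 * np.sin(2 * theta),  # oblique astigmatism
        np.sqrt(6) * rho**2 * np.cos(2 * theta),  # vertical astigmatism
    ]
    W = sum(a * Zn for a, Zn in zip(coeffs, Z))
    return np.where(mask, W, 0.0)                 # zero outside the pupil

coeffs = [0.0, 0.1, -0.2, 0.3, 0.05, -0.1]        # piston fixed at zero
phase = zernike_wavefront(coeffs)
print(phase.shape)
```

Orthogonality over the pupil means each coefficient can be fitted independently, which is the property the text relies on.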

Proposed deep neural network for wavefront sensing
In several previous studies, a CNN was used to predict the Zernike coefficients. However, substitution of the Zernike coefficients into the polynomials was required to reconstruct the wavefront. In this study, a popular image-to-image conversion model (Isola et al. 2017) was trained to directly convert the interference fringe image to its corresponding phase image. The proposed approach uses a conditional generative adversarial network (CGAN) (Mirza and Osindero 2014) to estimate the distribution of phase images. The model was trained with image pairs, comprising the interference fringe image and the corresponding phase image, as shown in Fig. 1. The generator was used to retrieve the phase image from the interference fringe image.
A CNN is effective for image recognition. Numerous image recognition models, including the CGAN, were extended from the CNN architecture and can be used in different optical applications (Li et al. 2020; Huang et al. 2019; Sargent et al. 2020; Moon et al. 2020). The proposed CGAN consists of two networks: a generator based on U-net and a discriminator. As shown in Fig. 1a, the generator is trained to generate phase images that cannot be distinguished from the actual images by a trained discriminator. As shown in Fig. 1b, the discriminator is then trained to classify the generator output image and the actual image. In particular, the generator produces a phase image corresponding to the interference fringe image. At the same time, the discriminator determines whether the input image is the generator output or the actual phase image. Through the adversarial training between the generator and discriminator, the model converges to appropriately synthesize the image. The objective of the training process is to balance the model state. Thereafter, the generator can produce images that are highly similar to the actual image.
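The adversarial update described above can be sketched as a single training step, assuming generic `generator` and `discriminator` Keras models; the L1 weight of 100 follows the common pix2pix convention (Isola et al. 2017) and is an assumption, as the paper does not state its loss weights:

```python
import tensorflow as tf

# One adversarial step: the discriminator pushes real phase images toward
# "real" and generated ones toward "artificial", while the generator tries
# to fool it and also match the target phase image (L1 term).
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

@tf.function
def train_step(generator, discriminator, g_opt, d_opt, fringe, real_phase):
    with tf.GradientTape() as gt, tf.GradientTape() as dt:
        fake_phase = generator(fringe, training=True)
        d_real = discriminator(real_phase, training=True)
        d_fake = discriminator(fake_phase, training=True)
        # Discriminator: real -> 1, generated -> 0
        d_loss = bce(tf.ones_like(d_real), d_real) + \
                 bce(tf.zeros_like(d_fake), d_fake)
        # Generator: fool the discriminator, plus an L1 term toward the target
        g_loss = bce(tf.ones_like(d_fake), d_fake) + \
                 100.0 * tf.reduce_mean(tf.abs(real_phase - fake_phase))
    g_opt.apply_gradients(zip(gt.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    d_opt.apply_gradients(zip(dt.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    return g_loss, d_loss
```

Training balances the two losses: when neither network can improve against the other, the generator output closely matches the real phase images.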

Datasets
In this study, random amplitudes of the first 32 Zernike polynomials were used to generate the corresponding phase images based on Eq. (10). These phase images were then used to generate the related interference fringe images according to Eq. (4). The randomly generated coefficients ranged from −0.5λ to 0.5λ for each term of the Zernike polynomial. It should be noted that the piston term of the Zernike mode was set to zero, as it does not influence the wavefront distribution. The CGAN model was trained using two interferograms and one phase image, each with a resolution of 256 × 256 pixels. Moreover, 1000 pairs were used for the training data and 100 pairs for the validation data. A phase image with a corresponding interference fringe image and its Zernike coefficient line chart are shown in Fig. 2.
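The input construction, two interferograms from Eq. (4) (the original and a π/4-shifted copy, as described below) divided pixel-wise, can be sketched as follows; the phase here is synthetic, and the small `eps` guard is an assumption to avoid division by zero at dark fringes:

```python
import numpy as np

# Build the divided-interferogram input fed to the CGAN from a phase image.
def make_cgan_input(phase, shift=np.pi / 4, eps=1e-8):
    I0 = 2 * (1 + np.cos(phase))            # original interferogram, Eq. (4)
    I1 = 2 * (1 + np.cos(phase + shift))    # pi/4 phase-shifted interferogram
    return I1 / (I0 + eps)                  # division removes the cosine ambiguity

# Coefficients of +/-0.5 lambda correspond to +/-pi radians of phase
phase = np.random.uniform(-0.5, 0.5, (256, 256)) * 2 * np.pi
x = make_cgan_input(phase)
print(x.shape, bool(np.isfinite(x).all()))
```

Because the ratio of the two frames depends on the sign of the phase as well as its cosine, the mapping to the phase image becomes unique, which is the point of the phase-diversity argument below.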
In image-based wavefront sensing, there is a non-unique mapping problem when converting PSFs or interference fringes to a phase image; this causes non-convergence when training a neural network, in addition to inaccurate predictions. Thus, a phase diversity should be introduced to prevent non-uniqueness and to allow deep neural networks to deduce the phase images or aberration functions (Kendrick et al. 1994; Dean and Bowers 2003). The proposed method is therefore based on the concept of phase-shift interference. We added π/4 to the phase to generate a new interferogram, and then divided the two interferograms, namely, the original and the phase-shifted one, to train the CGAN model. Therefore, we could converge the CGAN model and map the corresponding phase images uniquely and accurately.

Fig. 1 a The interference fringe image is inputted to the generator based on U-Net, to accurately generate the corresponding phase image. b A discriminator receives phase images to distinguish whether the phase image is from the generator (artificial image) or is an actual image. The discriminator outputs a probability value that indicates whether the image is "real" or "artificial"

Architecture
This paper proposes the use of the CGAN for image-to-image conversion. The CGAN can directly derive the phase image from the target interferogram, instead of relying on calculations or reconstruction after predicting the Zernike coefficients. The proposed model is based on the concept of an image-to-image CGAN (Isola et al. 2017), which includes a generator that uses U-net for image generation and a PatchGAN-style discriminator. The generator is trained to generate the phase images corresponding to the interference fringe images, and its objective is to render the real and artificial images indistinguishable by the discriminator. The discriminator is trained to judge whether the phase images are synthesized by the generator or come from actual data. Although the final objective is for the generator to conduct image-to-image conversion, the discriminator facilitates the training of the generator. The entire training process is balanced when the generator output is highly similar to the actual phase image.
In this study, the image size of the generator input and output was 256 × 256 pixels to perform image-to-image conversion, as shown in Fig. 3a. According to the U-net architecture, the generator consisted of six down-sampling blocks and six up-sampling blocks. The down-sampling blocks performed feature extraction from the interference fringe image. Given that the divided interferogram was used as the input of the CGAN model, it was distributed over a broad data range. Therefore, we used the hyperbolic tangent (tanh) as the activation function for the down-sampling blocks, which can normalize the input data to a specific range, facilitate model convergence, and suppress gradient explosion. The up-sampling blocks generate the phase images according to the features extracted by down-sampling. Given that the phase images contained positive and negative values, the phase range varied with the input interferograms. Therefore, we used the parametric rectified linear unit (PReLU) as the activation function for the up-sampling blocks, to allow the neural network to adaptively adjust the negative slope of the ReLU function based on the input data. Moreover, a deep structure may lose high-frequency features during down-sampling and produce a blurry output image. Thus, skip connections were used to share the high-frequency features between the input and output.
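The generator described above can be sketched in Keras as follows; the filter counts and kernel sizes are assumptions, as the paper does not list them, and the instance normalization discussed later is implemented as a minimal custom layer:

```python
import tensorflow as tf
from tensorflow.keras import layers

class InstanceNorm(tf.keras.layers.Layer):
    """Per-sample, per-channel normalization over the spatial axes."""
    def call(self, x):
        mean, var = tf.nn.moments(x, axes=[1, 2], keepdims=True)
        return (x - mean) / tf.sqrt(var + 1e-5)

def build_generator(size=256):
    inp = layers.Input((size, size, 1))               # divided interferogram
    skips, x = [], inp
    for f in [64, 128, 256, 256, 256, 256]:           # six down-sampling blocks
        x = layers.Conv2D(f, 4, strides=2, padding="same")(x)
        x = InstanceNorm()(x)
        x = layers.Activation("tanh")(x)              # tanh bounds the broad input range
        skips.append(x)
    for i, f in enumerate([256, 256, 256, 256, 128, 64]):  # six up-sampling blocks
        x = layers.Conv2DTranspose(f, 4, strides=2, padding="same")(x)
        x = InstanceNorm()(x)
        x = layers.PReLU(shared_axes=[1, 2])(x)       # learnable negative slope
        if i < 5:                                     # skip connections share high-frequency features
            x = layers.Concatenate()([x, skips[4 - i]])
    out = layers.Conv2D(1, 4, padding="same")(x)      # phase image (positive and negative values)
    return tf.keras.Model(inp, out)
```

The skip connections concatenate each encoder feature map with the decoder feature map of the same resolution, which is what preserves fringe detail against the blurring noted above.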
In contrast, the discriminator received an input image with dimensions of 256 × 256 × 2, where the two channels comprised the actual and generated phase images, as shown in Fig. 3b. The image passed through four convolutional layers with spectral normalization, which improved the stability of network training (Miyato et al. 2018). Finally, the discriminator derived a pixel patch with dimensions of 32 × 32, and distinguished between the real and artificial images. The use of PatchGAN (Isola et al. 2017) reduced the number of parameters to prevent the model from overfitting, increase the speed of the training process, and obtain more accurate training results.
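A matching sketch of the patch discriminator, again with assumed filter counts; spectral normalization is noted in a comment rather than implemented, to keep the sketch dependency-free:

```python
import tensorflow as tf
from tensorflow.keras import layers

class InstanceNorm(tf.keras.layers.Layer):
    """Per-sample, per-channel normalization over the spatial axes."""
    def call(self, x):
        mean, var = tf.nn.moments(x, axes=[1, 2], keepdims=True)
        return (x - mean) / tf.sqrt(var + 1e-5)

def build_discriminator(size=256):
    # Two channels: the phase image pair presented to the discriminator
    inp = layers.Input((size, size, 2))
    x = inp
    for f, s in [(64, 2), (128, 2), (256, 2), (1, 1)]:  # four convolutional layers
        # The paper wraps each convolution in spectral normalization
        # (Miyato et al. 2018); omitted here for brevity.
        x = layers.Conv2D(f, 4, strides=s, padding="same")(x)
        if f != 1:
            x = InstanceNorm()(x)
            x = layers.LeakyReLU(0.2)(x)
    return tf.keras.Model(inp, x)   # a 32 x 32 patch of real/artificial scores
```

Each of the 32 × 32 output scores judges one receptive-field patch of the input, which is why PatchGAN needs far fewer parameters than a whole-image classifier.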
In addition, we used instance normalization for the generator and discriminator instead of batch normalization, to obtain more accurate results. In instance normalization, the mean and variance are calculated for each channel of each sample across both spatial dimensions (Ulyanov et al. 2016); it is commonly used in style-transfer models. Furthermore, we introduced a squeeze-and-excitation (SE) layer (Hu et al. 2018) after each convolution layer, for both the generator and the discriminator.
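A minimal squeeze-and-excitation block of the kind inserted after each convolution; the reduction ratio (here 8) is an assumption, as the paper does not specify it:

```python
import tensorflow as tf
from tensorflow.keras import layers

# SE block: globally pool each channel ("squeeze"), learn per-channel
# weights ("excitation"), and rescale the feature map channel-wise.
def se_block(x, ratio=8):
    ch = x.shape[-1]
    s = layers.GlobalAveragePooling2D()(x)            # squeeze: per-channel statistic
    s = layers.Dense(ch // ratio, activation="relu")(s)
    s = layers.Dense(ch, activation="sigmoid")(s)     # excitation: channel weights in (0, 1)
    s = layers.Reshape((1, 1, ch))(s)
    return layers.Multiply()([x, s])                  # broadcast rescaling

inp = layers.Input((32, 32, 16))
model = tf.keras.Model(inp, se_block(inp))
print(model.output_shape)
```

The block is shape-preserving, so it can be dropped in after any convolution without altering the surrounding architecture.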

Results
Hyperparameters have a significant influence on the CGAN performance (Heusel et al. 2017). The initial negative slope of the PReLU was set as 0.2, and the Glorot normal initializer initialized the convolution kernels to accelerate the training process. After conducting trials, we implemented a two-phase training strategy. In Phase 1, the model was trained at a learning rate of 0.0002 with a momentum of 0.5 for 500 epochs. In Phase 2, the model was recompiled, and the learning rate was decreased to 0.00002 for an additional 100 epochs, to achieve convergence. Generally, an Adam optimizer is used for most CGAN models. However, the stochastic gradient descent with momentum (SGDM) optimizer stabilizes model convergence more effectively and demonstrates a higher performance on the testing datasets (Luo et al. 2019). Therefore, we used SGDM as the optimizer. The batch size was set as 1 to meet the instance normalization requirements. The training was implemented in Keras 2.4.3 with a TensorFlow backend based on Python 3.7.9, and run on a computer with an NVIDIA RTX 3080 graphics processing unit (GPU) with 10 GB of VRAM.
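The two-phase schedule can be sketched in Keras with a stand-in model; only the optimizer settings and epoch counts come from the text, and the loss choice is illustrative:

```python
import tensorflow as tf

# Stand-in model; in the paper this would be the CGAN generator.
model = tf.keras.Sequential([tf.keras.layers.Input((4,)),
                             tf.keras.layers.Dense(1)])

# Phase 1: SGD with momentum (SGDM), lr = 0.0002, momentum = 0.5, 500 epochs
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=2e-4, momentum=0.5),
              loss="mae")
# model.fit(x_train, y_train, epochs=500, batch_size=1)

# Phase 2: recompile with the learning rate reduced tenfold, 100 more epochs
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=2e-5, momentum=0.5),
              loss="mae")
# model.fit(x_train, y_train, epochs=100, batch_size=1)
print(float(model.optimizer.learning_rate))
```

Recompiling resets the optimizer state while keeping the learned weights, which is what makes the late low-learning-rate phase act as a convergence refinement rather than a restart.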
A set of comparison results is presented in Fig. 4. Phase images of the real and CGAN outputs were difficult to distinguish. In addition, the residual images display the observed differences.

Evaluation
Evaluating the quality of synthesized images is challenging (Salimans et al. 2016). Therefore, the generated phase images were analyzed using two different quantitative methods. First, we calculated the mean absolute error (MAE) between the phase values of the proposed method and the ground truth. The histogram of the MAE for 1000 randomly generated phase images is shown in Fig. 5a; the average MAE of the 1000 phase images was 0.211 ± 0.107 λ. In addition, the structural similarity index (SSIM) was analyzed (Wang et al. 2004). We calculated the SSIM between the proposed method and the ground truth to evaluate the performance of the CGAN. The SSIM numerically assesses the differences between two images and is more correlated with human perception. Figure 5b presents an SSIM histogram for 1000 pairs of phase images with random Zernike coefficients; the average SSIM of the 1000 phase images was 0.993 ± 0.004.
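The two metrics can be sketched as follows, using `tf.image.ssim` in place of whatever SSIM implementation the authors used; the images are synthetic stand-ins:

```python
import numpy as np
import tensorflow as tf

def mae(a, b):
    """Mean absolute error between two phase maps."""
    return float(np.mean(np.abs(a - b)))

def ssim(a, b):
    """Structural similarity; tf.image.ssim needs a channel axis and a dynamic range."""
    rng = float(max(a.max(), b.max()) - min(a.min(), b.min()))
    return float(tf.image.ssim(a[..., None], b[..., None], max_val=rng))

# Synthetic ground truth and a lightly perturbed "prediction"
truth = np.random.rand(256, 256).astype("float32")
pred = truth + 0.01 * np.random.randn(256, 256).astype("float32")
print(mae(truth, pred), ssim(truth, pred))
```

MAE captures the average phase error in the same units as the wavefront (here arbitrary; λ in the paper), while SSIM captures structural agreement, which is why the paper reports both.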
Second, we evaluated the generated phase images more rigorously, to determine whether they were in good agreement with the ground-truth Zernike coefficients. We used a pre-trained neural network as a criterion for the proposed method (Shmelkov et al. 2018), namely the pre-trained networks from a previous study (Whang et al. 2020). In that study, the phase and interferogram models were based on GoogleNetV3: the first model was trained to predict the Zernike coefficients from a phase image, and the second was trained to predict the coefficients from an interferogram. Therefore, we inputted the generated phase images from the CGAN into the GoogleNetV3-Phase model to predict the Zernike coefficients and compared them with the ground truth. Figure 6a presents one of the generated phase images from the CGAN and the coefficient line chart predicted by the GoogleNetV3-Phase model. The lines of the ground truth and the predicted values overlapped closely; thus, the generated phase image accurately corresponded to the Zernike coefficients. The histogram of 1000 root-mean-square error (RMSE) sets of Zernike coefficients is shown in Fig. 6b.
The RMSE of the Zernike coefficients from the phase model combined with the CGAN, and from the interferogram model, is shown in Table 1. We tested the same 1000 sets of test data, and the CGAN achieved a lower estimation error on the test set. The prediction error from an interferogram pattern was greater than that of the proposed method by a factor of four. Given that the CGAN was more efficient in feature extraction, the performance of the two-step method (GAN + CNN) was superior to that of the one-step method (CNN).

Experiment
In addition, to validate the proposed method, we established an interferometer architecture using the VirtualLab Fusion software. The software was used to simulate the interference fringe images from the corresponding Zernike coefficients, to validate the proposed model. The interferometer is based on the architecture of the Fizeau interferometer, as shown in Fig. 7. Compared with the ideal interference fringe images used for training and testing, the edges of the fringes were blurry and jagged, and the contrast was lower. We moved the optical component to simulate the interference fringe pattern with a phase shift, and then divided the two images to obtain the input of the CGAN model for the prediction of the phase images. Figure 8a presents three simulated interferograms from VirtualLab Fusion, the predicted results from the CGAN, and the residual phase images. The residual image quality was lower than that in the ideal case. In the line chart of the Zernike coefficients, several predicted values exhibited considerable errors, as shown in Fig. 8b.

Fig. 6 Results of the CGAN model: a a generated phase image using CGAN and the predicted Zernike coefficients by the GoogleNetV3-Phase model, with the predicted result compared with the ground truth; and b the RMSE of 1000 phase images
First, the ten interference images were converted into phase images by the CGAN, and the average MAE between the phase values obtained by the proposed method and the ideal phase images was approximately 0.363 ± 0.038 λ. The SSIM of the phase images between the proposed method and the ideal case was approximately 0.851 ± 0.047. Second, these generated phase images were inputted to the GoogleNetV3-Phase model to predict the Zernike coefficients, and the average RMSE was approximately 0.091 ± 0.017 λ. The RMSE of the Zernike coefficients from the two-step and one-step methods is shown in Table 2. We tested the same ten sets of experimental data, and the error increased owing to the different conditions of the interferogram. The RMSE of both methods increased; however, the CGAN with the GoogleNetV3-Phase model demonstrated a lower RMSE.

Transfer learning
The application of specific-domain data to a pre-trained model and subsequent fine-tuning is referred to as transfer learning (Pan and Yang 2009), a common approach in machine learning that decreases the demand for labeled data and aids adaptation to different conditions. Comparing an interferometer, optical software, and formula-based imaging, the latter can rapidly generate a large number of images, albeit of lower fidelity. Therefore, the strategy first trains the model on a large set of formula-based images, and then conducts fine-tuning using the images from an interferometer or software. We loaded the CGAN with the weights trained on ideal interference fringe images, and then used VirtualLab Fusion to generate 100 interference fringe images to fine-tune the model, thus allowing the CGAN model to synthesize phase images that approximate the real images.

Fig. 7 The architecture of the Fizeau interferometer used for the simulation of the interference fringe images

Figure 9a presents the outputs of the CGAN after transfer learning. The phase error was reduced significantly, as seen in the residual images after transfer learning. The line chart of the Zernike coefficients from the CGAN output was closer to the ground truth, as shown in Fig. 9b. In addition, transfer learning reduced the average MAE from 0.363 λ to 0.152 λ, and increased the average SSIM of the phase images from 0.851 to 0.957. Furthermore, we inputted the predicted phase images to the GoogleNetV3-Phase model to calculate the Zernike coefficients and RMSE. The average RMSE was reduced from 0.091 λ to 0.035 λ. The CGAN adapted to the VirtualLab interference fringe image condition through transfer learning, and generated phase images similar to the real images.

Fig. 8 Comparison of wavefront detection results on three types of interferogram: a VirtualLab Fusion, the interferogram corresponding to the real phase image, and the predicted phase image from the CGAN. The residual image reveals the differences between the real image and the CGAN output. b Generated phase images inputted to the GoogleNetV3-Phase model to predict the Zernike coefficients and compared with the ground truth, as indicated by the line chart

We conducted transfer learning on the GoogleNetV3-Interferogram model, and predicted the Zernike coefficients from the interferogram for comparison. We used the same 100 interferograms as used for the CGAN transfer learning. A comparison between the CGAN with the phase model and the interferogram model is shown in Fig. 10. The average RMSE of the interferogram model decreased from 0.095 λ to 0.078 λ, and that of the CGAN decreased from 0.091 λ to 0.035 λ. The improvement of the GoogleNetV3 model compared with that of the CGAN was low, given that the CGAN model had a well-trained discriminator to facilitate model training. Therefore, the CGAN can converge rapidly and accurately with less training data to predict the Zernike coefficients and achieve a lower error.
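The fine-tuning strategy can be sketched as follows; the stand-in model, weight-file name, and data variables are placeholders, not from the paper:

```python
import tensorflow as tf

# Stand-in for the pre-trained generator; in practice this would be the
# CGAN generator trained on formula-based ideal interferograms.
generator = tf.keras.Sequential([tf.keras.layers.Input((4,)),
                                 tf.keras.layers.Dense(4)])

# Restore the weights learned on the large formula-based dataset
# (hypothetical file name):
# generator.load_weights("cgan_generator_ideal.h5")

# Continue training briefly on the small simulated set at the reduced rate
generator.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=2e-5, momentum=0.5),
    loss="mae")
# generator.fit(virtuallab_inputs, virtuallab_phases, batch_size=1)
```

The pre-trained weights carry the bulk of the learned fringe-to-phase mapping, so the 100 VirtualLab images only need to adapt the model to the blurrier, lower-contrast fringes.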

Discussion and conclusions
This paper proposes a model based on a deep learning structure to predict phase images without reconstruction from Zernike coefficients. To achieve this, a CGAN was trained with pairs of interferograms to predict phase images. First, we calculated the MAE and SSIM of the phase images between the generated images and the ground truth as indicators of the proposed method. Second, we used the GoogleNetV3 phase model from a previous study to predict the Zernike coefficients from the generated phase images. Therefore, the phase images generated by the CGAN were inputted to GoogleNetV3 to predict the coefficients, and the RMSE values were then calculated. Based on the results, ideal and experimental RMSE values of 0.013 λ and 0.091 λ, respectively, were obtained.

Fig. 9 The results after transfer learning: a VirtualLab Fusion result, corresponding real phase image, and predicted phase image from CGAN after transfer learning. The residual image revealed that the differences between the real image and CGAN output were decreased after transfer learning. b Generated phase images inputted to the GoogleNetV3-Phase model to predict the coefficients and compared with the ground truth, as indicated by the line chart
We compared the proposed method with the method of the previous study, namely the interferogram model, with respect to the prediction of the Zernike coefficients. The results revealed that the combination of the CGAN and GoogleNetV3 achieves a lower RMSE in the measurement of the Zernike coefficients from the interferogram. However, the time required for prediction increased: the average time for the prediction of an interferogram by the CGAN + GoogleNetV3-Phase model was 80.1 ms, greater than that of the GoogleNetV3-Interferogram model by a factor of eight.
Moreover, we applied transfer learning to the CGAN and GoogleNetV3-Interferogram models. Owing to the well-trained discriminator in the CGAN, the results revealed that the CGAN reduced the RMSE from 0.091 λ to 0.035 λ with less training data, compared with the GoogleNetV3-Interferogram model. This indicates that the CGAN can adapt more rapidly to similar problems, which is essential for deploying an artificial intelligence (AI)-based model in an interferometer. Given that only limited data can be obtained from interferometers for transfer learning, the proposed method may increase the measurement accuracy, and can therefore be implemented in actual interferometers for increased prediction accuracy. In addition to using a GAN to predict the phase map of the interferogram, a GAN can also be used to predict the aberration coefficients of the interferogram (Whang et al. 2021), or an RNN with phase shift can be used to predict the aberration coefficients of the interferogram.
Acknowledgements The authors appreciate the support of Taiwan's Ministry of Science and Technology and the Taiwan Instrument Research Institute.

Fig. 10 The results of the CGAN (with the phase model) and GoogleNetV3 (the interferogram model) before and after fine-tuning