Robust Denoising of Cryo-EM Images via β-GAN

Cryo-electron microscopy (Cryo-EM) has become popular for macromolecular structure determination. However, the 2D images that Cryo-EM produces are highly noisy and often mixed with multiple heterogeneous conformations and contamination, imposing a challenge for denoising. Traditional image denoising methods and the simple Denoising Autoencoder cannot remove Cryo-EM image noise well when the signal-to-noise ratio (SNR) of the images is very low and the contamination distribution is complex. It is therefore desirable to develop new effective denoising techniques to facilitate downstream tasks such as 3D reconstruction and 2D conformation classification. In this paper, we approach the robust denoising problem for Cryo-EM images by introducing a family of Generative Adversarial Networks (GANs), called β-GAN, which is able to achieve a robust estimate of certain distributional parameters under the Huber contamination model with statistical optimality. To address the challenge of robust denoising, where the traditional image generative model might be contaminated by a small portion of unknown outliers, β-GANs are exploited to enhance the robustness of the denoising Autoencoder. The method is evaluated on both a simulated dataset of Thermus aquaticus RNA Polymerase (RNAP) and a real dataset of the Plasmodium falciparum 80S ribosome (EMPIAR-10028), in terms of Mean Square Error (MSE), Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM), and 3D reconstruction as well. The results show that, equipped with some designs of β-GANs and the robust ℓ1-Autoencoder, one can stabilize the training of GANs and achieve state-of-the-art performance of robust denoising with low-SNR data and against possible information contamination. Our proposed methodology thus provides an effective tool for robust denoising of Cryo-EM 2D images, which is helpful for 3D structure reconstruction.


Introduction
Cryo-electron microscopy (Cryo-EM) has become one of the most popular techniques for resolving atomic structures. In the past, Cryo-EM was limited to large complexes or low-resolution models. Recently, the development of new detector hardware has dramatically improved the resolution of Cryo-EM 1 , which makes Cryo-EM widely used in a variety of research fields. Different from X-ray crystallography, Cryo-EM has the advantage of preventing the recrystallization of inherent water and re-contamination. Also, Cryo-EM is superior to Nuclear Magnetic Resonance spectroscopy (NMR) in solving macromolecules in their native state. In addition, both X-ray crystallography and NMR require large amounts of relatively pure samples, whereas Cryo-EM requires far fewer samples 2 . For this celebrated development of Cryo-EM for the high-resolution structure determination of biomolecules in solution, the Nobel Prize in Chemistry in 2017 was awarded to three pioneers in this field 3 . However, processing raw Cryo-EM images remains a computational challenge, due to heterogeneity in molecular conformations and high noise. Macromolecules in natural conditions are usually heterogeneous, i.e., multiple metastable structures may coexist in the experimental samples 4,5 . Such conformational heterogeneity adds extra difficulty to the structural reconstruction, as we need to assign each 2D image not only the correct projection angle but also its corresponding conformation. This imposes a computational challenge: one needs to denoise the Cryo-EM images without losing the key features of their corresponding conformations. Moreover, in the process of generating Cryo-EM images, frozen samples are imaged with an electron microscope, so there are two types of noise: one from the ice, and the other from the electron microscope.
Both of them contribute significant noise to Cryo-EM images and make the detection of particle structures difficult (Figure 1). Our main contributions are summarized as follows:

• In order to better describe the complex generative process of Cryo-EM images, we enhance the traditional image generative model with the Huber contamination model, in which a small portion of samples is allowed to carry unknown contamination. To recover the clean image in this new model, we introduce a family of β-GANs, which are able to achieve robustness of denoising against partial agnostic contamination of samples (e.g., the (.5, .5)-GAN or (1, 1)-GAN in this family work best in this paper, where a β-GAN has two parameters, α and β, and is written as (α, β)-GAN).
• We exploit a joint training of GANs and denoising Autoencoders toward robust denoising. The Autoencoder and GANs help each other in Cryo-EM denoising in low signal-to-noise-ratio scenarios. On the one hand, the Autoencoder helps stabilize GANs during training, without which the training processes of GANs often collapse due to high noise; on the other hand, GANs help the Autoencoder in denoising by sharing information across similar samples via distribution learning and by enhancing robustness against contamination.
• Numerical experiments and reconstructions are conducted with both a simulated dataset of Thermus aquaticus RNA Polymerase (RNAP) and a real dataset of the Plasmodium falciparum 80S ribosome (EMPIAR-10028). The experiments on these datasets show the validity of the proposed methodology and suggest that some designs of β-GANs, such as the (.5, .5)-GAN and (1, 1)-GAN, jointly with the robust ℓ1-Autoencoder, are among the best choices for robust denoising against unknown contamination; on the other hand, although WGANs achieve superb performance in contamination-free scenarios, they deteriorate significantly under contaminated samples.

Network Architecture and Hyperparameters
In this paper, we exploit a family of (α, β)-GANs jointly trained with an ℓp-Autoencoder, shown in Algorithm 1. For the (α, β)-GAN, we report two choices: (1) α = 1, β = 1; (2) α = 0.5, β = 0.5, since they show the best results in our experiments, while the others are collected in the supplementary. For WGAN, the gradient penalty with parameter µ = 10 is used to accelerate convergence, and hence the algorithm is denoted as WGANgp below. The trade-off (regularization) parameter of the ℓ1 or ℓ2 reconstruction loss is set to λ = 10 throughout this section, while an ablation study on varying λ is discussed in the supplementary.
For the optimization method, we chose Adam 24 . The learning rate of the discriminator is η d = 0.001, and the learning rate of the generator is η g = 0.01. We choose batch size m = 20, k d = 1, and k g = 2 in Algorithm 1.
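To make the optimizer settings above concrete, the following is a minimal hand-rolled Adam step (Kingma & Ba) using the learning rates reported in the text (η_d = 0.001 for the discriminator, η_g = 0.01 for the generator); the toy quadratic loss is an illustrative assumption, not the paper's objective.

```python
import numpy as np

# One Adam update; lr plays the role of eta_d (0.001) or eta_g (0.01).
def adam_step(theta, grad, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad        # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# A single generator-style update (lr = 0.01) on a toy loss 0.5 * theta^2,
# whose gradient with respect to theta is theta itself.
theta = np.array([1.0])
m = v = np.zeros_like(theta)
theta, m, v = adam_step(theta, grad=theta, m=m, v=v, t=1, lr=0.01)
print(theta)  # the first bias-corrected step moves theta by ~lr toward zero
```

In Algorithm 1, k_d = 1 such discriminator steps and k_g = 2 generator steps are taken per iteration on mini-batches of size m = 20.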
For the choice of network architecture, the best results in the experiments of this paper come from the ResNet architecture 21 shown in Figure 2, which has been successfully applied to biological problems such as predicting protein-RNA binding. The generator in such GANs exploits the Autoencoder network architecture, while the discriminator is a binary-classification ResNet. In the supplementary, we also discuss a convolutional network without residual blocks and the PGGAN 25 architecture, with their experimental results, respectively. Reproducible code can be downloaded at: https://github.com/ghl1995/denoise-gan-in-cryo-EM.

Figure 2.
The architectures of (a) the discriminator (D) and (b) the generator (G), which adopt the residual structure. The input image size (128 × 128) here is adapted to the RNAP dataset, while for the EMPIAR-10028 dataset it is 256 × 256 with a similar architecture.

Denoising without contamination
In this part, we attempt to denoise the noisy images without contamination (i.e., ε = 0 in model (2)). In order to present the advantage of GANs, we compare the denoising results of different methods. Table 1 shows the MSE and PSNR of different methods at SNR 0.05 and 0.1. Traditional methods such as KSVD, BM3D, Non-local means, and CWF can remove the noise partially and extract the general outline, but they still leave unclear pieces. Deep learning methods perform much better. Specifically, we observe that GAN-based methods, especially WGANgp + ℓ1 loss and (.5, .5)-GAN + ℓ1 loss, perform better than denoising Autoencoder methods, which only optimize the ℓ1 or ℓ2 loss (ℓ1-Autoencoder denotes the ℓ1 loss, ℓ2-Autoencoder denotes the ℓ2 loss, and GAN + ℓ1 denotes adding ℓ1 regularization to the GAN generator loss). The adversarial process drives the generation process, and the additional ℓ1 loss optimization speeds up convergence of the generated images toward the reference images. Notably, WGANgp and the (.5, .5)- or (1, 1)-GANs are among the best methods, where the best mean performance up to one standard deviation is marked in bold font. Specifically, compared with the (.5, .5)-GAN, WGANgp gets better PSNR and SSIM at SNR 0.1; the (.5, .5)-GAN shows an advantage in PSNR and SSIM at SNR 0.05, while the (1, 1)-GAN is competitive within one standard deviation. Also, Figure 3(b) and (c) separately show the 3D volumes recovered from clean images and denoised images, and the related FSC curves are shown in Figure 3(d). Specifically, the blue curve, which represents the (.5, .5)-GAN + ℓ1 denoised images, is close to the red curve representing the clean images. We use the 0.143 cutoff criterion from the literature (the resolution at which the Fourier shell correlation reaches 0.143, shown by dashed lines in Figure 3(d)) to determine the final resolution: 3.39 Å.
The structure recovered by our method and its FSC curve are as good as those of the original structure, which illustrates that the denoised results of β-GAN preserve the details of the images and are helpful for 3D reconstruction.
In addition, in the supplementary we show an example where a GAN with the ℓ1-Autoencoder helps heterogeneous conformation clustering.

Robustness under contamination
In this part, we consider the contamination model with ε > 0 and Q given by purely noisy images. We randomly replace a portion of the samples in our RNAP training dataset by noise to test the robustness of the denoising methods under contamination. There are three replacement schemes: (A) replacing only the clean reference images, which means the reference images are wrong or missing, so that we have no reference images to compare against; this is the worst contamination case. (B) Replacing only the noisy images, which means the Cryo-EM images the machine produces are broken. (C) Replacing both, i.e., both A and B happen. The latter two are milder contamination cases, especially C, which replaces both reference and noisy images by Gaussian noise whose ℓ1 or ℓ2 loss is thus well-controlled.
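The three replacement schemes can be sketched as follows; this is a minimal numpy illustration in which the array shapes and the unit-variance Gaussian noise are assumptions for demonstration.

```python
import numpy as np

# Replace a fraction eps of the paired training set by pure Gaussian noise,
# according to scheme A (references only), B (noisy inputs only), or C (both).
def contaminate(x_clean, y_noisy, eps, mode, rng):
    n = x_clean.shape[0]
    idx = rng.choice(n, size=int(eps * n), replace=False)
    x, y = x_clean.copy(), y_noisy.copy()
    noise = lambda k: rng.standard_normal((k,) + x.shape[1:])
    if mode in ("A", "C"):   # corrupt the clean reference images
        x[idx] = noise(len(idx))
    if mode in ("B", "C"):   # corrupt the noisy input images
        y[idx] = noise(len(idx))
    return x, y

rng = np.random.default_rng(0)
x = np.zeros((100, 8, 8)); y = np.zeros((100, 8, 8))
xa, ya = contaminate(x, y, eps=0.2, mode="A", rng=rng)
print((xa != 0).any(axis=(1, 2)).sum())  # 20 reference images replaced
```

Scheme A is the hardest case precisely because the loss is computed against the (now meaningless) references, while in scheme C the Gaussian pairs keep the reconstruction loss bounded.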
Here we test the robustness of various deep-learning-based methods using data at SNR 0.1, and the three contamination schemes above are applied to randomly replace a proportion ε ∈ {0.1, 0.2, 0.3} of the whole dataset. Figure 3(e), (f), and (g) compare the robustness of different methods. In all cases, some β-GANs (the (.5, .5)- and (1, 1)-versions) with the ℓ1-Autoencoder exhibit relatively universal robustness. In particular: (1) The MSE with the ℓ1 loss is less than the MSE with the ℓ2 loss, showing that the ℓ1 loss is more robust than ℓ2, as desired. (2) The ℓ2-Autoencoder and WGANgp show certain robustness in cases B and C but are heavily affected by contamination in case A (shown in Figure 3(e)), indicating that the most serious damage arises from type A, i.e., replacing only the reference images by Gaussian noise. The reason is that the ℓ2-Autoencoder and WGANgp are misled by the wrong reference images, so they cannot learn the mapping from the data distribution to the reference distribution accurately. (3) In type C, the standard deviations of the five best models are larger than in the other two types: contaminating both the noisy images y and the clean images x influences the stability of the models more than the other two types do.
Furthermore, we take type A contamination with ε = 0.1 as an example for 3D reconstruction. The 3D reconstructions from images denoised with (.5, .5)-GAN + ℓ1 and the ℓ2-Autoencoder are shown in Figure 3(h) and (j), and the related FSC curves are in Figure 3(i). Specifically, on the one hand, the blue FSC curve of the ℓ2-Autoencoder does not drop, which corresponds to the worse reconstruction; on the other hand, the red FSC curve of (.5, .5)-GAN + ℓ1 drops quickly but begins to rise again, because some unclear structural details mix angular information into the reconstruction. Applying the 0.143 cutoff criterion (dashed line in the FSC curve), the resolution of (.5, .5)-GAN + ℓ1 is about 4 Å. Although the reconstruction and final resolution are not as good as with clean images, the result is much clearer than that of the ℓ2-Autoencoder, which totally fails in the contamination case. The reconstruction outcome demonstrates that (.5, .5)-GAN + ℓ1 is relatively robust, with a 3D result consistent with the clean-image reconstruction.
In summary, some (α, β)-GANs (the (.5, .5)- and (1, 1)-versions here) with the ℓ1-Autoencoder are more resistant to sample contamination and are thus better suited for Cryo-EM experimental data.

Denoising and Reconstruction Results for EMPIAR-10028
Figure 4 shows the denoising and reconstruction results for EMPIAR-10028.

Discussion
In this paper, we extend the traditional generative model for Cryo-EM images to a Huber contamination model that includes unknown distributions of contamination. To achieve robust denoising for Cryo-EM images, we propose to exploit β-GANs, a family of Generative Adversarial Networks able to achieve robust estimates of distributional parameters with statistical optimality, to enhance the robustness of the Denoising Autoencoder, and we have seen that such a joint training scheme can remarkably improve the performance of robust Cryo-EM image denoising. In this joint training scheme, on the one hand, the reconstruction loss of the Autoencoder helps the GAN avoid mode collapse and stabilizes training; on the other hand, the GAN helps the Autoencoder improve the robustness of denoising and exploit the highly correlated Cryo-EM images, since they are 2D projections of one or a few 3D molecular conformations. In experiments on both simulated RNAP data and real EMPIAR-10028 data, joint training of the ℓ1-Autoencoder combined with the (.5, .5)-GAN, the (1, 1)-GAN, or WGAN with gradient penalty is often among the best performers in terms of MSE, PSNR, and SSIM when the data is contamination-free. However, when an unknown portion of the data is contaminated, especially when the reference data is contaminated, WGAN with the ℓ1-Autoencoder may suffer a significant deterioration of reconstruction accuracy. Therefore, some β-GANs (e.g., the (.5, .5)-GAN and (1, 1)-GAN) joint with the robust ℓ1-Autoencoder are the overall best choices for robust denoising with contaminated and highly noisy datasets.
There are also some open problems to pursue in future directions. Most deep-learning-based techniques for image denoising need reference data, which limits their application to Cryo-EM denoising. For example, in our experimental dataset EMPIAR-10028, the reference data is generated by cryoSPARC, which itself becomes problematic for highly heterogeneous conformations; the reference images we learn from may therefore follow a fake distribution. How to denoise without reference images thus becomes a significant problem, and it remains open how to adapt to different experiments, particularly those without reference images. One idea possibly overcoming this hurdle is "image-blind denoising" proposed by 27,28 , in which the noisy image or a void image is viewed as the reference image for denoising. Moreover, Chen, J. et al. 29 tried to extract the noise distribution from the noisy images and obtain denoised images by removing that noise; Quan, Y. et al. 30 augmented the data by Bernoulli sampling and denoised images with dropout. Besides, Bepler, T. et al. 31,32 applied noise2noise to Cryo-EM image denoising. Nevertheless, all of these methods require the noise to be independent of the signal itself. It is thus hard to remove noise in Cryo-EM, because the noise from the ice and the machine might be correlated with the particles. Last but not least, for reconstruction problems in Cryo-EM, Zhong, E. D. et al. 33 attempted an end-to-end network-based 3D reconstruction approach from Cryo-EM images, exploiting a Variational Autoencoder (VAE) to approximate the forward generative model and recover the 3D structure directly by combining the angle information and image information learned from data. This is one future direction to pursue.

A Generative Model with Huber Contamination
Let x ∈ R^{d1×d2} be a clean image, often called the reference image in the sequel. The generative model of the noisy image y ∈ R^{d1×d2} in Cryo-EM under the linear, weak-phase approximation 12,34 can be described by

y = a ∗ x + ζ, (1)

where ∗ denotes the convolution operation, a is the point spread function of the microscope convolving with the clean image, and ζ is an additive noise, usually assumed to be Gaussian, that corrupts the image. In order to remove the noise the microscope brings, a traditional Denoising Autoencoder such as 35 could be exploited to learn from examples (y_i, x_i), i = 1, . . . , n, the inverse mapping a^{−1} from the noisy image y to the clean image x. However, this model is not sufficient for the real case in Cryo-EM. In experimental data, Cryo-EM images are possibly contaminated by ice or other sources that do not contain any interesting particle information. For example, not every Cryo-EM image contains a particle, even after experimentalists perform manual or automatic particle picking 6 . Such contaminations will significantly affect denoising efficiency if the denoising methods depend continuously on the sample outliers. Therefore we introduce the following Huber contamination model to extend the image formation model of Equation (1).
Consider that the pair of reference image and experimental image (x, y) is subject to the following mixture distribution P_ε:

(x, y) ∼ P_ε = (1 − ε) P_0 + ε Q, (2)

a mixture of the true distribution P_0 with probability (1 − ε) and an arbitrary contamination distribution Q with probability ε. P_0 is characterized by model (1), and Q accounts for the unknown contamination distribution, possibly due to ice, data corruption, and so on, such that the image sample does not contain any particle information. This is called the Huber contamination model in statistics 36 . Our purpose is, given n samples (x_i, y_i) ∼ P_ε (i = 1, . . . , n), possibly contaminated by an unknown Q, to learn a robust inverse map a^{−1}(y).
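The mixture P_ε = (1 − ε)P_0 + εQ can be sampled as follows; for simplicity this sketch drops the point-spread convolution (a = identity) and takes Q to be pure Gaussian noise, both illustrative assumptions rather than the paper's exact setup.

```python
import numpy as np

# Draw one (reference, noisy) pair from the Huber contamination model:
# with probability eps the pair is an outlier from Q carrying no particle
# information; otherwise it follows the inlier model y = x + zeta.
def sample_pair(x_clean, eps, rng, sigma_noise=0.5):
    if rng.random() < eps:
        # outlier from Q: both images are pure noise
        return (rng.standard_normal(x_clean.shape),
                rng.standard_normal(x_clean.shape))
    # inlier from P_0
    return x_clean, x_clean + sigma_noise * rng.standard_normal(x_clean.shape)

rng = np.random.default_rng(1)
x = np.ones((4, 4))
hits = sum(np.array_equal(sample_pair(x, 0.3, rng)[0], x) for _ in range(10000))
print(hits / 10000)  # roughly 0.7, the inlier fraction 1 - eps
```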

Robust Denoising Method
In this report, we exploit a neural network to approximate the robust inverse mapping, G_θ : R^{d1×d2} → R^{d1×d2}, parameterized by θ ∈ Θ, whose architecture is discussed in the Network Architecture section. Our goal is to ensure that the discrepancy between the reference image x and the reconstructed image x̂ = G_θ(y) is small. Such a discrepancy is usually measured by some non-negative loss function ℓ(x, x̂). The denoising problem therefore minimizes the following expected loss,

min_θ E_{(x,y)} [ℓ(x, G_θ(y))].

In practice, given a set of training samples S = {(x_i, y_i) : i = 1, . . . , n}, we aim to solve the following empirical loss minimization problem,

min_θ (1/n) Σ_{i=1}^{n} ℓ(x_i, G_θ(y_i)).

For example, the following choices of loss functions will be considered in this paper: the ℓ2 loss ||x − x̂||_2^2; the ℓ1 loss ||x − x̂||_1; the Wasserstein loss W_1(P_x, P_x̂), where W_1 is the 1-Wasserstein distance between the distributions of x and x̂; and the GAN loss D(P_x, P_x̂), where D is some divergence function, to be discussed below, between the distributions of x and x̂.
Both the ℓ2 and ℓ1 losses consider the reconstruction error of G_θ. The ℓ2 loss above is equivalent to assuming that G_θ(y)|x follows a Gaussian distribution N(x, σ² I_D), while the ℓ1 loss instead assumes a Laplacian distribution centered at x. As a result, the ℓ2 loss pushes the reconstructed image x̂ toward the mean, averaging out details and thus blurring the image. On the other hand, the ℓ1 loss pushes x̂ toward the coordinate-wise median, keeping the majority of details while ignoring some large deviations, thus increasing the contrast of the reconstructed image and being more robust than the ℓ2 loss against large outliers. Although the ℓ1-Autoencoder has a more robust loss than ℓ2, both are insufficient to handle the contamination in (2). To deal with the Huber contamination model (2), the β-GAN is introduced below.
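The mean-versus-median intuition above can be checked on a one-pixel toy example: the ℓ2-optimal fit is the mean, which a single gross outlier drags far away, while the ℓ1-optimal fit is the median, which stays put.

```python
import numpy as np

# Five "observations" of one pixel, four clean and one gross outlier.
values = np.array([1.0, 1.0, 1.0, 1.0, 100.0])
l2_fit = values.mean()      # minimizer of the l2 loss, pulled by the outlier
l1_fit = np.median(values)  # minimizer of the l1 loss, robust to the outlier
print(l2_fit, l1_fit)  # 20.8 1.0
```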

β-GAN
Recently, Gao et al. 22,23 showed that some types of GANs can achieve robustness for Huber contamination models, playing a similar role to Tukey's median 37 in terms of statistical optimality. Therefore it is natural to bring such robust GANs into our considerations. In particular, the following so-called β-GAN is shown in 23 to achieve statistically optimal robust estimates.
Adapted to the setting of this paper, the β-GAN aims to solve the following minimax optimization problem to find G_θ:

min_{G_θ} max_D E_x [S(D(x), 1)] + E_y [S(D(G_θ(y)), 0)],

where S(t, ·) is a score function indexed by the pair (α, β) with α, β ∈ [−1, 1] (following 23, it can be chosen so that ∂S(t, 1)/∂t = t^{α−1}(1 − t)^β and ∂S(t, 0)/∂t = −t^α(1 − t)^{β−1}), the discriminator D maps an image to a value in (0, 1], and D is another neural network whose architecture is discussed in the Results section. For simplicity, we denote this family with parameters α, β by (α, β)-GAN in this paper.
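As a numerical sanity check of the score family (under the parameterization of ref. 23 assumed here, ∂S(t, 1)/∂t = t^{α−1}(1 − t)^β; this is a sketch, not necessarily the paper's exact formulation), integrating the derivative for α = β = 0 should recover the log score of the JS-GAN.

```python
import numpy as np

# Trapezoid-rule integral of dS(t,1)/dt = t^(a-1) * (1-t)^b from 0.5 to s.
def score_increment(s, a, b, n=200001):
    t = np.linspace(0.5, s, n)
    f = t ** (a - 1) * (1 - t) ** b
    dt = t[1] - t[0]
    return ((f[:-1] + f[1:]) / 2 * dt).sum()

# For a = b = 0 the integrand is 1/t, so the increment should equal
# log(s) - log(0.5), i.e. the JS-GAN log score.
val = score_increment(0.9, a=0.0, b=0.0)
print(round(val, 4))  # ≈ log(0.9) - log(0.5)
```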
The family of (α, β)-GANs includes many popular members. For example, when α = 0, β = 0, it becomes the JS-GAN 17 , which aims to solve the following minimax problem, whose loss is the Jensen-Shannon divergence:

min_{G_θ} max_D E_x [log D(x)] + E_y [log(1 − D(G_θ(y)))].

When α = 1, β = 1, the loss is a simple mean square loss; when α = −0.5, β = −0.5, the loss is the boosting score. In particular, it is shown in 23 that for all |α − β| < 1, the (α, β)-GAN family is robust in the sense that one can learn an elliptical distribution P_0 from contaminated distributions P_ε under the strong contamination model; more details can be found in the supplementary. In this report, we are going to see that such β-GANs can also help enhance the robustness of Cryo-EM image denoising against contamination. Yet, we note that the Wasserstein GAN (WGAN) is not a member of this family. By formally taking S(t, 1) = t and S(t, 0) = −t, we have the following WGAN, where an additional gradient penalty term is added (WGANgp) 18,19 :

min_{G_θ} max_D E_x [D(x)] − E_y [D(G_θ(y))] − µ E_x̄ [(||∇ D(x̄)||_2 − 1)²],
where x̄ is uniformly sampled along straight lines connecting pairs of generated and real samples, and µ is a weighting parameter. In WGANgp, the last sigmoid layer of the discriminator network is removed, so D's output ranges over the whole real line R, while its gradient norm is pushed toward 1 to approximate Lipschitz-1 functions. The gradient penalty may help stabilize the training of WGAN. Compared to JS-GAN, WGAN aims to minimize the Wasserstein distance between the sample distribution and the generator distribution. Therefore, WGAN is not robust in the sense of the contamination models above, as an arbitrary ε portion of outliers can be placed far away from the main distribution P_0, making the Wasserstein distance arbitrarily large.
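The gradient penalty can be illustrated in closed form with a toy linear "discriminator" D(x) = w·x, whose input gradient is w itself (the 16-dimensional inputs and the weight values are assumptions of this sketch; a real WGANgp computes ∇D by automatic differentiation).

```python
import numpy as np

rng = np.random.default_rng(0)
x_real = rng.standard_normal(16)   # a "real" sample
x_fake = rng.standard_normal(16)   # a "generated" sample
u = rng.random()
x_bar = u * x_real + (1 - u) * x_fake  # point on the connecting segment

w = np.array([0.5] * 16)               # toy discriminator weights
grad_norm = np.linalg.norm(w)          # ||grad_x D(x_bar)|| for a linear D
mu = 10.0                              # penalty weight used in this paper
penalty = mu * (grad_norm - 1.0) ** 2  # pushes the gradient norm toward 1
print(round(penalty, 4))  # 10.0 here, since ||w|| = 2
```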

Joint Autoencoder-GAN and Main Algorithm
Cryo-EM images consist of 2D projections of the same molecular conformation at different viewing angles, and the reconstruction losses (ℓ1 or ℓ2) do not explicitly take into account similar images from similar conformational projections. In addition to the possible robustness, GANs can further help denoising by exploiting common information in similar samples during distribution learning; for example, they minimize some divergence or Wasserstein distance between the reference image set and the denoised image set, where similar images can help boost signals for each other.
On the other hand, the Autoencoder can help stabilize GANs during training, without which the training processes of GANs often oscillate and sometimes collapse due to the presence of high noise (see the supplementary).
With these considerations, in this paper we propose a combined loss with both the GAN loss and the Autoencoder reconstruction loss:

min_{G_θ} max_D L_GAN(G_θ, D) + λ E ||x − G_θ(y)||_p,

where p ∈ {1, 2} and λ ≥ 0 is a trade-off parameter for the ℓp reconstruction loss. Algorithm 1 summarizes the procedure of joint training of the Autoencoder and GAN, which is denoted as "GAN + ℓp" in the experimental section, depending on the particular choice of GAN and p.
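A minimal sketch of the combined generator objective follows; the GAN term is a placeholder scalar standing in for whichever (α, β)-GAN or WGAN loss is chosen, and λ = 10 matches the setting used in this paper.

```python
import numpy as np

# Combined loss: GAN term plus lambda-weighted l_p reconstruction error.
def joint_loss(x_ref, x_hat, gan_term, lam=10.0, p=1):
    if p == 1:
        rec = np.abs(x_ref - x_hat).mean()       # l1 reconstruction loss
    else:
        rec = ((x_ref - x_hat) ** 2).mean()      # l2 reconstruction loss
    return gan_term + lam * rec

x_ref = np.zeros((8, 8))
x_hat = 0.1 * np.ones((8, 8))                    # uniform 0.1 error
print(joint_loss(x_ref, x_hat, gan_term=0.5, p=1))  # ≈ 0.5 + 10 * 0.1 = 1.5
```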

Evaluation Method
We exploit the following three metrics to determine whether the denoising result is good or not. They are the Mean Square Error (MSE), the Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index Measure (SSIM).
(MSE) For images of size d1 × d2, the Mean Square Error (MSE) between the reference image x and the denoised image x̂ is defined as

MSE(x, x̂) = (1 / (d1 d2)) Σ_{i=1}^{d1} Σ_{j=1}^{d2} (x_{i,j} − x̂_{i,j})².

The smaller the MSE, the better the denoising result.
(PSNR) Similarly, the Peak Signal-to-Noise Ratio (PSNR) between the reference image x and the denoised image x̂, whose pixel value range is [0, t] (t = 1 by default), is defined by

PSNR := 10 log10 (t² / MSE(x, x̂)).

The larger the PSNR, the better the denoising result.
(SSIM) The third criterion, the Structural Similarity Index Measure (SSIM) between the reference image x and the denoised image x̂, is defined in 38 as

SSIM(x, x̂) = ((2 µ_x µ_x̂ + c1)(2 σ_{x x̂} + c2)) / ((µ_x² + µ_x̂² + c1)(σ_x² + σ_x̂² + c2)),

where µ_x (µ_x̂) and σ_x² (σ_x̂²) are the mean and variance of x (x̂), respectively, σ_{x x̂} is the covariance of x and x̂, and c1 = (K1 L)², c2 = (K2 L)², c3 = c2/2 are three variables that stabilize divisions with weak denominators (K1 = 0.01, K2 = 0.03 by default), with L the dynamic range of the pixel values (1 by default). The value of SSIM lies in [0, 1]; the closer it is to 1, the better the result.
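The three metrics can be implemented directly from the definitions above; note this SSIM is the single-window (global) variant with the default constants, whereas common implementations use a sliding window.

```python
import numpy as np

def mse(x, x_hat):
    return ((x - x_hat) ** 2).mean()

def psnr(x, x_hat, t=1.0):
    # pixel range [0, t], t = 1 by default
    return 10 * np.log10(t ** 2 / mse(x, x_hat))

def ssim_global(x, x_hat, K1=0.01, K2=0.03, L=1.0):
    c1, c2 = (K1 * L) ** 2, (K2 * L) ** 2
    mu_x, mu_y = x.mean(), x_hat.mean()
    var_x, var_y = x.var(), x_hat.var()
    cov = ((x - mu_x) * (x_hat - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

x = np.linspace(0, 1, 64).reshape(8, 8)
print(psnr(x, x + 0.1))   # a uniform 0.1 error gives ~20 dB
print(ssim_global(x, x))  # identical images give SSIM ≈ 1
```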
Although these metrics are widely used in image denoising, we note that they might not be the best metrics for Cryo-EM images. For example, the supplementary shows an example in which the best-reconstructed images do not attain the best MSE/PSNR/SSIM values.

In addition to these metrics, we consider 3D reconstruction based on the denoised images. In particular, we perform 3D reconstruction with RELION 39 to validate the denoising results. The procedure of our RELION reconstruction is as follows: first create a 3D initial model, then perform 3D classification, followed by 3D auto-refine. Moreover, for heterogeneous conformations in the simulated data, we further turn the denoising evaluation into a clustering problem to measure the efficacy of the denoising methods, whose details are discussed in the supplementary.

RNAP: Simulation Dataset
We design a conformationally heterogeneous dataset obtained by simulations. We use Thermus aquaticus RNA Polymerase (RNAP) in complex with the σA factor (Taq holoenzyme) for our dataset. RNAP is the enzyme that transcribes RNA from DNA (transcription) in the cell. During the initiation of transcription, the holoenzyme must bind to the DNA and then separate the double-stranded DNA into single strands 40 . The Taq holoenzyme has a crab-claw-like structure with two flexible domains, the clamp and the β pincers. The clamp, especially, has been suggested to play an important role in initiation, as it has been captured in various conformations by Cryo-EM during initiation 41 . Thus, we focus on the movement of the clamp in this study. To generate the heterogeneous dataset, we start with two crystal structures of the Taq holoenzyme that differ in their clamp conformation: open (PDB ID: 1L9U 42 ) and closed (PDB ID: 4XLN 43 ). For the closed-clamp structure, we remove the DNA and RNA in the crystal structure, leaving only the RNAP and σA for our dataset. The Taq holoenzyme has a molecular weight of about 370 kDa. We then generate intermediate clamp structures between the open and closed clamp using multiple-basin coarse-grained (CG) molecular dynamics (MD) simulations 44,45 . CG-MD simulations simplify the system such that the atoms in each amino acid are represented by one particle. The structures from the CG-MD simulations are refined back to all-atom (atomic) structures using PD2 ca2main 46 and SCWRL4 47 . Five structures with equally spaced clamp-opening angles are chosen for our heterogeneous dataset (shown in Figure 5). Then, we convert the atomic structures to 128 × 128 × 128 volumes using the Xmipp package 48 and generate 2D projections with an image size of 128 × 128 pixels. We further contaminate those clean images with additive Gaussian noise at different signal-to-noise ratios (SNR): SNR = 0.05 and 0.1. The SNR is defined by SNR = Var(Signal)/Var(Noise) in real space.
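Adding Gaussian noise at a target SNR under the definition SNR = Var(Signal)/Var(Noise) amounts to scaling the noise standard deviation by the signal variance; a minimal sketch (the uniform-random "projection" is a stand-in for a real 2D projection):

```python
import numpy as np

# Corrupt a clean projection with additive Gaussian noise at the target SNR.
def add_noise(clean, snr, rng):
    sigma = np.sqrt(clean.var() / snr)  # Var(Noise) = Var(Signal) / SNR
    return clean + sigma * rng.standard_normal(clean.shape)

rng = np.random.default_rng(0)
clean = rng.random((128, 128))          # stand-in for a 128 x 128 projection
noisy = add_noise(clean, snr=0.05, rng=rng)
est_snr = clean.var() / (noisy - clean).var()
print(round(est_snr, 3))  # empirical SNR, close to the target 0.05
```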
For simplicity, we did not apply the contrast transfer function (CTF) to the datasets, and all the images are centered. Figure 5 shows pictures of the five conformations.
The training data consists of 25,000 paired images (noisy and reference images); the test data used to calculate MSE, PSNR, and SSIM consists of another 1,500 paired images.

EMPIAR-10028: Real Dataset
This is a real-world experimental dataset first studied in 26 : the Plasmodium falciparum 80S ribosome dataset (EMPIAR-10028). The authors recovered the Cryo-EM structure of the cytoplasmic ribosome from the human malaria parasite, Plasmodium falciparum, in complex with emetine, an anti-protozoan drug, at 3.2 Å resolution. The ribosome is the essential enzyme that translates RNA into protein molecules, the second step of the central dogma; inhibiting the ribosome activity of Plasmodium falciparum effectively kills the parasite 26 . We can regard this dataset as conformationally homogeneous. It contains 105,247 noisy particles with an image size of 360 × 360 pixels. In order to decrease the computational complexity, we crop the center square of each image to a size of 256 × 256; since the surrounding area of the image contains no particle information, no information is lost in this preprocessing. The 256 × 256 images are then fed as input to the G_θ-network (Figure 2). Since the GAN-based methods need clean images as references, we prepare their clean counterparts in the following way: we first use cryoSPARC 1.0 49 to build a 3.2 Å resolution volume and then project the 3D volume along the Euler angles obtained by cryoSPARC to get 2D images. The training data size we pick is 19,500, and the test data size is 500.
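The center-crop preprocessing described above can be sketched as follows (the feature placed at the image center is just a marker to verify the crop keeps the middle of the frame):

```python
import numpy as np

# Center-crop a 360 x 360 particle image to 256 x 256 before feeding it to
# the generator network; the discarded border carries no particle signal.
def center_crop(img, size=256):
    h, w = img.shape
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

img = np.zeros((360, 360))
img[180, 180] = 1.0                 # a feature at the image center
crop = center_crop(img)
print(crop.shape, crop[128, 128])   # (256, 256) 1.0
```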