Resolution enhancement in confocal microscopy images of nano-beads and nuclei. We begin by evaluating the performance of the proposed TCAN model on 23 nm fluorescent beads. The nano-bead samples are imaged on a Leica TCS SP8 STED confocal microscope, and 1000 pairs of confocal-STED image patches of 256×256 pixels are used as training data. The network takes the confocal image in Fig. 1a, which is unseen during the training stage, as input and outputs the super-resolved image in Fig. 1b. The network result is compared with the image acquired by STED microscopy (Fig. 1c). Some of the nano-beads in our samples are too close to be discerned in the raw confocal image and even in the STED image, whereas our method reduces artifacts and blur and resolves these closely spaced nano-beads, as presented in Fig. 1d-f. This is consistent with the intensity profiles (Fig. 1m) along the white dashed lines in Fig. 1d-f.
We further assess the impact of the proposed TCAN by two image-based criteria: image resolution, measured by the full width at half maximum (FWHM) of the point spread function (PSF), and image quality, estimated by the signal-to-noise ratio (SNR). Twenty isolated nano-beads are selected randomly for the PSF measurement in the confocal, STED and network output images. The FWHM of the confocal PSF is 130 nm, and the PSF distribution of the network output is even slightly better than that of the STED system, with a mean FWHM of 50 nm versus 60 nm, respectively. Since our method establishes a data-driven image transformation, similar to that discussed in Ref. [9], the learned PSF does not require any prior information on the modeling of the image formation process or its parameters.
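The FWHM measurement above can be sketched in a few lines. For brevity we locate the half-maximum crossings of a 1-D intensity profile by linear interpolation, a simplification of the Gaussian fit used in the text; the profile below is synthetic, generated with a 55.2 nm standard deviation so that the true FWHM is about 130 nm, and is an illustration rather than measured data.

```python
import math

def fwhm(positions, intensities):
    """Full width at half maximum of a single-peaked 1-D profile."""
    half = max(intensities) / 2.0
    # left crossing: first sample at or above half-maximum
    i = next(k for k, v in enumerate(intensities) if v >= half)
    x_left = positions[i - 1] + (half - intensities[i - 1]) \
        * (positions[i] - positions[i - 1]) / (intensities[i] - intensities[i - 1])
    # right crossing: last sample at or above half-maximum
    j = max(k for k, v in enumerate(intensities) if v >= half)
    x_right = positions[j] + (intensities[j] - half) \
        * (positions[j + 1] - positions[j]) / (intensities[j] - intensities[j + 1])
    return x_right - x_left

# Synthetic confocal-like bead profile: Gaussian, 10 nm sampling,
# sigma = 55.2 nm -> FWHM = 2*sqrt(2*ln 2)*sigma ~ 130 nm.
xs = [10.0 * k for k in range(41)]          # 0 ... 400 nm
sigma, x0 = 55.2, 200.0
ys = [math.exp(-(x - x0) ** 2 / (2 * sigma ** 2)) for x in xs]
width = fwhm(xs, ys)
```

Averaging such widths over the 20 selected beads gives the mean FWHM values quoted above.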
Next, we verify the practicality of the proposed TCAN by applying it to fixed HeLa cell nuclei. Figure 1g-i displays the input confocal image, the network output and the STED image of the same field of view, respectively. Our method succeeds in transforming a low-resolution confocal image into a super-resolution image. As exemplified by the magnified views of the green boxes in Fig. 1j-l, TCAN resolves the densely labeled nuclear pore complexes (NPCs) [19] better than the STED image and reduces the background noise, striking a compromise between retaining useful information and denoising. The rationale behind this result is that the generator in our model benefits from both U-Net and DFCAN, simultaneously learning precise representations of the spatial structures and the high-frequency information.
To verify the improvement of our network on image quality, we compare the SNR of the network output with those of the network input (confocal image), the STED images and the deconvolved STED images. The SNR is calculated according to the following formula from Ref. [9]:
$$\text{SNR}=\left|\frac{s-\bar{b}}{\sigma_b}\right|, \tag{3}$$
where \(s\) is the mean peak value of the signal calculated from a Gaussian fit to the particles, \(\bar{b}\) is the mean value of the background (e.g., randomly selected regions that do not contain any objects), and \(\sigma_b\) is the standard deviation of the background. The results listed in Table 1 demonstrate that the proposed method suppresses noise and improves the image quality for different types of samples.
Table 1 Quantification of SNR improvement

| Types | Network input (Confocal) | Network output | STED |
| --- | --- | --- | --- |
| Nano-beads | 6.0 | 13.2 | 14.1 |
| Nucleus | 3.6 | 12.0 | 8.2 |
| Microtubules | 9.6 | 10.9 | 10.4 |
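Equation (3) can be evaluated directly once the Gaussian-fit peak values and a set of object-free background pixels have been extracted. The sketch below assumes those inputs are already available; all numbers are hypothetical, chosen only to exercise the formula.

```python
import statistics

def snr(peak_values, background_pixels):
    """SNR per Eq. (3): |s - mean(b)| / std(b)."""
    s = statistics.mean(peak_values)              # mean Gaussian-fit peak of the signal
    b_mean = statistics.mean(background_pixels)   # background mean
    b_std = statistics.pstdev(background_pixels)  # background standard deviation
    return abs((s - b_mean) / b_std)

peaks = [210.0, 195.0, 204.0]                # hypothetical fitted peak values
background = [10.0, 12.0, 9.0, 11.0, 8.0]    # hypothetical background pixels
value = snr(peaks, background)
```

Applying this to each image type yields per-sample SNR values like those summarized in Table 1.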
Resolution enhancement in confocal microscopy images of microtubules. In case confocal-STED training image pairs are not available, our network model trained with images captured by different imaging modalities is still able to infer super-resolution images. We employ 3000 pairs of wide-field and structured illumination microscopy (SIM) patches of \(256 \times 256\) pixels as training data and apply the framework to microtubules, a more complex structure. The results are compared against the STED images and their deconvolutions, performed with Huygens software. As expected, our TCAN model reveals noticeably improved resolution compared with the input confocal images (Fig. 2a). Notably, the resolution of the network output images (Fig. 2b) is indeed improved; in particular, regions of dense and complex microtubule structures are better resolved and appear sharper than in the STED images in Fig. 2c, as exhibited by the magnified views of the green boxes. Artifacts and noise are present between adjacent microtubules in the STED images. The deconvolved STED images in Fig. 2d, by contrast, show obvious broken structures, and the discontinuity is more severe for sparsely distributed microtubules. Here we also employ transfer learning, which uses a network trained on nano-beads as the initial model, to speed up the training for nuclei, microtubules and actin.
To quantitatively evaluate the overall performance of our method, we use three metrics, i.e., SNR, mean square error (MSE), and resolution, to measure the quality of the output super-resolved images. MSE computes the pixel-level data fidelity as the difference between the resulting image and the ground truth. Image resolution is measured by decorrelation analysis, which estimates the highest resolvable frequency from the local maxima of the decorrelation functions rather than relying on the theoretical resolution [20]. These results are illustrated in Fig. 2e-g: the generally larger SNR and smaller resolution value and MSE of the network output indicate that the conventional STED images and the deconvolved STED images are inferior to our inference images.
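Of the three metrics, MSE is the simplest to state precisely; a minimal sketch (toy 2×2 arrays standing in for the output and ground-truth images) is:

```python
# Pixel-level mean square error between a network output image and its
# ground-truth counterpart; images are nested lists here for brevity.
def mse(output, ground_truth):
    flat_o = [p for row in output for p in row]
    flat_g = [p for row in ground_truth for p in row]
    return sum((o - g) ** 2 for o, g in zip(flat_o, flat_g)) / len(flat_o)

out = [[1.0, 2.0], [3.0, 4.0]]
gt  = [[1.0, 2.5], [2.0, 4.0]]
err = mse(out, gt)   # (0 + 0.25 + 1 + 0) / 4 = 0.3125
```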
Figure 2e-g are plotted in Tukey box-and-whisker format. The box extends from the 25th to the 75th percentile, and the line in the middle of the box indicates the median. To define whiskers and outliers, the inter-quartile range (IQR) is first calculated as the difference between the 75th and 25th percentiles. The upper whisker extends to the largest data point no greater than the 75th percentile plus 1.5 times the IQR; the lower whisker extends to the smallest data point no smaller than the 25th percentile minus 1.5 times the IQR. Data points beyond the whiskers are identified as outliers and displayed as black diamonds.
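The whisker and outlier rule just described can be sketched as follows. We assume linear-interpolation percentiles (the exact percentile convention may differ from the plotting software used for Fig. 2e-g), and the data values are purely illustrative.

```python
def percentile(sorted_vals, p):
    """Linear-interpolation percentile of an already-sorted list."""
    k = (len(sorted_vals) - 1) * p / 100.0
    f = int(k)
    c = min(f + 1, len(sorted_vals) - 1)
    return sorted_vals[f] + (sorted_vals[c] - sorted_vals[f]) * (k - f)

def tukey(data):
    """Return (lower whisker, upper whisker, outliers) per Tukey's rule."""
    s = sorted(data)
    q1, q3 = percentile(s, 25), percentile(s, 75)
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    lower = min(v for v in s if v >= lo_fence)   # smallest point inside the fence
    upper = max(v for v in s if v <= hi_fence)   # largest point inside the fence
    outliers = [v for v in s if v < lo_fence or v > hi_fence]
    return lower, upper, outliers

result = tukey([3, 4, 5, 6, 7, 8, 30])   # 30 falls outside the upper fence
```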
For deep learning methods, the training data determines what the neural network learns. To achieve the best results, the imaging modality of the training data should in principle be precisely matched to that of the input images. However, we find that the image quality, rather than the imaging modality, of the training data is the critical factor affecting the inference performance. This can be observed in Figure S4 in the Supporting Information. Even though the input images and STED images are captured with the same imaging platform, the output of the network trained on deconvolved STED images is worse than that of the network trained on high-quality SIM images. This relates to the fact that the input and output of the framework share a high degree of mutual information, and the quality of the information in the training examples affects the pixel-to-pixel transformation and the resolution enhancement learned by the network. The task of translating one possible representation of a scene into another is broadly referred to as an image-to-image translation problem [21]. Such tasks share the common process of predicting pixels from pixels, and the network architecture used for our training, i.e., a conditional GAN [22], has been proven effective in learning such mappings. Here the input and output are renderings of the same underlying structures, and the training process can be viewed as exploiting this mutual information between the input and label images to constrain the network output. Accordingly, the network attends to the quality of the structures in the training examples more than to the imaging platform of the training data.
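For reference, the conditional-GAN training objective of the pix2pix framework [21], which our conditional-GAN setup follows in spirit, combines an adversarial term with a pixel-wise L1 term; this is the generic form, with weight \(\lambda\) a hyperparameter, not the exact loss of our model:
$$G^{*}=\arg\min_{G}\max_{D}\,\mathcal{L}_{\text{cGAN}}(G,D)+\lambda\,\mathcal{L}_{L1}(G).$$
The L1 term anchors the output to the label image pixel by pixel, while the adversarial term pushes the generator toward outputs that are indistinguishable from real high-resolution images.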
Additionally, if the pixel size is large, one microtubule spans fewer pixels; conversely, more pixels are required to show the same structure when the pixel size is small. Hence the pixel size is another important parameter affecting the feature representation learned by the network and its ability to distinguish adjacent microtubules as separate objects. For instance, direct application of a network trained on images with a pixel size of 50 nm produces acceptable biological structures only when the input images have pixel sizes of 35-70 nm. Therefore, if the pixel sizes of the input images and training images differ, we upsample/downsample the input images to match the training image pairs. After this resampling, the neural network successfully suppresses the artifacts and further improves the resolution of the confocal microscopy images. In Fig. 3, comparing the network output images in the third column with those in the fourth column shows that the effect of the pixel size can be compensated by upsampling/downsampling the input images to match the pixel size of the training data, thereby improving the quality of the inference images. Since the pixel size of our training data is 50 nm, we upsample the input confocal images with a pixel size of 75 nm in the first row and downsample the input images with other pixel sizes in the third to seventh rows. In addition, comparing the network output images in the second column with those in the third column, we notice that the model trained with the L1 loss is more robust against variations in pixel size than the model trained with the L2 loss, although the latter obtains better inference images when the pixel sizes of the input images and the training data are the same (50 nm in our experiments).
This result is related to the fact that the L2 loss is more sensitive to outliers and more easily gets stuck in a local minimum [23, 24].
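The pixel-size matching step can be sketched as a simple resampling. Here we use nearest-neighbour interpolation for brevity (a real pipeline would likely use a higher-order interpolant), and the function and toy array below are our own illustration, assuming the 50 nm training pixel size stated above.

```python
def resample(image, pixel_size_in, pixel_size_train=50.0):
    """Resample so each output pixel spans pixel_size_train nanometres."""
    zoom = pixel_size_in / pixel_size_train   # >1 upsamples, <1 downsamples
    h, w = len(image), len(image[0])
    new_h, new_w = round(h * zoom), round(w * zoom)
    # nearest-neighbour lookup back into the source grid
    return [[image[min(int(r / zoom), h - 1)][min(int(c / zoom), w - 1)]
             for c in range(new_w)]
            for r in range(new_h)]

# A 4x4 image at 75 nm per pixel becomes 6x6 at the 50 nm training pitch.
img = [[r * 4 + c for c in range(4)] for r in range(4)]
up = resample(img, 75.0)
```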
This also facilitates the application of the TCAN model to confocal images with a large field of view. Figure 4 displays the results of applying our method to super-resolve such confocal images, revealing finer features of the microtubules. These results demonstrate that the proposed framework achieves favorable performance for various fields of view of the input images.
When the input images are captured with a new experimental setup, our TCAN model does not need to be retrained. We apply the network model trained with wide-field and SIM image pairs to directly super-resolve images of microtubules captured with the Nikon A1R MP+ microscope. The confocal images are transformed into resolution-enhanced images, as shown in Fig. 5, exhibiting sharper details of the microtubules. To further demonstrate the network's generalization, two large confocal image patches, also acquired with the Nikon A1R MP+ microscope, are used as input, and Figure S5 in the Supporting Information illustrates the advantage of the GAN-based super-resolution approach with upsampling/downsampling. It is thus possible to extend our TCAN model to super-resolve low-resolution images captured with different imaging systems.
The generalization of our TCAN model includes improving the resolution of images acquired with new imaging systems and of new types of samples absent from the training phase. As shown in Fig. 5 and Figure S5, resolution enhancement of confocal images captured with the Nikon A1R MP+ microscope is achieved by our network model trained with wide-field and SIM image pairs. Another example of the generalization of our approach is given in Fig. 8, where our TCAN model trained only with images of microtubules is applied to super-resolve actin. Even though this type of sample is unseen in the training dataset, our network correctly infers its fine structures.
Resolution enhancement in confocal images of live-cell microtubules. To test whether TCAN is competent for live-cell imaging, we study the dynamic changes of microtubules by time-lapse imaging. The dynamic instability of microtubules is important because of their involvement in delivering information, and it is a fast process demanding high spatiotemporal resolution [25].
In this work, we employ the TCAN model trained with static microtubule images to transform low-resolution confocal images of live-cell microtubules into high-resolution ones. The raw images in both the confocal and STED modes are acquired for 10 frames at 45 s intervals (Fig. 6a). Figure 6a shows the resolution enhancement and superior image quality compared with the STED images, and the resolution of the network output images remains almost constant for at least 7 minutes (see Visualization 1). The dynamic instability of microtubules is then visualized, for example, as marked by the arrows in Fig. 6b-e. The dynamic changes fall into two kinds: changes in the shape of microtubules (Fig. 6b-c) and changes in their length (Fig. 6d-e). For the first kind, we capture a microtubule varying distinctly, becoming curved from originally straight. This is consistent with the current model of microtubule assembly and dynamics, which postulates that microtubules grow by attachment of curved guanosine triphosphate (GTP)-tubulins to the ends of curved protofilaments [26]. For the second kind, the plus end of the microtubule grows by assembly, and quick transitions between microtubule growth and temporary pauses can even be observed at high temporal resolution in our experiments. The high spatial resolution of our TCAN model ensures precise characterization of microtubule dynamics and detection of densely packed microtubules undetectable with other methods.
Similar improvement is obtained when applying our method to super-resolve confocal images of live-cell microtubules acquired with the Nikon A1R MP+ microscope (see Visualization 2). We capture raw images for 31 frames at 20 s intervals. This result discerns the dynamic changes at microtubule intersections: the intersection indicated by the blue arrows (Fig. 6h-i) gradually separates because of microtubule shrinkage. The microtubule at the magenta arrow in Fig. 6j-k shrinks while the other microtubule grows over time until they intersect.
The changes in the separation distance of intersecting microtubules and the microtubule shrinkage can also be viewed in Fig. 7, Visualization 3 and Visualization 4. We capture raw images for 61 frames at 20 s intervals. As demonstrated in Ref. [27], lysosome transport is strongly correlated with the distance between intersecting microtubules, so it is crucial to visualize the motion of complex microtubule networks at high resolution. Moreover, the unchanged microtubules in the white boxes in Fig. 7 indicate that our region of interest stays in the focal plane during the observation period, excluding the possibility that the dynamic changes of the microtubules arise from defocusing. Note that the imaging time of live-cell microtubules in Visualization 3 and Visualization 4 is about 20 minutes. Since the confocal microscope does not suffer from photobleaching and phototoxicity as severely as the STED microscope, our method is well suited for long-term super-resolution imaging of live cells.
The above results highlight the feasibility and advantage of improving image resolution with deep learning. In other words, the proposed TCAN model counters the photobleaching inherent in the traditional STED technique by extending the maximum number of usable consecutive frames of time-lapse images [28].
Resolution enhancement in dual-color confocal images of actin-microtubules. Actin and microtubules are components of the cytoskeleton, and their crosstalk is important for core biological processes [29]. Thus, we simultaneously image actin filaments (cyan) and microtubules (magenta) with the Nikon A1R MP+ microscope and then improve the image resolution with our TCAN model trained only on the microtubule data. The raw confocal images in Fig. 8a, 8c and 8e exhibit spurious small structures outside the filaments and large fluctuations in fluorescence along the actin filaments. In contrast, TCAN suppresses the artifacts and successfully resolves the densely packed microtubule structures and the fine branches of the actin filaments (Fig. 8b, 8d and 8f). The relative positions of the microtubules and actin filaments can also be observed in the super-resolved dual-color images. Typical modes of crosstalk between microtubules and actin can be found in our network output, for instance actin-microtubule crosslinking (white box), actin barrier (green box) and mechanical cooperation (Fig. 8f) [29], whereas these are unclear in the confocal images due to the poor resolution.