Rule-based in-focus region detection
Widefield microscopy datasets of bulk objects, similar to the one we employed here, often contain both in-focus and out-of-focus lateral regions in each slice of the focal stack. These regions change from slice to slice as the focal plane passes through the bulk of the specimen. To distinguish the in-focus regions of a slice from the out-of-focus ones, we explored algorithms typically used for focal plane detection in the axial direction. For each subregion (see the sliding window scheme in the Methods section) in each slice of the focal stack, we computed a score with each focal plane detection algorithm (Fig. 1). Specifically, we compared the following 8 algorithms: Brenner, Tenengrad, Laplacian, SMD, Vollath, Std, Variance, and Entropy. Figure 1a illustrates the scanning results from one image (slice). To preserve the homogeneity of the original image, this work uses a square perception window (see Methods section). As presented in the left part of the panel, the perception window slid with the same step size in both the horizontal and vertical directions. We tested window sizes of 64, 128, and 256, and strides (step sizes) of 16, 32, 64, and 128. We obtained a focus-score map for each slice of the focal stack by reassembling the window scores into a stack and maximum-intensity Z-projecting them (see Methods).
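For illustration, a minimal sketch of such a sliding-window scan is given below, assuming a 2D grayscale slice and the Std focus measure; the function name and the per-pixel maximum-by-overwrite shortcut (equivalent to the max Z-projection of the window scores) are our assumptions, not the authors' implementation.

```python
import numpy as np

def focus_score_map(img: np.ndarray, window: int = 128, stride: int = 32) -> np.ndarray:
    """Slide a square perception window over one slice and record, for every
    pixel, the maximum score of any window covering it (max Z-projection)."""
    h, w = img.shape
    score_map = np.zeros((h, w), dtype=np.float64)
    for y in range(0, h - window + 1, stride):
        for x in range(0, w - window + 1, stride):
            patch = img[y:y + window, x:x + window]
            score = patch.std()  # Std focus measure; other metrics plug in here
            region = score_map[y:y + window, x:x + window]
            np.maximum(region, score, out=region)  # keep the per-pixel maximum
    return score_map
```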
As presented in the focus-score map in the right part of Fig. 1a, brighter pixels indicate higher focus scores. We noted that the smaller the perception window, the more detailed the scanning result. Yet, a small perception window failed to capture the low-frequency information (the global features). For example, the first row extracted only contour information, while the others preserved more global information. Conversely, the large window lost the high-frequency signal (image details) during scanning, leading to undesired results. Notably, comparing the second and third rows, the third row failed to capture the detailed contour information. Therefore, balancing low- and high-frequency signals, the window size of 128 proved the more appropriate choice for both global features and local details. Next, we examined the stride parameters for this optimal perception window size (Fig. 1a, second row). We noted that smaller strides correspond to smoother focus scores in the final map. A smooth focus-score map reveals the structural details of the zebrafish (eye contour, body components). Conversely, bigger stride steps retain more global information, which prevents the focus-score scanning from degenerating into a simple contour detection method. We further noted that all stride parameters contributed valuable detailed information at various levels. Therefore, a better scanning process should combine multiple stride values to preserve both global information and local details, as sketched below.
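A hypothetical usage of the function above illustrates such a multi-stride scan: the same slice is scanned with several strides and the resulting maps are combined by a further maximum projection (`img` is one 2D grayscale slice).

```python
# Combine maps from several strides to keep both global and local information.
strides = (16, 32, 64, 128)
combined = np.maximum.reduce([focus_score_map(img, window=128, stride=s) for s in strides])
```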
To determine the appropriate focus metric for each region, we applied the eight described rule-based algorithms to the widefield microscopy dataset. The best focus metric was expected to distinguish images on varying focal planes continuously along the axial direction: the in-focus region should score the highest, while the out-of-focus region should score the lowest. With this in mind, we measured the output of each algorithm as a function of the distance of a region from perfect focus (defocus). Figure 1b illustrates the sensitivity of each algorithm to focal plane changes.
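For reference, common formulations of several of the compared focus measures are sketched below for a 2D grayscale patch; normalizations vary across the literature, so treat these as illustrative assumptions rather than the authors' exact implementations.

```python
import numpy as np
from scipy.ndimage import laplace

def variance_score(patch: np.ndarray) -> float:
    return float(patch.var())  # variance of intensities

def std_score(patch: np.ndarray) -> float:
    return float(patch.std())  # standard deviation of intensities

def laplacian_score(patch: np.ndarray) -> float:
    return float(np.mean(laplace(patch.astype(np.float64)) ** 2))  # Laplacian energy

def brenner_score(patch: np.ndarray) -> float:
    p = patch.astype(np.float64)
    return float(np.sum((p[:, 2:] - p[:, :-2]) ** 2))  # difference two pixels apart

def vollath_score(patch: np.ndarray) -> float:
    p = patch.astype(np.float64)  # Vollath's F4 autocorrelation measure
    return float(np.sum(p[:-1] * p[1:]) - np.sum(p[:-2] * p[2:]))
```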
We noted that five (Variance, Vollath, Std, Brenner, and Laplacian) of the eight algorithms detected the focal plane changes successfully, from slice No. 0 to slice No. 19. Interestingly, the Vollath algorithm recognized the difference between in-focus and out-of-focus images. However, it failed to detect the changes continuously in the middle slices of the stacks, in which the amount of in-focus pixels was visually comparable to the amount of out-of-focus pixels (mixed-focus slices): the Vollath score varied only slightly from the fifth slice onward (see Fig. 1b). In Table 1, we evaluated the time consumption of all eight candidates. Compared to the others, the Brenner and Vollath algorithms were much more computationally expensive on the same images (up to 16 minutes per slice vs. less than 1 minute for the other candidates), while their results were only marginally better than those of the Laplacian. Therefore, we concluded that three algorithms, Variance, Std, and Laplacian, outperform the others in sensitivity along the axial direction as well as in processing time.
Table 1
The time consumption comparison between the nine candidates for in-focus pixel segmentation. To obtain the measurements, we evaluated the computation time for segmenting the whole stack (all 20 slices of one example stack).
| Candidates | Computation time (s) |
| --- | --- |
| Standard deviation (Std) | 203 |
| Variance | 199 |
| Laplacian | 270 |
| Tenengrad | 1212 |
| SMD | > 2.4∙10⁴ |
| SMD2 | > 2.4∙10⁴ |
| Brenner | 1.92∙10⁴ |
| Vollath | 1.2∙10⁴ |
| DNN | 0.062 |
Unlike focal plane detection in the axial direction, the task of detecting the in-focus parts of the specimen requires the focus measurement to capture both the in-focus status and the contour information in the lateral directions. The first row of Fig. 1c presents the in-focus status of two images. Zooming into the same patch of these images, the pixel intensities reveal varying contour information. The optimal algorithm should preserve the correct image content during the focus status detection. In the second row of Fig. 1c, we calculated the focus score map of the mixed-focus image (middle slice, a mixture of in-focus and out-of-focus pixels) with the three algorithms above. The contour information detected using the Laplacian differs from that of the other two. To validate the differences, we merged the focus score map with the corresponding microscopy image in the third row of Fig. 1c. The contour segmented by the Laplacian appears to show relatively less detail compared with Std and Variance.
Thresholding the focus score map with the Otsu algorithm 25, we obtained focus score masks as references for in-focus pixels. Such a mask preserves only the pixels from the target focal plane and filters out pixels from other focal planes. In Fig. 2a, we compared the segmentations of the three focus algorithms (Laplacian, Variance, and Std) to the manually segmented ground truth (GT) for validation. The Laplacian shows severe inconsistency with the manual GT; thus, we concluded that it is inferior to the other two in preserving the correct image content. Notably, the Variance marks the in-focus pixels in a more conservative way, which leads to a loss of specimen information. We assessed the information loss of the two remaining algorithms (Variance and Std) on whole stacks in Fig. 2b. Compared to the manual GT, the Variance barely preserved the complete contour information, whereas the Std showed consistency with the manual GT. This makes the Std the focus algorithm of choice, satisfying both focus sensitivity and the ability to detect the morphology of specimens.
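A minimal sketch of this thresholding step, assuming a 2D focus-score map and the scikit-image implementation of Otsu's method:

```python
import numpy as np
from skimage.filters import threshold_otsu

def focus_mask(score_map: np.ndarray) -> np.ndarray:
    """Binary mask marking pixels whose focus score exceeds Otsu's threshold."""
    return score_map > threshold_otsu(score_map)
```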
Deep neural network surrogate for the rule-based in-focus region detection
We have shown that the focus score pipeline with the Std algorithm can generate reliable focus score maps (Fig. 3a). Furthermore, combined with automated thresholding algorithms, these maps allow obtaining focus masks comparable to the manual GT. However, this rule-based pipeline is computationally expensive and requires a long time to process image stacks. To achieve similar results in an end-to-end, single-step fashion, we proposed a DNN surrogate model with a U-Net-like structure 17, illustrated in Fig. 3b. To train the U-Net-based surrogate model, we used the binary masks obtained from the output of the rule-based focus score pipeline as weak labels 26. This way, the model directly learns the transformation between raw images and focus score masks.
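A hypothetical PyTorch dataset illustrates this weak-label setup: raw slices are paired with the binary masks produced by the rule-based pipeline (the class name and array layouts are our assumptions).

```python
import torch
from torch.utils.data import Dataset

class WeakLabelFocusDataset(Dataset):
    """Pairs raw widefield slices with rule-based focus masks used as weak labels."""

    def __init__(self, slices, rule_based_masks):
        self.slices = slices           # list of 2D float arrays (raw images)
        self.masks = rule_based_masks  # matching list of 2D binary arrays

    def __len__(self):
        return len(self.slices)

    def __getitem__(self, idx):
        img = torch.as_tensor(self.slices[idx], dtype=torch.float32)[None]  # 1 x H x W
        mask = torch.as_tensor(self.masks[idx], dtype=torch.float32)[None]  # 1 x H x W
        return img, mask
```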
We opted for the U-Net architecture as it is commonly used in biomedical image segmentation tasks. U-Net combines the convolutional neural network (CNN) and autoencoder (AE)-like structures 27. As a representation learning model, the first several convolutional layers of the U-Net (the encoder part) increase the channel number of the input images and extract features in the AE structure. The middle convolutional layer (the bottleneck part) encodes these features as embedding vectors in the latent space. The last de-convolutional layers (the decoder part) upsample the embedding vectors back to the original image size. By optimizing the loss between reconstructed images and inputs, the encoder and decoder jointly learn the manifold structure 28 of the given dataset. In contrast to traditional AE structures, the U-Net concatenates the up-sampled embedding code with the feature maps from the corresponding layers of the encoder part 29. This operation constrains the outputs and gives the U-Net an advantage in supervised learning tasks. Our model segments the in-focus pixels directly from the widefield microscopy images.
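A compact PyTorch sketch of such a U-Net-like model is shown below, mirroring the encoder-bottleneck-decoder layout with skip connections described above and the layer counts given in the next paragraph; the channel widths are our assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

def conv_block(cin: int, cout: int) -> nn.Sequential:
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True))

class SmallUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.ModuleList([conv_block(1, 16), conv_block(16, 32), conv_block(32, 64)])
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(64, 128)
        self.up = nn.ModuleList([
            nn.ConvTranspose2d(128, 64, 2, stride=2),
            nn.ConvTranspose2d(64, 32, 2, stride=2),
            nn.ConvTranspose2d(32, 16, 2, stride=2),
        ])
        self.dec = nn.ModuleList([conv_block(128, 64), conv_block(64, 32), conv_block(32, 16)])
        self.head = nn.Conv2d(16, 1, 1)  # per-pixel in-focus logit

    def forward(self, x):
        skips = []
        for enc in self.enc:          # encoder: extract features, downsample
            x = enc(x)
            skips.append(x)
            x = self.pool(x)
        x = self.bottleneck(x)        # bottleneck: latent embedding
        for up, dec, skip in zip(self.up, self.dec, reversed(skips)):
            x = up(x)                 # decoder: upsample
            x = dec(torch.cat([x, skip], dim=1))  # concatenate encoder features (skip)
        return self.head(x)
```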
As illustrated in Fig. 3b, this model contains 7 convolutional layers: 3 for the encoder, 1 for the bottleneck, and 3 up-sampling layers for the decoder. The loss function we chose was the sum of focal loss, Dice loss, and binary cross-entropy. To evaluate the performance of the model, this work adopted the IoU score 30 as the metric. As the optimizer, we used Adam with a learning rate of 0.001. After 400 epochs of training with a batch size of 8, the model converged to stable values of both the IoU score and the loss; the final IoU was 0.98 and the loss was 0.05 (Supplementary Fig. 1). As presented in Table 1, this DNN model speeds up the segmentation to 0.062 seconds per stack. Even including the one-time training cost of 8.5 minutes, this solution remains superior to the other candidates, accelerating the segmentation by a factor of at least ~10,000.
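A sketch of the combined objective and the IoU metric under common formulations; the equal weighting of the three terms and the focal parameter γ = 2 are our assumptions, while the optimizer settings follow the text.

```python
import torch
import torch.nn.functional as F

def total_loss(logits, target, gamma=2.0, eps=1e-6):
    """Sum of binary cross-entropy, focal loss, and Dice loss."""
    bce_map = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    p = torch.sigmoid(logits)
    pt = p * target + (1 - p) * (1 - target)      # probability of the true class
    focal = ((1 - pt) ** gamma * bce_map).mean()  # down-weight easy pixels
    dice = 1 - (2 * (p * target).sum() + eps) / (p.sum() + target.sum() + eps)
    return bce_map.mean() + focal + dice

def iou_score(logits, target, eps=1e-6):
    """Intersection over union of the thresholded prediction and the label."""
    pred = (torch.sigmoid(logits) > 0.5).float()
    inter = (pred * target).sum()
    union = pred.sum() + target.sum() - inter
    return ((inter + eps) / (union + eps)).item()

model = SmallUNet()  # defined in the sketch above
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
```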
Surrogate model evaluation
The trained U-Net surrogate model can directly predict the focus-score mask from widefield microscopy images as inputs. This approach significantly simplifies the segmentation compared to the rule-based focus-score pipeline. Figure 4a illustrates part of the segmentation results. Notably, from the completely out-of-focus slice (slice 0) to the completely in-focus slice (slice 19), the DNN model distinguished the in-focus pixels correctly. In slice 0, the model labeled the whole image as out-of-focus, consistent with the GT mask. As the focal plane moved during optical sectioning and the image contained more in-focus pixels, the predictions of the DNN model remained reliable. In slice 19, the model correctly labeled the whole target as in-focus. However, the prediction deviated from the GT masks in certain cases (slice 9), which highlights opportunities for future improvements of this DNN model.
The focus-score masks obtained from the surrogate model may be employed for in-focus pixel segmentation of widefield microscopy images. As shown in Fig. 4b, these masks allow retaining only the in-focus part of each image. These parts, in turn, may be assembled into a 3D model of the specimen. Notably, this is possible with images obtained using widefield in vivo microscopy, in which, unlike in CLSM, both in-focus and out-of-focus light contribute to the formation of the image in every image plane.
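Finally, a minimal sketch of this masking step: the surrogate model predicts a mask for every slice, each slice keeps only its in-focus pixels, and the masked slices are stacked into a 3D volume (`stack` is a Z x H x W array; thresholding the model output at 0.5 is an assumption).

```python
import numpy as np
import torch

def assemble_volume(model, stack: np.ndarray) -> np.ndarray:
    """Zero out out-of-focus pixels slice by slice and return the masked stack."""
    with torch.no_grad():
        logits = model(torch.as_tensor(stack, dtype=torch.float32)[:, None])  # Z x 1 x H x W
        masks = (torch.sigmoid(logits)[:, 0] > 0.5).numpy()
    return stack * masks  # out-of-focus voxels become zero
```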