*n*-dimensional derivation of Radial Symmetry localization

The goal of RS is to accurately localize a bright, circular spot with subpixel accuracy. In noise-free data, image gradients at locations point towards the center of the spot and intersect in that single point ** **(**Fig. 1a**), thus computing the intersection point solves the problem of accurate localization. In realistic images that contain noise, these gradients do not intersect, therefore computing constitutes an optimization problem that RS solved using least-squares minimization of the distances between the common intersection point and all gradients (**Supplementary Figure 1**).

We extend RS to 3D similar to Liu et al. 20 and additionally describe how to generalize the derivation to the n-dimensional case. To achieve this, we replace the Roberts cross operator with separable convolution for image gradient computation, and we use vector algebra to compute the intersection point of image gradients. The derivations are shown in detail in **Supplementary Figure 1** and **Supplemental Material**.

## Radial Symmetry for axis-aligned ellipsoid (non-radial) objects

Diffraction-limited spots in 3D microscopy images are usually not round but show a scaling in the axial (z) dimension compared to the lateral (xy) dimensions. Previous solutions suggested scaling the image in order to be able to detect spots using RS 20. This can be impractical for large datasets, and it might affect localization quality as the image intensities need to be interpolated for scaling. Here, we extend the RS derivation to directly compute the intersection point from anisotropic images by applying a scale vector *s* to point locations and applying the inverse scale vector *s*-1 to the image gradients . While we derive the case specifically for 3D, it can be straight-forward applied to higher dimensions. The derivation is shown in detail in the **Supplemental Material**.

RS-FISH supports a global scale factor (called anisotropy factor) for the entire dataset that compensates for anisotropy of the axial (z) dimension, which can be computed from an image containing diffraction-limited spots (**Supplement**).

## Radial Symmetry RANSAC

RS localization is implemented as a fast, closed-form solution, and it is therefore feasible to combine it with robust outlier removal. We use Random Sample Consensus (RANSAC)21 in order to identify the maximal number of gradients ** **that support the same center point ** **given a maximal distance error ** **so that all** **.

To achieve this, RANSAC randomly chooses the minimal number of gradients (i.e., two gradients) from the set of all gradients (candidate gradients) to compute the center point and tests how many other gradients fall within the defined error threshold . This process is repeated until the maximal set of gradients is identified (inlier gradients) and the final center point ** **is computed using all inlier gradients. This allows RS-FISH to exclude artifact pixels and to differentiate close-by spots.

To identify and locate close-by points, RS-FISH runs a multi-consensus RANSAC. Here, RANSAC is run multiple times on the same set of candidate gradients. After each successful run that identifies a set of inliers, these inliers are removed from the set of candidate gradients, and RANSAC tries to identify another set of inliers (**Fig. 1b**). This process is iterated until no other set of inliers (corresponding to a FISH spot) can be found in the local neighborhood of each DoG spot. To not detect random noise, the minimal number of inliers required for a spot can be adjusted (typically around 30).

## Implementation details and limits

RS-FISH is implemented in Java using ImgLib2, the *mpicbg* framework, BigDataViewer, Fiji, and Apache Spark. The computation of RS is performed in blocks with a size of \({b}_{d}\) for each dimension \(d\) (e.g., 256x256x128 pixels) and requires an overlap of only one pixel in each dimension with neighboring blocks, thus the overhead \(o=1-\frac{\prod _{d}{b}_{d}}{\prod _{d}{b}_{d}-2}\) is minimal (e.g. 1.5% for 256x256x256 blocks or 0.6% for 1024x1024x1024 blocks). When processing each block the local process has access to the entire input image which is either held in memory when running within Fiji or is lazy-loaded from blocked N5 datasets when running on large volumes using Apache Spark. Since the computation across blocks is embarrassingly parallel, computation time linearly scales with the dataset size. Thus, RS-FISH will run on very large volumes supported by N5 and ImgLib2. Due to current limitations in Java arrays the theoretical upper limit is \({2}^{31}=\text{2.147.483.648}\)blocks, each block maximally containing \({2}^{31}=\text{2.147.483.648}\) pixels (e.g. 2048x2048x512 pixels). Given sufficient storage and compute resources, the limit for RS-FISH is thus 4072 peta-pixels (4072 petabyte @ 8bit or 8144 petabyte @ 16bit) taking into account the overhead, whereas every individual block locally only processes 2 gigapixels (\({2}^{31}=\text{2.147.483.648}\) pixels).

The code can be executed on an entire image as a single block for smaller images, or in many blocks multi-threaded or distributed using Apache Spark. It is important to note that RS-RANSAC uses random numbers to determine the final localization of each spot. We use fixed seeds for initializing each block, therefore the results for a single block of the same size in the same image with the same parameters are constant. However, if one compares results of blocks of different sizes (e.g. single-threaded vs. multi-threaded), the results will be slightly different as the RANSAC-based localizations are not traversing the DoG maxima in the same order, thus initializing RANSAC differently. For practicality, the interactive Fiji mode only runs in single-threaded mode (although the DoG image is computed multi-threaded) to yield comparable results across different testing trials. Importantly, this only applies if RANSAC mode is used for localization. Multi-threaded processing is available in the recordable advanced mode in the Fiji plugin, while the Apache Spark based distribution can be called from the corresponding RS-FISH-Spark repository.

## Data Simulation for assessing localization performance

In order to create ground-truth datasets for assessing localization performance, we generated images simulating diffraction-limited spots in the following way: \((x,y,z)\) spot positions were randomly assigned within the z-stack chosen dimensions, and each spot was assigned a brightness picked from a normal distribution. We computed the intensity \(I(x,y,z)\) generated by each spot as follows: we first computed the predicted average number of photons received by each pixel \({I}_{pred}(x,y,z)\) computed using a gaussian distribution centered on the spot, with user-defined lateral and axial extensions. We then simulated the actual intensity collected at each pixel using a Poisson-distributed value with mean \({I}_{pred}(x,y,z)\). We eventually added gaussian-distributed noise to each pixel of the image.

Code for generating the images with simulated diffraction-limited spots is available in the GitHub repository. There is also a folder included with the simulated data used in the parameter grid search and benchmarking.

https://github.com/PreibischLab/RS-FISH/tree/master/documents/Simulation_of_data

## Benchmarking RS-FISH against commonly used spot detection tools

RS-FISH performance was benchmarked against the leading tools for single-molecule spot detection in images. The tools compared in the benchmarking are FISH-quant14 (Matlab), Big-FISH 26 (Python), AIRLOCALIZE 17 (Matlab), Starfish16 (Python), and deepBlink15 (Python, TensorFlow). Localization performance comparison was done on simulated images with known ground truth spot locations, and computation time comparison was performed using real three-dimensional *C. elegans* smFISH images. We created a dedicated analysis pipeline for each tool to test localization performance and compute time. For localization performance comparison, a grid search over each tool’s pipeline parameter space was run (excluding deepBlink, as a pre-trained artificial neural network was used, more details regarding deepBlink are discussed in **Supplemental Notes**). Importantly, tools use different offsets for their pixel coordinates, which depends on the respective pixel origin convention (e.g. does a pixel lie at \(\text{0,0}\) or \(\text{0.5,0.5}\), does the z index start with 0 or 1). In our benchmarks we detect these offsets by computing the precision (the average, signed per-dimension difference between predicted and ground-truth spots) and corrected for these offsets if necessary. Notably, the Matlab tools AIRLOCALIZE and FISH-quant both show a \(0.5\) pixel offset in \(xy\), and \(1\) pixel in \(z\) compared to the other tools. For computation time comparison, each pipeline’s parameters were selected to produce a similar number of detected spots for each image.

The comparison shows that RS-FISH is on par with currently available spot detection tools in localization performance (**Fig. 2a-c, Supplement**) while running 3.8x – 7.1x times faster (**Fig. 2d, Supplement**). Additionally, RS-FISH is currently the only available tool that can be directly applied to large images, which we highlight using a 148 GB lightsheet image stack13 (**Fig. 1h, Supplementary Video 1, Supplement)**. The image size of the lightsheet stack is 7190x7144x1550 pixels, and the block size used for detection was 256x256x128 pixels. The detection of spots using RS-FISH took 3263 sec (~32 CPU hours) for the entire image on a 36 CPU workstation with 2x Intel Xeon Gold 5220 [email protected] The runtime cannot be directly compared to the custom extension of AIRLOCALIZE that was developed for the same project as it is written to specifically run only on the Janelia cluster. The compute time of 156 CPU hours was extracted from the cluster logs of the submission scripts and was executed on a mix of Intel SkyLake (Platinum 8168) @2.7GHz and Intel Cascade Lake (Gold 6248R) @3.0Ghz CPUs. The overall speed increase of ~5x generally agrees with our measurements in **Fig. 2d** and the performance of a mix of these CPUs is comparable to the workstation CPUs (according to https://www.cpubenchmark.net). Importantly, RS-FISH runs on such volumes natively and can easily be executed on a cluster or in the cloud, thus easily scales to significantly larger datasets. At the same time, the AIRLOCALIZE implementation is limited to the Janelia cluster, but could be extended to other LSF clusters that support job submission.

Benchmarking analysis details are in the **Supplemental Material**, and all scripts and complete documentation are in the RS-FISH GitHub repository.