Broadband model-based optoacoustic microscopy enables deep-tissue imaging beyond the acoustic diffraction limit

Optoacoustic microscopy (OAM) retrieves anatomical and functional contrast in vivo at depths not resolvable with optical microscopy. Recent progress on reconstruction algorithms have further advanced its imaging performance to provide high lateral resolution ultimately limited by acoustic diffraction. In this work, we suggest a new broadband model-based OAM (MB-OAM) framework efficiently exploiting scanning symmetries for an enhanced performance. By capitalizing on the large detection bandwidth of a spherical polyvinylidene difluoride (PVDF) film while accurately accounting for its spatial impulse response, the new approach significantly outperforms standard OAM implementations in terms of contrast and resolution, as validated by functional in vivo experiments in mice and human volunteers. Furthermore, L1-norm regularization enabled resolving structures separated by less than the theoretical diffraction-limited resolution. This unique label-free angiographic performance demonstrates the general applicability of MB-OAM as a super-resolution deep-tissue imaging method capable of breaking through the limits imposed by acoustic diffraction.


Introduction
Advanced acoustic inversion methods are essential to optimize the achievable resolution, contrast, and overall performance of optoacoustic (OA, photoacoustic) imaging systems operating at depths not resolvable with optical microscopy (>1 mm in biological tissues). The type and location of ultrasound (US) sensor(s) mainly determine the imaging performance in this depth range, thus are generally tailored for the biomedical application of interest [1,2]. For example, OA tomography based on concave piezoelectric arrays has been shown to capitalize on a large angular coverage to render accurate images of arbitrarily-oriented vascular networks [3][4][5]. However, although tomographic microscopic imaging has been enabled with high-resolution arrays [6], deep-tissue OA microscopy (OAM) is typically performed by raster-scanning a single-element transducer [7][8][9][10][11][12]. This approach represents one of the most common OA embodiments and is increasingly being used in preclinical and clinical studies [13,14]. An important advantage of OAM over tomographic OA methods is a significantly simpler and less expensive instrumentation. More importantly, OAM is particularly suitable for exploiting the multiscale OA imaging capabilities by covering an otherwise unachievable signal bandwidth. For example, centimeter-scale depths have been achieved with transducers featuring detection bandwidths of a few MHz [7,15], while ultra-wideband detectors have been used e.g. in raster scan OA mesoscopy (RSOM) for high-resolution imaging at depths around 2 mm-4 mm [13,16]. Multiple scales can additionally be covered simultaneously e.g. via coaxial light focusing in hybrid-focus OAM [17,18] or with a dual-element transducer in quad-mode OAM [19].
The importance of OAM fostered ongoing efforts on the optimization of image formation methods. The synthetic aperture focusing technique (SAFT), also used in radar, sonar, and biomedical US, has become the standard image formation method in OAM [10]. SAFT was introduced in OAM as the virtual detector concept [20], and is generally implemented using the delay-and-sum (DAS) technique in the time domain [21]. More advanced approaches based on weighting factors in the time or frequency domains [22,23] or on delay-multiply-and-sum (DMAS) methods [24,25] have further been shown to provide an enhanced performance. DAS can be regarded as a discrete implementation of the filtered backprojection algorithm, derived from the time-domain OA forward model for short-pulsed excitation [26]. Such a model represents a solid mathematical foundation enabling the development of more accurate acoustic inversion methods. Thus, algorithms based on time reversal techniques [27,28], model-based (iterative) inversion algorithms [29][30][31] or hybrid-domain modeling of a focused transducer [28] have been suggested for OAM. Model-based inversion in the time domain is arguably the most accurate approach to account for the finite size and shape of US sensors [32][33][34]. Additionally, the performance of iterative inversion can be enhanced by properly choosing constraints or regularization terms [35][36][37][38][39][40]. However, implementation of this method for the dense grid of voxels required for high-resolution volumetric imaging has remained challenging due to extremely high computational costs, even when considering efficient parallel implementations in graphics processing units (GPUs) [41,42].
Herein, we introduce a new broadband model-based OAM (MB-OAM) image formation framework that efficiently exploits scanning symmetries to enable GPU-based, high-resolution, volumetric imaging. The underlying time-domain OA forward model accurately accounts for the large bandwidth of the OA signals collected with a custom-made spherical polyvinylidene difluoride (PVDF) transducer, while discretization of the transducer surface additionally enables modeling its spatial impulse response. Model-based inversion is shown to significantly enhance the achievable resolution even beyond the acoustic diffraction limit, thus outperforming standard OAM implementations as well as other optical modalities based on extrinsically-administered contrast agents. The label-free nature of OA ensures that the suggested approach is generally applicable in any preclinical or clinical application requiring visualization of tissues at depths not reachable with optical microscopy.

Results
Accurate modeling of OA signals OAM acquires volumetric images in a raster-scan acquisition protocol ( Fig. 1a and Suppl. Fig. 1) with spatial sampling (scanning step) fulfilling the Nyquist criterion. The lateral resolution in OAM is considered to be limited by acoustic diffraction to [43], where and are the central wavelength and numerical aperture of the US transducer. Typical lateral resolutions are in the range of 40 m, for which scanning steps around a few tens of micrometers are needed. The suggested MB-OAM defines time-resolved OA signals corresponding to each pair of voxel and scanning position, where the lateral positions of the voxels match the x-y-scanning positions (see methods for a detailed description). For a raster-scan configuration, the OA model is translational-symmetricsignals are identical if the relative position between the voxel and scanning point is maintained (Fig. 1b). A spherical US transducer further defines axial symmetries with respect to the acoustic axis (Fig. 1b). Accurate modeling of the impulse response (IR) of the transducer is achieved by splitting it into two terms, namely the electrical impulse response (EIR) defining the effective detection bandwidth, and the spatial impulse response (SIR) accounting for acoustic diffraction effects associated to the finite size of the sensing area [44]. A custom-made transducer consisting of a PVDF film featuring broad detection bandwidth (~1 MHz to 100 MHz) was used herein [11] (Fig. 1c, see methods for a detailed description). However, the spatiallydependent SIR of the spherically-focused, axially-symmetric transducer strongly limits the lateral resolution in out-of-focus regions and standard image formation methods (SAFT and its variants) cannot fully compensate for this effect. In the suggested model-based framework, the SIR for each voxel was estimated by superimposing the corresponding time-resolved OA signals for a set of surface elements (~1000) covering the active sensing area (Fig. 1c). This is equivalent to adding up the OA forward model corresponding to each surface element (see methods for a detailed description). Considering the scanning symmetries, the information in the model-matrix is fully contained in the sub-matrix corresponding to an individual scanning point, thus could be stored in the memory of state-of-the-art GPUs (1.3 GB for half a million voxels and 837 time points, see methods for details). The memory required to store the OA forward model is invariant to the arbitrarily-large lateral dimensions of the scan area, which allows storing the OA signals, the model, and the reconstructed absorber distribution simultaneously on a GPU. Thereby, large volumetric images can be reconstructed at high-resolution in a reasonable time (~12 minutes). The amplitude spectral density of the time-domain OA forward model (columns of the model matrix) reveals the expected diffraction patterns for different frequencies, while their amplitude in the time domain defines a transducer sensitivity field matching the expected diffraction-limited resolution for broadband OA signals (Fig. 1d). The capability to encode frequencydependent information empowers the suggested model-based framework with unique capabilities to reconstruct absorbers with different sizes emitting OA waves in different frequency bands, clearly outperforming alternative approaches e.g. based on superimposing signals or SAFT (Fig. 1e).

Imaging beyond the acoustic diffraction barrier
Mathematically, model-based inversion minimizes an energy functional consisting of a data fidelity term driving the solution towards the observed data and a regularization term that stabilizes the inversion and includes prior information [45]. OAM image formation is challenged by the limited-view scanning geometry and the associated ill-posed nature of the inverse problem [46]. Regularization hence turns essential for computing a credible approximation of the light absorption distribution. Tikhonov regularization, based on the L2 norm, is the classical method for noise-robust inversion and represents the most probable solution given the raw signals and a prior Gaussian distribution of measurement noise. The model-based algorithm based on an L2 regularization term (MB-L2, see methods for details) enabled enhancing the lateral resolution and contrast of OAM images at out-of-focus locations beyond SAFT (Fig. 1e). However, reconstruction of closely separated (~40 m), simulated absorbers at the focal plane indicate that the lateral resolution is still limited by acoustic diffraction (Figs. 2a and 2b). The OA signals and forward model were bandpass filtered between 0.5 and 80 MHz, yielding an effective diffraction limit of 61 m (see methods for details). This is expected considering that MB-L2 is a linear inversion procedure equivalent to time-reversed propagation of US waves [47]. Alternatively, regularization terms based on the L1 norm can be used to promote sparse solutions in a given domain. Particularly, a L1-based regularization term in the image domain translates to most voxels having zero (or close to zero) value. Simulation results with relatively separated absorbers indicated that the full width at half maximum (FWHM) of the absorbers reconstructed with the model-based algorithm with L1 regularization (MB-L1, see methods for details) is considerably smaller (13 m) than the FWHM achieved with SAFT and MB-L2 (41 m, Figs. 2a and 2b). This could be erroneously interpreted as a narrower point spread function (PSF) of the imaging system enabling higher resolution. However, MB-L1 is a non-linear inversion procedure and hence a PSF cannot be defined. Instead, the achievable resolution must be estimated from the reconstructed image of closely separated sources. In this regard, undistinguishable absorbers in the images obtained with SAFT and with MB-L2 could be separated with MB-L1, thus effectively breaking through the acoustic diffraction barrier (Figs. 2a and 2b and Suppl. Fig. 2). Experimental results with an agar phantom embedding a ~20 µm microsphere slightly out-offocus corroborated the simulation results (Figs. 2c and 2d, see methods for a detailed description). The FWHM of the reconstructed sphere was significantly reduced with MB-L1 compared to SAFT and MB-L2, while it was also possible to distinguish two closely separated spheres simulated by shifting the scanning position and adding up the corresponding OA signals (Figs. 2c and 2d, see methods for a detailed description). Additional experiments with an agar phantom embedding carbon fibers with 7 µm diameter further validated the enhanced resolution achieved with MB-L1 (Figs. 2e and 2f). The width of the reconstructed fibers was significantly reduced with MB-L1. More importantly, MB-L1 enabled resolving crossing fibers at points where SAFT and MB-L2 failed (Fig. 2f).

Enhanced angiographic imaging in vivo
The accuracy of the OA forward model turns essential for precise reconstruction of more complex structures in living organisms. Proper selection of the regularization term is also highly important. Most commonly, OA visualizes multi-scale vascular networks in mammalian tissues with no dominant components in space or frequency domains, which challenges defining information a priori on the images. A raster-scan with 25 µm step across the back of a CD-1 mouse (Fig. 3a, see methods for a detailed description) enabled comparing the in vivo imaging performance of different image formation approaches. SAFT, MB-L2, and MB-L1 successfully visualized microvascular structures in this region, although clear differences were observed in image quality and content (Fig. 3b). A smoothing effect was observed in the blood vessels reconstructed with SAFT degrading the achievable resolution. The vascular network at different depths is clearly better resolved in the MB-L2 image, including branches not visible with SAFT. The sharpening effect of MB-L1 compared to MB-L2 was similar to the observations in simulations and phantom experiments. The width of the reconstructed vessels is clearly narrowed compared to the other two approaches. However, MB-L1 appears to reduce the available information in the image. Many of the branches visible in the MB-L2 image are not seen with MB-L1, arguably due to the sparsity constraint. The achievable resolution with different approaches can be more clearly compared in a three-dimensional view of the central part of the scan (Fig. 3c). Individual vessels and branches in MB-L1 and MB-L2 images are shown to be more clearly defined than in superimposed MB-L2 and SAFT images, respectively. A comparison of the image profiles for a region close to a branching point corroborates that MB-L1 can break through the resolution limit of the other two approaches (Fig. 3d). The respective image content was better assessed by quantifying the amount of vessels in the binarized images with an automatic vessel segmentation and analysis (AVSA) algorithm [48] (Fig. 3e, see methods for a detailed description). Statistical analysis of the number of detected vessels from four different mouse back datasets showed that, on average, the number of vessels in the MB-L2 image is higher by almost two-fold with respect to SAFT and MB-L1 images (Fig. 3f). Overall, the suggested model-based image formation framework significantly enhanced the performance of SAFT in vivo, while the regularization choice establishes a trade-off between achievable resolution and visibility of vascular networks.

Towards quantitative clinical imaging
Microvascular alterations are involved in many physiological and pathophysiological processes and serve as indicators of diabetes, cancer, neurological disorders, and other human diseases. The powerful microvascular imaging capabilities of the suggested MB-OAM are best appreciated when validating the images obtained against those rendered with established angiographic methods (Fig. 4a). A bright field image of a region in the back of a nude-Fox1nu mouse enables identifying superficial blood vessels within the skin. Vascular contrast and imaging depth can be enhanced e.g. via injection of indocyanine green (ICG), a water-soluble dye routinely used in the clinics that fluoresces in the near infrared (NIR-I, ~650-1000 nm) window of light. Recently, the availability of cameras sensitive to photons in a second near-infrared (NIR-II, ~1000-1700 nm) window featuring suppressed light scattering and autofluorescence has enabled reaching depths previously unattainable with optical methods. For this, water-soluble quantum dots emitting light in this wavelength range can be used, which are however not approved for clinical use. The vascular structures observed in the fluorescence images are also visible in the image obtained with MB-L2, which additionally enables visualizing smaller and deeper vessels with endogenous contrast (Fig. 4a). The capability of the suggested approach to quantify microvascular changes was further tested by assessing responses to thermal stimuli in images of the cuticle microvasculature of healthy volunteers (see methods for a detailed description). MB-L2 images acquired after exposure to cold and warm water revealed a higher vascular density in the second case (Fig. 4b).
Qualitative image quality may further be enhanced via post-processing e.g. with a Frangi filter, which may however lead to the appearance of inexistent vessels in the images (Suppl. Fig. 3). Considering that the tissue temperature is expected to be approximately the same at the measuring time points, these differences are ascribed to thermally-induced microvascular alterations. Quantification of vasodilation was better achieved in MB-L1 images of the nailfold region (Fig. 4c). Numerical simulations considering spheres with different sizes (see methods for details) indicate that MB-L1 provides more accurate dimensional readings than MB-L2 (Fig. 4d, Suppl. Fig. 4). MB-L1 further capitalizes on the relative sparsity of the images of this area to better resolve arterioles and venules. Consistent results on relative vasodilation were obtained for different vessels of the same volunteer (Fig. 4e) and across different volunteers (Fig. 4f), which indicates the general applicability of MB-L1 for assessing microvascular changes.

Discussion
Numerical simulations as well as phantom and in vivo experiments demonstrated the enhanced performance of the suggested MB-OAM framework with respect to standard OAM. This is in agreement with what has been shown in several OA tomographic configurations [38,[49][50][51]. However, OAM aims at a significantly higher resolution than OA tomography combined with large FOVs. The implementation of volumetric model-based iterative inversion methods is thereby significantly challenged by the fact that the full model matrix, corresponding to the linear operator mapping the initial pressure distribution (OA image) into measured pressure signals, becomes too large to be stored in GPU or even CPU memory. By exploiting the translational and axial symmetries of the model for the spherically focused transducer within the scanning plane and its limited lateral sensitivity, the model matrix was effectively "compressed" into a sub-matrix corresponding to a single scanning position and the voxel grids sufficiently close to it. The size of this sub-matrix is defined by the lateral sensitivity, the desired imaging depth, and the number of time instances, i.e., it is invariant with respect to an arbitrarily-large reconstruction grid. Therefore, the suggested model-based reconstruction framework is expected to be generally applicable in any OAM system. The advantages of model-based inversion stem from the fact that it accurately accounts for the broadband nature of OA signals. This is fundamentally different to other approaches considering an approximation of the transducer sensitivity field, which generally changes for different OA sources due to the frequency dependence of the diffraction-limited acoustic focusing. In the current implementation, the model matrix was estimated theoretically by integrating the acoustic pressure distribution at the surface of the transducer, which represents a valid approximation for the curved thin PVDF film employed. However, the theoretical computation of the model matrix e.g. for transducers based on acoustic lenses or air-coupled transducers is more complex as it involves modeling acoustic interfaces [52,53]. The suggested reconstruction methodology can alternatively be used by experimentally estimating the model-matrix for a transducer position e.g. by raster-scanning a small particle or a light beam in the volume of interest [54]. Other parameters can also be considered in the model, such as the light fluence distribution or acoustic attenuation effects, which may lead to further improvements in quantification and spatial resolution [55,56].
Acoustic diffraction has been considered as a limit for the achievable resolution in OA, similar to light diffraction in optical microscopy [57]. Super-resolution methods have massively impacted life sciences by smartly overcoming the optical diffraction limit [58]. The development of similar approaches enabling breaking through the acoustic diffraction barrier at depths not reachable with optical microscopy is also expected to highly impact the biomedical imaging field and boost the applicability of OA as a research and clinical tool. Well-established optical super-resolution methods can be taken as a reference for this purpose, although fundamental differences exist in physical principles and instrumentation. Recently, localization OA tomography (LOT) exploited the basic principle of photoactivated localization microscopy (PALM) to image beyond the resolution limit imposed by acoustic diffraction [9,[59][60][61][62]. The superresolution method suggested herein (MB-L1) has some analogies with stimulated emission depletion (STED) microscopy as it is based on a raster-scanning protocol where a non-linear response is induced. The non-linearity is however associated with the reconstruction procedure, hence MB-L1 also has similarities with super-resolution methods based on multiple observations with sub-pixel displacements and/or sparse image representations [63,64]. L1 has been shown to reduce the width of reconstructed structures in OA tomography [65,66]. Herein, we have further demonstrated that the suggested OA forward model is sufficiently accurate to better resolve closely separated sourcesbranching vascular structuresin vivo, which effectively demonstrates its performance as a super-resolution biomedical imaging tool. MB-L1 was further shown to provide accurate dimensional readings that can be exploited for the quantification of vasodilation or other microvascular alterations associated to human diseases.
In conclusion, we expect that the suggested MB-OAM framework significantly enhances the performance of this modality and enable the visualization of previously unresolvable structures beyond the acoustic diffraction barrier. The forward model accurately accounts for the transducer spatial response and the broadband nature of the induced pressure waves, which is essential for accurate reconstruction of OA images. The fact that this could be achieved with low memory requirements and in a relatively short time anticipate its general applicability as a biomedical research tool and can further boost the translation of OAM as a label-free angiographic imaging method in the clinical setting.

Forward modeling
Excitation of OA signals in OAM is achieved with nanosecond laser pulses fulfilling thermal and stress confinement conditions. In an acoustically-uniform and non-attenuating medium, the time-resolved pressure signal at a point is given as a function of absorbed optical energy density distribution as [67] (1) where is the Grueneisen parameter, is the speed of sound, is a point where light is absorbed, and is a spherical surface defined as . The forward model in Eq. 1 was discretized with a two-step procedurethe time derivative was discretized using the finite difference method, and the surface integral was discretized using trilinear interpolation of neighboring voxel values [33]. Assuming a constant and , the discrete forward model is then expressed (in arbitrary units) as the linear system of equations (2) where is a vector of pressure signals at all scanning positions, is the model matrix and is a vector of voxel values representing the absorbed energy density in the region of interest (ROI). The model matrix describes the OA response to unit absorption at the voxel grid defined to cover the absorption distribution domain, and can be used to model the signals of a point detector with infinite bandwidth. More specifically, each column vector in corresponds to one pair of voxel and scanning point. The OA signal collected by a finite-size transducer can be approximated by integrating pressure waves over the transducer surface [68]. This approximation represents the spatial impulse response (SIR) of the transducer, which can be modeled by dividing the transducer surface into sub-elements with position and area . The vector of signals collected by the transducer at all scanning positions is then expressed as (3) where is the number of sub-elements covering the transducer surface (Fig. 1c). Combining Eq. 2 and 3, the discrete forward model becomes (4) Thereby, the model matrix accounting for the SIR of the transducer is estimated as a weighted sum of model matrices corresponding to sub-elements.

Efficient reconstruction based on scanning symmetries
Model-based (MB) inversion involves minimizing an energy functional (5) where is the vector of measured signals, is regularization term and is the regularization parameter controlling the trade-off between regularization and data fidelity terms. The importance of is two-foldit makes the optimization procedure more robust to noise, and it incorporates apriori knowledge of the image . Herein, we considered the standard Tikhonov regularization based on the L2 norm, where is given by (6) We refer to MB inversion with Tikhonov regularization as the MB-L2 method. We also considered L1 regularization based on the L1 norm, where is given by (7) We refer to MB inversion with L1 regularization as the MB-L1 method. The solution of Eq. 5 was computed iteratively with the LSQR algorithm [69] for MB-L2 and with the FISTA algorithm [70] for MB-L1.
In both LSQR and FISTA algorithms, the most computationally demanding operations are the matrixvector multiplications and , where and are updated in each iteration. The computational burden originates from the large-scale nature of OAM. This leads to a huge model matrix covering each pair of voxel and scanning position in the entire ROI, even if it is stored in sparse format [41]. However, since the spherically focused transducer barely senses voxels sufficiently distant in the lateral direction, the corresponding model matrix values were approximated as 0, i.e., the model matrix only covers a 1 mm lateral extent (Fig. 1d), independently of an arbitrarily large ROI. The depth extent of model matrix scales with the collected dataset. Also, given that the transducer sensitivity field is translational-symmetric, the model vector is equivalent if the relative position between voxel and scanning point is maintained. Therefore, it was sufficient to generate the model matrix for a single scanning position, as it includes all unique lateral distances between pairs of voxel and scanning point. The "compressed" model matrix can then be pre-calculated in a reasonable time and stored in a memory-efficient way. To perform the matrix-vector multiplications, translational symmetries were considered to find the correspondences between voxels and model vectors (Fig. 1b). Specifically, we maintained two coordinate systemsan absolute coordinate system to index over the voxel grid, and a relative coordinate system, with the origin at each scanning position, to retrieve the corresponding model vector based on the lateral distance between a given voxel and the origin. The matrix-vector multiplications were parallelized and executed on GPU, which greatly improved computational speed.

Numerical simulations
The performance of the suggested MB-OAM methods, namely MB-L2 and MB-L1, was first tested on numerical simulations. In the first simulation, the absorption distribution within a (2 x 2 x 3) FOV was modeled as four truncated paraboloids with diameters 60 µm, 100 µm, 150 µm, and 200 µm (Fig. 1e), and a 25 µm step size was used. The paraboloids were positioned 0.5 mm away from the center of the FOV in the lateral (x and y) directions, and equally separated within a ±1 mm range from the acoustic focus in axial (z) direction. The transducer surface was divided into 1200 sub-elements and the raw OA signals at the center of each sub-element were calculated analytically [68]. The temporal sampling frequency was set to 250 MHz for all simulations, consistent with the experimental setting. The OA signals were normalized to the maximum absolute value and 2% white Gaussian noise (standard deviation of 0.02) was added. MB-L2 and the standard SAFT [10] were used for image formation in this first simulation.
The second simulation was performed by considering two truncated paraboloids with 20 µm diameter at the axial location of the acoustic focus for a FOV of (200 x 200 x 150) and 5 µm step size. The paraboloids were first separated by 60 µm in the x direction to ensure they can be resolved by all three methods in comparison -SAFT, MB-L2, and MB-L1 (Fig. 2a, left). Subsequently, they were separated by 40 µm in the x direction, which was empirically demonstrated to be beyond the acoustic diffraction limit for the filters employed (Fig. 2a, right). Specifically, the raw signals and the model matrix were bandpass filtered between 0.5 MHz and 80 MHz, yielding a diffraction-limited resolution of 61 µm [43]. The OA signals were normalized by the maximum absolute value and no noise was added.
A third simulation was performed to validate the capability of MB-L1 to accurately quantify micro-vessel dimensions. To this end, OA signals corresponding to truncated paraboloid absorbers with diameters of 20, 30, 45, 60, 80, and 100 m were simulated, and independently inverted with both MB-L2 and MB-L1. The simulated absorbers were positioned at the depth of the acoustic focus and at the center of a FOV covering a (250 x 250 x 250) region with 5 µm step size. The OA signals were normalized to the maximum absolute value and 2% white Gaussian noise was added.
OAM set-up A recently proposed burst-mode OAM system was used [11] (Suppl. Fig. 1). The system employs a custom-made, 9 µm foil PVDF spherically focused transducer (7 mm focal distance and 0.43 NA). Three laser sources at 532 nm, 578 nm, and 1064 nm were combined and coupled into a multimode fiber, which was guided through a central hole (0.9 mm diameter) of the transducer, thus providing concentric illumination and acoustic detection. Two of the laser sources have fixed wavelengths at 532 nm and 1064 nm (Onda 532 nm and 1064 nm, Bright Solutions, UK), and the third source is a tunable dye laser (Credo, Sirah Lasertechnik, Germany) with wavelength set to 578 nm. The transducer was translated laterally in the x-y plane above the sample (Fig. 1a) to acquire a time-resolved OA signal at each scanning position. Specifically, a fast-moving scanning stage moves constantly forth and back between the boundaries of the previously defined scan window while monitoring its position. Following each predefined incremental stage movement, all three lasers were triggered in a cascade with a 6 µs delay in between. Motion artifacts were avoided by averting acceleration and deceleration at each scanning position. The system thus enables rapid acquisition of multi-wavelength volumetric datasets over large FOV. In all experiments, the temporal sampling frequency was set to 250 MSPS and no signal averaging was performed.

Phantom experiments
The suggested MB inversion framework was tested on two phantom experiments. A first phantom was used to test if MB-L1 is able to break the acoustic diffraction limit on experimental data. This consisted of a single micro-sphere with 20 µm diameter embedded in 1.3% (w/v) agar. The phantom was placed with the micro-sphere at the axial location of the acoustic focus and a scan covering a (200 x 200 x 150) FOV was performed with 5 µm step size. The 532 nm laser source was used, with up to 2.5 kHz pulse repetition rate (PRR) and 50 µJ per-pulse energy (PPE). The single micro-sphere image in the phantom was first individually formed. Subsequently, the signal volume was artificially shifted by 40 µm in the x direction and super-imposed with the original volume, which is equivalent to the signal volume of two micro-spheres separated by the same distance.
A second phantom consisting of four carbon fibers with 7 µm diameter embedded in 1.3% (w/v) agar was imaged. The purpose of this phantom experiment was to assess the overall performance of SAFT and the suggested MB reconstruction methods to reconstruct elongated structures. The scan covered a (1 x 2 x 1) FOV with 10 µm step size. The PRR was again limited to 2.5 kHz and the PPE of the 532 nm laser was set to 50 µJ.

In vivo mouse skin imaging
The in vivo imaging performance of the suggested MB inversion framework was tested by imaging dorsal mouse skin. To this end, a CD-1 mouse (8 weeks old; Charles River Laboratories, Germany) was anesthetized with isoflurane, and placed in prone position on a heating pad. The back skin was shaved and cleaned before imaging. A scan covering a 10 x 30 mm² FOV with a 25 µm step size was performed, giving a volumetric dataset at each wavelength (532 nm, 578 nm, and 1064 nm). The PRR was limited to up to 2.5 kHz and the PPEs for the three lasers were set to 25 µJ, 25 µJ, and 100 µJ, respectively. Before reconstruction, the raw signals were bandpass filtered between 1 MHz and 120 MHz, and further median filtered with kernel size of 3 x 3 x 3. The animal experiment was performed in accordance with the Swiss Federal Act on Animal Protection and were approved by the Canton Veterinary Office Zurich.

Image visualization and vessel quantification
Visualization of the dorsal skin vasculature was done using maximum intensity projections (MIPs) along all Cartesian dimensions. Volume rendering was performed in Paraview 5.8.0 [71]. The images of the mouse skin vasculature were shown with color-coded depth. This was done by multiplying a background map (reconstructed volume), a foreground map (3-channel color encoder), and a transparency value. Foreground colors were set to blue, green, and orange from deeper to shallower regions. The number of vessels was quantified with a previously reported automatic vessel analysis and segmentation (AVSA) algorithm [48]. In short, reconstructed images were binarized, skeletonized, and vessel center-lines fitted by a spline. Vessel edges were found by computing the image gradient along the normal direction of the central line. To quantify the number of vessels achieved in SAFT, MB-L2 and MB-L1, two signal volumes acquired at 532 nm and 578 nm from the multi-wavelength in vivo dataset were used. Each of the two volumes were further split into half along the y direction, which increases the sample size and gives a total of four individual datasets. Reconstructed vessels in an upper-right sub-region were individually visualized based on branch point, central line and edge (Fig. 3e). The number of vessels in images reconstructed with SAFT, MB-L2 and MB-L1 was plotted, and the mean value was noted for all three methods (Fig. 3f).

Optical angiographic imaging
The in vivo performance of the suggested MB-OAM image formation framework was validated against wide-field optical imaging. To this end, the dorsal skin of an athymic nude-Fox1nu mouse (20 week-old; Envigo RMS R.V., Netherlands) was sequentially imaged with different modalities. The mouse was anesthetized with isoflurane (3% v/v for induction, 1.5% v/v for maintenance) in a mixture of air and oxygen with flow rates 0.8 L/min and 0.2 L/min, respectively. The mouse was placed in prone position on a heating pad maintaining body temperature during the experiment. First, a USB microscope (Dinolite digital microscope, Taiwan) was used to take a bright field image of the mouse. Subsequently, epifluorescence images were taken in the first and second near infrared windows (NIR-I and NIR-II). NIR-I imaging was performed after tail-vein injection of 100 L indocyanine green (ICG, Sigma-Aldrich Chemie GmbH, Switzerland) at a concentration of 1 mg/ml. The fluorescence image was recorded with an EMCCD camera (iXon Life, Andor, UK) under 700 nm excitation. NIR-II imaging was performed following tail-vein injection of 50 L NIR-II quantum dots emitting at ~1600 nm (NBDY-0018, Nirmidas Biotech, USA) at a concentration of 5 mg/ml. Time-lapse fluorescence images were recorded with a commercial SWIR camera (WiDy SenS 640V-ST, NiT, France) under 855 nm excitation. An OAM image was also acquired with 532 nm excitation wavelength and 25 m step size, covering a FOV of 10x30 mm². A big vertical vessel in the middle of the mouse back was selected as an anatomical landmark for comparison of the different modalities. All animal experiments were performed in accordance with the Swiss Federal Act on Animal Protection and were approved by the Canton Veterinary Office Zurich.

Human nailfold imaging
Nailfold imaging in three healthy volunteers was performed to demonstrate the general applicability of the suggested MB-OAM methodology in humans. All volunteers were informed and gave written consent to participate in the experiments. Dilation and constriction of the microvasculature in this region was achieved with a thermal stimulation protocol consisting of two main steps [72]. First, the hand of the volunteer was immersed in cold water (~15°C) for 5 minutes. Three consecutive scans of the index finger of this hand, each taking approximately 1.5 minutes, were performed immediately after immersion. Subsequently, the same hand was immersed in warm water (~40°C) for 5 minutes and the scans were repeated immediately after. All scans were performed at 532 nm and 10 m step size, covering a 4x3 mm² FOV. The PPE and the PRR of the laser were set to 25 J and 2 kHz, respectively.
Images of the entire FOV were first reconstructed with MB-L2. To this end, OA signals were divided into different frequency bands [73]. Specifically, band-pass filters with cut-off frequencies 10-20 MHz, 20-30 MHz, 30-40 MHz and 40-60 MHz were applied. OA signals in each frequency band were independently used for reconstruction considering a regularization parameter of 1000 and 5 iterations. A weighted sum (weighting factors of 0.45, 0.4, 0.1, and 0.05 from low to high frequency bands) of the reconstructed images was then calculated. Images of a smaller FOV (1.6x0.8 mm 2 ) at the junction of nail and skin were formed with MB-L1. To this end, OA signals were band-pass filtered with cut-off frequencies 20-25 MHz. Profiles corresponding to different microvessels were extracted from the 3D images to assess the dilation from cold to warm stimulus. The width of the corresponding vessels was estimated via Gaussian fitting of the profiles. Differences were assessed for different vessels of the same volunteer and across volunteers. The MB-L2 and MB-L1 images were visualized as MIPs along the z (depth) dimension, while the depth information was further color-coded from orange to cyan, corresponding to shallow and deep regions.
Author contributions XLDB and DR conceived the concept and devised the study. WL, UH, AO, YG, and XLDB developed the reconstruction framework and implemented the reconstruction code. UH developed the imaging system. UH and WL performed all experiments. WL performed the reconstructions and data analysis. QZ and ZC performed the optical imaging experiments. JR implemented depth-encoding visualization and vessel quantification code. DR and XLDB supervised the work. All authors contributed to writing and revising the manuscript.
Data generated during this study and code used to process images are available from the corresponding author upon reasonable request.

Ethics declarations
The authors declare no conflicts of interests.