Light-field tomographic fluorescence lifetime imaging microscopy

Fluorescence lifetime imaging microscopy (FLIM) is a powerful imaging technique that enables the visualization of biological samples at the molecular level by measuring the fluorescence decay rate of fluorescent probes. This provides critical information about molecular interactions, environmental changes, and localization within biological systems. However, creating high-resolution lifetime maps using conventional FLIM systems can be challenging, as it often requires extensive scanning that can significantly lengthen acquisition times. This issue is further compounded in three-dimensional (3D) imaging because it demands additional scanning along the depth axis. To tackle this challenge, we developed a novel computational imaging technique called light field tomographic FLIM (LIFT-FLIM). Our approach allows for the acquisition of volumetric fluorescence lifetime images in a highly data-efficient manner, significantly reducing the number of scanning steps required compared to conventional point-scanning or line-scanning FLIM imagers. Moreover, LIFT-FLIM enables the measurement of high-dimensional data using low-dimensional detectors, which are typically low-cost and feature a higher temporal bandwidth. We demonstrated LIFT-FLIM using a linear single-photon avalanche diode array on various biological systems, showcasing unparalleled single-photon detection sensitivity. Additionally, we expanded the functionality of our method to spectral FLIM and demonstrated its application in high-content multiplexed imaging of lung organoids. LIFT-FLIM has the potential to open up new avenues in both basic and translational biomedical research.


Main
Fluorescence lifetime imaging microscopy (FLIM) [1,2] has been extensively employed in a wide spectrum of biomedical applications, ranging from single cell studies [3][4][5] to medical diagnosis [6][7][8]. Rather than imaging the time-integrated fluorescent signals, FLIM measures the time-lapse fluorescent decay. Because the lifetime of a fluorophore is dependent on its molecular environment but not on its concentration, FLIM enables a more quantitative study of molecular effects inside living organisms compared with conventional intensity-based approaches.
The FLIM techniques are generally stratified into two categories: frequency-domain FLIM and time-domain FLIM. Frequency-domain FLIM utilizes frequency-modulated light to illuminate the sample, inducing fluorescence that oscillates at the same frequency but with a reduced modulation depth and a phase shift due to the noninstantaneous fluorescence decay [9]. To extract the phase shift and modulation depth from the fluorescence signals, frequency-domain FLIM modulates the gain of the camera at the same or a slightly different frequency as the excitation light [10][11][12][13]. The fluorescence lifetime can then be derived from the measured signals by comparing them to a reference fluorophore with a known lifetime. In contrast, time-domain FLIM illuminates the sample with pulsed laser excitation, followed by measuring the fluorescent decay in sequential time channels using an ultrafast detector or detector array. Due to the direct measurement of the fluorescence decay, time-domain FLIM provides more accurate lifetime measurements than its frequency-domain counterpart, particularly for complex fluorophores that exhibit multiple decay components [14].
To acquire a two-dimensional (2D) lifetime map, most time-domain FLIM systems scan the sample either in the spatial domain, such as a confocal FLIM microscope [15,16], or in the time domain, such as a time-gated FLIM camera [17,18]. The spatial and lifetime resolution of the resulting map is determined by the number of scanning steps used in the respective domain. To acquire high-resolution images, conventional time-domain FLIM systems require extensive scanning, which can result in prolonged acquisition times. This problem becomes even more pronounced in three-dimensional (3D) imaging, as it mandates additional scanning along the depth axis.
The use of single-photon avalanche diode (SPAD) arrays in time-domain FLIM provides a solution to this long-standing problem by enabling parallel measurement of fluorescence decays at multiple image pixels. Moreover, SPAD arrays offer considerably greater sensitivity than conventional gated cameras, making them an excellent choice for low-light imaging applications. A SPAD imager can operate in either time-gated or time-correlated single photon counting (TCSPC) mode, with TCSPC being the preferred detection method for its higher precision, faster speed, and greater sensitivity. However, the native fill factor of 2D SPAD arrays operating in the TCSPC mode is generally low (<10%) [19] due to the physical limitations posed by the inclusion of intricate timing electronics at each pixel. Although the addition of microlenses can recover some of the fill factor loss, this method is effective only for collimated impinging light. In contrast, a linear SPAD array offers a significantly higher fill factor, close to 50% [20], which results in a substantially increased light throughput. Moreover, the fabrication cost of a linear SPAD array is much lower than that of its 2D counterpart, making it more accessible for general labs. Nonetheless, when it comes to high-resolution imaging of a 2D or 3D scene using a linear SPAD array, the challenge is much like that of point-scanning FLIM: the conventional approach involves scanning the entire field of view or volume in a vast number of steps, which can result in a protracted imaging duration.
To tackle the aforementioned challenges and streamline the acquisition of 3D FLIM data using a linear SPAD array, we devised a novel computational imaging technique, light field tomographic FLIM (LIFT-FLIM).

Results
Operating principle and characterization
Based on computational imaging, LIFT-FLIM operates in two steps: data acquisition and image reconstruction. We show a LIFT-FLIM system in Fig. 1a. Upon pulsed laser excitation, the fluorescence is collected by a microscope objective lens with a high numerical aperture (NA) and forms an intermediate image at the microscope's side image port. A beam splitter then divides the fluorescence into two beams. The transmitted light is recorded directly by a complementary metal-oxide-semiconductor (CMOS) camera, resulting in a reference intensity image. On the other hand, the reflected light creates an intermediate image on a scanning mirror. As the mirror tilts, it imparts twice its angle of tilt onto the outgoing rays. The reflected light from the scanning mirror is then collimated by a lens and forms a pupil image at a plane, where we position a dove prism. By adjusting the tilt angle of the mirror, we can shift the position of the pupil image on this plane. This enables us to selectively direct the light rays corresponding to a specific view angle through the dove prism, which will rotate the light rays and produce a rotated perspective image (Fig. S1). Next, we use a cylindrical lens to compress the rotated perspective image into a line, which is essentially an en-face projection of the original perspective image along an orientation twice the rotation angle of the dove prism. This transformed line image can be either directly measured by a linear SPAD camera (Fig. 1b) or further spectrally dispersed and measured by a time-gated camera (Fig. 1c).
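The acquisition of one projection can be sketched as an image rotation followed by a line compression. The snippet below is a minimal numerical model with an assumed image size and example angles, using scipy's rotate as a stand-in for the dove prism and a column sum as a stand-in for the cylindrical lens.

```python
import numpy as np
from scipy.ndimage import rotate

def en_face_projection(image, prism_angle_deg):
    """Model one LIFT measurement: a dove prism rotated by theta/2
    rotates the perspective image by theta; the cylindrical lens then
    compresses the rotated image into a 1D line (a sum along one axis)."""
    rotated = rotate(image, 2.0 * prism_angle_deg, reshape=False, order=1)
    return rotated.sum(axis=0)  # en-face projection onto the sensor line

# A point source projects to a single peak whose position encodes
# the projection orientation.
img = np.zeros((64, 64))
img[20, 40] = 1.0
line_0 = en_face_projection(img, 0.0)    # prism at 0 deg -> 0 deg rotation
line_45 = en_face_projection(img, 22.5)  # prism at 22.5 deg -> 45 deg rotation
print(line_0.argmax())  # peak at the source's column index, 40
```

Sweeping the prism angle over a set of orientations yields the 1D projections that form one view's contribution to the sinogram.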
To compute a 3D image, conventional light field cameras require the acquisition of a comprehensive set of spatial and angular information of a light field, resulting in a significant data load. However, our previous work demonstrated that this acquisition method is inefficient and generates a substantial amount of redundant data [22], which can be reduced by distributing a nonlocal image acquisition process, such as en-face projection measurement, into different views. Moreover, this allows for the measurement of a high-dimensional light field using low-dimensional detectors, which are typically low-cost and feature a higher temporal bandwidth. In LIFT-FLIM, we leverage this advantage and capture only en-face 1D projection images at each scanned sub-pupil location. Furthermore, we can create an arbitrarily shaped off-focus point spread function (PSF) by strategically shuffling the orientation angles of our projection measurements across various views in a programmed manner (Supplementary Note 2).
We formulate the image formation of LIFT-FLIM using a linear model. For a given perspective image P_k at view k (k = 1, 2, …, n), the projection measurement along angle θ_i is

b_{k,i} = T R(θ_i) P_k,     (1)

where P_k has a dimension of N^2 (N is the image dimension in pixels), T is the en-face projection operator, and R(θ_i) is the image rotation operator, which describes the function of the dove prism rotated at θ_i/2.
Rather than capturing a complete set of m projection angles at each view, we acquire only a subset of m_k projection angles at view k. This process can be explicitly written as:

f = [A_1 P_1; A_2 P_2; … ; A_n P_n] + ε.     (2)

Here f is a stack of projection measurements (also referred to as a sinogram) with a dimension of mN, where m = Σ_{k=1}^{n} m_k is the total number of projections. ε denotes the measurement noise. A_k is a combined function of the en-face projection and image rotation operators acting on perspective image P_k. Because the images from different views capture the same scene, they share a common underlying content (Supplementary Note 1), with only a depth-dependent disparity between any two sub-aperture images, as shown in Fig. S1. Therefore, the correlation between sub-aperture images can be modeled by digitally propagating the light field, i.e., the sub-aperture image P_k at view k can be related to a depth-dependent image feature kernel h(z) through an invertible shearing operator S_k(z) as P_k = S_k(z)h(z), where S_k(z) is also a function of depth z (Supplementary Note 1). Accordingly, we transform Eq. 2 to:

f = A(z)h(z) + ε.     (3)

The overall image forward operator A(z) becomes a function of depth z, which is essential for recovering image h(z) with various focal settings. Noteworthily, although each individual P_k is measured at only a subset of projection angles, the underlying image feature kernel h(z) is measured on a complete angular basis, as A(z) concatenates image rotation operators across all views. For direct fluorescence lifetime measurement using a linear SPAD array with TCSPC (Fig. 1b), the image formation model is a time-lapse version of Eq. 3, which can be expressed as:

f(t) = A(z)h(z, t) + ε,     (4)

where f(t) is a time-lapse sinogram constructed from the projection measurements at time bin t of a TCSPC temporal histogram. For spectral FLIM measurement using a gated ultrafast camera (Fig. 1c), the image forward model is a function of both time t and wavelength λ:

f(t, λ) = A(z)h(z, t, λ) + ε,     (5)

where f(t, λ) is a spectrally resolved, time-lapse sinogram constructed from the projection measurements at gated time t and wavelength λ. The image reconstruction of LIFT-FLIM and LIFT-sFLIM involves solving the inverse problems of Eqs. 4 and 5, respectively. Like standard computed tomography, this can be accomplished through a simple inverse Radon transform or more advanced optimization algorithms such as the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) [27,28].
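The FISTA route can be sketched on a toy, well-posed instance of the linear inverse problem. The random forward matrix, problem sizes, and regularization weight below are illustrative assumptions, not the system's actual depth-dependent operator.

```python
import numpy as np

def fista(A, f, lam=1e-3, n_iter=2000):
    """Minimal FISTA sketch for min_h 0.5*||A h - f||^2 + lam*||h||_1,
    with a generic matrix A standing in for the forward operator."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2       # 1 / Lipschitz constant
    h = np.zeros(A.shape[1]); y = h.copy(); t = 1.0
    for _ in range(n_iter):
        g = y - step * (A.T @ (A @ y - f))       # gradient step on data term
        h_new = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)  # soft threshold
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = h_new + ((t - 1.0) / t_new) * (h_new - h)                 # momentum
        h, t = h_new, t_new
    return h

# Toy check: recover a sparse "scene" from noiseless linear measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 100))              # illustrative forward matrix
h_true = np.zeros(100); h_true[[10, 40, 75]] = [1.0, -2.0, 1.5]
f = A @ h_true
h_rec = fista(A, f)
print(np.abs(h_rec - h_true).max())              # small reconstruction error
```

In practice the forward operator would be applied implicitly (rotations, projections, and shearing) rather than as a dense matrix, but the iteration structure is the same.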
We depict the workflow for processing the light field data, such as image refocusing, extending the depth of field, and rendering a 3D image in Methods.
Akin to conventional light field cameras, LIFT-FLIM and -sFLIM divide the aperture to extract the depth information. Therefore, they have a reduced lateral resolution (~1.8 μm) compared with the native diffraction-limited resolution of the objective lens (Fig. S2).
To improve the quality of the reconstructed images, we developed a deep-learning-based image enhancement neural network [29][30][31] (Fig. 2). The input to the neural network consists of reconstructed LIFT depth images, a diffraction-limited reference image captured at depth zero, and digital propagation matrices (DPMs), which represent the axial distance from the reference image plane to the target plane on a per-pixel basis [32]. The neural network then uses a PixelCNN++ architecture [33] to generate high-resolution outputs at corresponding depths. Further details about the network can be found in Methods. The axial resolution, determined by the NA of the objective lens and the number of views acquired, was measured to be ~3.0 μm for point objects (Fig. S2). The temporal resolutions of LIFT-FLIM and -sFLIM depend on the characteristics of the image sensor. For example, using a linear SPAD array provides a temporal resolution of 50 ps with TCSPC [34], whereas using a gated ultrafast camera yields a temporal resolution of 70 ps [35].

LIFT-FLIM of mixed fluorescent beads
We validated the 3D lifetime imaging performance of LIFT-FLIM on fluorescent beads. We mixed three types of fluorescent beads with lifetimes of 1.5 ns, 3.4 ns, and 4.0 ns, respectively, in an agarose gel. We simultaneously excited the beads using a filtered supercontinuum laser and imaged the fluorescence using LIFT-FLIM with a linear SPAD array. Moreover, we captured the ground-truth intensity images (Fig. 3a) at depths from -8 µm to 8 µm using a reference camera by mechanically scanning the microscope's focus.
To compare LIFT-FLIM images with the ground-truth images obtained, we summed the signals at all time bins in the TCSPC temporal histogram and reconstructed the time-integrated images at the corresponding depths (Fig. 3b). The resulting images exhibit a high degree of similarity to the ground-truth images, demonstrating the system's numerical refocusing ability. We further generated the time-lapse LIFT-FLIM images and computed the average lifetime at each image pixel using mono-exponential curve fitting (Fig. 3c). Three representative fluorescence decays reconstructed at beads' locations are shown in Fig. 3d. The derived fluorescence lifetimes are consistent with the beads' specifications. Furthermore, we generated a histogram of the lifetimes of all pixels at depth zero (Fig. 3e). This histogram displays three distinct peaks that correspond to the lifetimes of three different types of fluorescence beads. This observation reinforces the reliability and accuracy of our lifetime measurement.
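The mono-exponential curve fitting used to derive per-pixel lifetimes can be sketched as follows. The histogram below is a synthetic, noise-free decay with an assumed bin width and amplitude, ignoring the instrument response function and background.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical single-pixel TCSPC histogram with 50 ps bins.
t = np.arange(0, 25.0, 0.05)              # time bins (ns)
true_tau = 3.4                            # e.g. one bead species (ns)
counts = 500.0 * np.exp(-t / true_tau)    # idealized, noise-free decay

def mono_exp(t, a, tau):
    """Mono-exponential decay model fitted at each image pixel."""
    return a * np.exp(-t / tau)

popt, _ = curve_fit(mono_exp, t, counts, p0=(counts.max(), 2.0))
print(round(popt[1], 2))  # recovered lifetime: 3.4 ns
```

Repeating this fit at every reconstructed pixel yields the average-lifetime map.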

LIFT-FLIM of a mouse kidney tissue section
We tested LIFT-FLIM on a standard biological sample (a mouse kidney section, FluoCells™ Prepared Slide from ThermoFisher) and demonstrated its ability in lifetime unmixing. The sample was stained with Alexa Fluor 488 wheat germ agglutinin (WGA) for labeling the cell membrane and Alexa Fluor 568 phalloidin for labeling filamentous actin (F-actin). These two fluorophores have distinct but close fluorescence lifetimes (Alexa Fluor 488, 2.6 ns vs. Alexa Fluor 568, 2.9 ns). The fluorescence intensity image captured at the focal plane by the reference camera is shown in Fig. 4a. The refocused LIFT-FLIM fluorescence lifetime images at representative depths are displayed in Fig. 4b. Figure 4c shows the fluorescence decay curves measured at two fluorophore locations, and Fig. 4d shows the histogram of all pixels' lifetimes at depth zero, where the two peaks indicate the two underlying lifetime components.
Next, we applied an unsupervised phasor approach [36,37] to the fluorescence lifetime data and calculated the probability of each pixel belonging to a specific lifetime component cluster. Figure 4e displays the phasor plot for the fluorescence lifetime image at depth zero, with each data point color-coded to represent its corresponding probability and overlaid with probability contour lines. We then classified the image pixels in the time-integrated LIFT-FLIM image based on this probability and unmixed the fluorophores into pseudo-colored channels. Figure 4f shows a representative unmixed image at depth zero (red channel, phalloidin; green channel, WGA). Repeating this procedure for all depths yields a 3D unmixed image, as shown in Fig. 4g.
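A minimal sketch of the lifetime phasor transform (first harmonic only) is shown below. The bin width, repetition frequency, and lifetime are assumed example values, not the actual acquisition parameters.

```python
import numpy as np

def lifetime_phasor(decay, dt, freq):
    """First-harmonic phasor coordinates (g, s) of a decay histogram.
    dt: bin width (ns); freq: phasor/laser repetition frequency (GHz)."""
    t = (np.arange(decay.size) + 0.5) * dt      # bin-center times
    w = 2.0 * np.pi * freq
    g = np.sum(decay * np.cos(w * t)) / np.sum(decay)
    s = np.sum(decay * np.sin(w * t)) / np.sum(decay)
    return g, s

# A mono-exponential lifetime tau maps onto the universal semicircle:
# g = 1/(1+(w*tau)^2), s = w*tau/(1+(w*tau)^2).
dt, freq, tau = 0.01, 0.08, 2.6                 # 10 ps bins, 80 MHz, 2.6 ns
t_bins = (np.arange(6000) + 0.5) * dt
decay = np.exp(-t_bins / tau)
g, s = lifetime_phasor(decay, dt, freq)
print(round(g, 3), round(s, 3))  # 0.369 0.483
```

Pixels with mixed decays fall inside the semicircle, which is what makes phasor clustering a convenient fit-free unmixing tool.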

LIFT-FLIM of a human lung cancer pathology slide
We demonstrated LIFT-FLIM in autofluorescence imaging of an unstained human lung cancer pathology slide. Previous studies show that FLIM can assess tumor metabolism by imaging endogenous chromophores such as NAD(P)H and FAD, enabling its application in cancer diagnosis and intraoperative surgical guidance [38,39]. Particularly in pathological imaging, FLIM holds great promise as an alternative approach for label-free detection of tissue lesions [8,40,41]. However, conventional FLIM microscopes with a high collecting NA suffer from a shallow depth of field. When imaging a panoramic FOV through multiple captures and stitching, the system must mechanically adjust its focus at each position to correct for potential focal drift that can occur during extensive scanning, complicating the imaging procedure. Here we show that, by using numerical refocusing, LIFT-FLIM enables an extended depth of field and allows for capturing an all-in-focus image without the need to account for the focal drift.
We excited the sample at 450 nm and collected the autofluorescence in the range of 490-700 nm. The primary endogenous fluorophore that accounts for the fluorescence emission at this wavelength is flavin adenine dinucleotide (FAD). To image a large FOV, we scanned the sample and stitched the images. The resultant fluorescence intensity image captured by the reference camera is shown in Fig. 5a, where certain parts of the FOV are blurred due to the focal drift. In contrast, LIFT-FLIM can numerically correct for this defocus error in post-processing and form an all-in-focus image, as shown in Fig. 5b. For quantitative comparison, we plotted signal intensities along a dashed line in Fig. 5a,b and show the results in Fig. 5c. The image features exhibit much sharper edges in LIFT-FLIM than in the reference camera image (~36% reduction in full width at half maximum). Next, we computed the lifetimes for the stitched all-in-focus LIFT image and present the lifetime map in Fig. 5d. A zoomed-in area (Fig. 5e) shows a significant level of lifetime heterogeneity. To correlate this observation with the tissue state, we stained an adjacent slide from the same tissue sample using standard hematoxylin and eosin (H&E) and imaged it under a widefield microscope. After the histological image was obtained, a pathologist reviewed it to identify the boundary between the tumor and normal tissue, as illustrated in Fig. 5f. Comparing the average pixel lifetimes above (1.9±0.3 ns) and below (2.6±0.4 ns) the annotated boundary (Fig. 5f) reveals a significant difference (Fig. 5g). The observed reduction in autofluorescence lifetimes in the tumor areas compared to the normal tissue is consistent with previous reports [42][43][44] and may indicate a metabolic shift towards glycolysis in cancer [45]. To classify the tissue based on the lifetime, we again applied an unsupervised phasor approach [36,37] to the fluorescence lifetime data.
The resultant phasor plot and classified tissue map are shown in Fig. 5h and 5i, respectively (red channel, tumor; green channel, normal tissue).

LIFT-sFLIM of lung organoids
We demonstrated LIFT-sFLIM in 3D multiplex imaging of lung organoids. Organoids, 3D multicellular stem-cell-derived constructs that mimic in vivo tissue, have gained growing interest for modeling tissue development and disease [46][47][48]. Particularly, organoids hold great promise for high-content phenotypic screening because they recapitulate many aspects of parent tissues and can be derived from patient material, rendering them ideal model systems for personalized medicine and drug discovery [49][50][51][52][53].
One primary challenge for high-content phenotypic screening of organoids is the extraction of multivariate information from organoids labeled with multiple biomarkers [54][55][56]. Here we show that, by acquiring both the spectral and lifetime information, LIFT-sFLIM provides a powerful solution to overcome this challenge. We cultured lung alveolar organoids with different combinations of primary healthy human lung fibroblasts and epithelial cells grown on alginate scaffolds that mimic the alveolar micro-architecture [57]. We used the antibodies tabulated in Supplementary Table 1 and labeled epithelial-mesenchymal transition by α-smooth muscle actin (α-sma) expression, ECM deposition by collagen (collagen I) expression, cell apoptosis by the SMAD signaling pathway (smad3), and cellular senescence by p16INK4A (p16) expression. Figure 6a depicts the fluorescence emission decay curves and spectra of the four fluorophores that were utilized in the secondary antibodies. While the fluorophores AF 532, 546, and 568 have close fluorescence lifetimes, their spectral emission peaks are well separated. On the other hand, AF 546 and AF 555 exhibit significant spectral overlaps but differ in fluorescence lifetimes. The combination of four fluorophores used in this study presents a challenge for conventional imaging techniques. Specifically, neither FLIM nor spectral imaging alone can simultaneously capture and distinguish all four fluorophores. This limitation underscores the need for imaging approaches, such as LIFT-sFLIM, that integrate both spectral and temporal information to enable reliable separation and quantification of multiple fluorophores in complex biological samples.
Using LIFT-sFLIM, we acquired a five-dimensional (5D) dataset (x, y, z, t, λ) (x, y, z, spatial coordinates; t, fluorescence decay time; λ, wavelength). Figure 6b shows the LIFT-sFLIM reconstructed intensity image at depth zero. The wavelength-integrated lifetime image and the time-integrated wavelength image at depth zero are shown in Fig. 6c,d, respectively. To unmix the fluorophores, we applied a spectral-lifetime phasor approach to the 5D dataset. The resultant color-coded phasor plots in the lifetime and spectral domains are shown in Fig. 6e,f, respectively. Consistent with the spectral and lifetime data presented in Fig. 6a, our analysis revealed two distinct clusters in the lifetime phasor plot and three distinct clusters in the spectral phasor plot. By combining spectral and temporal information, we separated the fluorophores into four color-coded channels. Representative images at depth zero are shown in Fig. 6g. By repeating this procedure at all depths, we generated a 3D color-coded image that depicts the distribution of each fluorophore in the organoid, as shown in Fig. 6h.

Discussion
Using LIFT-FLIM for 3D lifetime imaging offers a crucial benefit of reducing the number of scanning steps required compared to traditional point- or line-scanning time-domain FLIM techniques. To produce a 3D image of N_x × N_y × N_z voxels, a FLIM system that uses point- or line-scanning requires a total of N_x × N_y × N_z or N_x × N_z (if line scans are done along the y axis) scanning steps, respectively. Here, N_x, N_y, and N_z denote the number of spatial samplings along each axis of the 3D space. For simplicity, we consider N_x = N_y = N. In contrast, because LIFT-FLIM distributes projection measurements into different views, it demands only m scanning steps, where m is the total number of projection angles. Therefore, LIFT-FLIM reduces the scanning steps required by a factor of N^2 × N_z/m or N × N_z/m compared to point- or line-scanning systems.
For non-compressive measurement, we set m equal to N. Additionally, as demonstrated in Supplementary Materials, our findings indicate that, in light field imaging, the effective number of depth samplings, N_z, equals the number of angular samplings, n. As a result, the scanning reduction factor is either N × n or n when compared to point- or line-scanning systems. With our current N and n values set at 180 and 15, respectively, the resulting scanning reduction factors are 2,700 and 15 in comparison to point- and line-scanning systems.
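The scanning-step arithmetic above can be checked in a few lines, using the values quoted in the text:

```python
# Scanning-step comparison for an N x N x N_z volume, with the quoted
# values m = N = 180 projections and n = N_z = 15 views/depths.
N, n = 180, 15
m = N                        # non-compressive setting
point_steps = N * N * n      # point scanning: N x N x N_z steps
line_steps = N * n           # line scanning along y: N x N_z steps
lift_steps = m               # LIFT-FLIM: one step per projection
print(point_steps // lift_steps, line_steps // lift_steps)  # 2700 15
```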
Alternatively, like sparse-view computed tomography [58], we can choose an m smaller than N for compressive measurement. We define a compression ratio (CR) as

CR = N/m.     (6)

To quantify the dependence of the reconstructed image quality on the CR, we adopted the peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) as evaluation metrics. We varied the CR by changing m and calculated the corresponding PSNRs and SSIMs. Figure S3 illustrates reconstructions of sparse and complex objects at different CR values. In general, reducing the CR improves both PSNR and SSIM in the reconstructed image. Additionally, our findings suggest that the quality of the reconstructed image is highly dependent on the CR for complex objects, like the lung tumor image displayed in Fig. S3c. In such cases, a lower CR value, less than 4.5, is necessary to achieve high-quality image reconstruction (SSIM ≥ 0.9). Conversely, when imaging a sparse object, such as a USAF resolution target, a CR of 9 is sufficient to recover a high-quality image. Therefore, by adjusting m, LIFT-FLIM can tailor the CR to the complexity of a sample, resulting in effective measurements for a given object.
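Assuming the compression ratio takes the form CR = N/m, consistent with the CR values of 4.5 and 9 quoted in the text, the regimes and the PSNR metric can be sketched as:

```python
import numpy as np

# With N = 180 samples per projection, m = 40 or 20 acquired projections
# reproduce the CR regimes discussed above (assumed CR = N/m).
N = 180
crs = [round(N / m, 1) for m in (180, 40, 20)]
print(crs)  # [1.0, 4.5, 9.0]

def psnr(ref, img, peak=1.0):
    """PSNR (dB) scoring a reconstruction against ground truth; 20 dB
    corresponds to a mean squared error of 1% of the squared peak value."""
    mse = np.mean((ref - img) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```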
The imaging speed of LIFT-FLIM is determined by the total number of projections acquired and the time duration at each projection. For LIFT-FLIM using a linear SPAD array, the duration at each projection includes both the pixel exposure time and the temporal histogram readout time. For LIFT-sFLIM using a gated ultrafast camera, the duration at each projection equals the product of the number of time gates and the camera frame time. Importantly, when imaging simple objects, the system can be operated in the compressive measurement mode, where a reduced number of projections m can be acquired to accelerate the imaging without compromising the image quality.
The spatial resolution of LIFT-FLIM is fundamentally limited by optical diffraction when performing non-compressive measurements. Due to the division of the aperture, LIFT-FLIM has a lateral resolution of λf/d, where λ is the wavelength, f is the focal length of the objective lens, and d is the sub-aperture diameter associated with a perspective image. Given n views, d = d_0/√n, where d_0 is the original aperture of the objective lens. Therefore, the lateral resolution is √n times coarser than the native resolution of the objective lens. Although this is a common issue encountered by all light field cameras, we can mitigate it by acquiring fewer views and increasing the sub-aperture size to enhance the resolution at the expense of reduced depth accuracy. On the other hand, when performing compressive measurements, the spatial resolution of LIFT-FLIM is practically limited by the CR. While a higher CR is favored in terms of imaging speed, it deteriorates the reconstructed image quality and resolution for complex objects. Hence, selecting an appropriate CR value for a given object involves striking a balance between imaging speed and resolution.
LIFT-FLIM images can be reconstructed and analyzed in real-time. For instance, when processing uncompressed measurement data, a simple inverse Radon transform takes about 0.13 seconds per time bin on an Nvidia RTX3080Ti GPU with CUDA. Subsequently, deep learning enhancement and phasor analysis require 0.079 and 0.024 seconds, respectively. Parallel computing reduces the total post-processing time to less than 0.3 seconds.
The light throughput of LIFT-FLIM depends on the sub-aperture size of a perspective image, the ratio of the projection line image width to the detector pixel size, and the fill factor of the image sensor. As detailed in Supplementary Materials, LIFT-FLIM is built on an unfocused light field imaging configuration, where the projection line width at the image sensor equals the sub-aperture diameter, d, multiplied by a pupil demagnification ratio, M. Given the pixel pitch, p, and fill factor, F, the percentage of light measured by the image sensor pixel is

η_FLIM = (d/d_0) × pF/(Md).     (7)

Here d/d_0 describes the light loss due to the view selection during pupil scanning, where d_0 is the original aperture of the objective lens. Therefore, a lower pupil demagnification ratio, M (i.e., a shorter focal length of the cylindrical lens in Fig. 1a), can lead to a higher system light throughput. In our current system, due to the use of only off-the-shelf optics and a SPAD array, we have pF/(Md) equal to 0.1, resulting in an overall light throughput of 0.01. To further enhance the system performance, one possible approach is to utilize custom optics that feature a lower pupil demagnification ratio, M, together with a rectangularly shaped SPAD pixel that has a longer pixel pitch, p, in the direction of the projection line width. Alternatively, instead of scanning the pupil to choose the views, it is possible to simultaneously capture all perspective images by employing an array of dove prisms with different orientations, as we have previously demonstrated [26]. However, this setup necessitates the use of multiple linear SPAD arrays, each of which measures a projection line image in a synchronized fashion.
On the other hand, for LIFT-sFLIM using a gated ultrafast camera, the light throughput is determined by the sub-aperture size of a perspective image, the diffraction efficiency of the grating, η_g, and the quantum efficiency of the gated ultrafast camera, η_q:

η_sFLIM = (d/d_0) × η_g η_q.     (8)
Since d/d_0 = 1/√n, where n is the total number of views acquired, Eq. 8 can be rewritten as η_sFLIM = η_g η_q/√n. Hence, reducing the number of angular samplings can boost the light throughput, but this comes at the cost of decreased depth accuracy. Noteworthily, here the pupil demagnification ratio, M, has no effect on the light throughput. Rather, it governs the spectral resolution of the system, as in a conventional pushbroom imaging spectrometer [59,60].
To investigate how the number of photons received at a pixel affects the quality of the reconstructed image, we conducted simulations under a shot-noise-limited condition. Provided that the pixel with the maximum count in the image collects N_p photons, the corresponding shot noise is √N_p photons. We introduced photon noise to all pixels in the projection images and reconstructed the images with various values of N_p, while maintaining a constant number of projections across all data points in the plot. Figure S4 presents the reconstruction results of a Shepp-Logan phantom under different N_p values. The results indicate that a larger N_p (i.e., more photons) can lead to a higher PSNR. For high-quality image reconstruction (PSNR ≥ 20 dB), N_p must be greater than 64 photons.
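A shot-noise sketch in the same spirit is given below. It is an illustrative stand-in that applies Poisson noise to a test image directly rather than through the full projection and reconstruction pipeline, so its absolute PSNR values are not directly comparable to Fig. S4.

```python
import numpy as np

# Scale a test image so its brightest pixel collects N_p photons,
# apply Poisson (shot) noise, and score the result with PSNR.
def psnr(ref, img):
    mse = np.mean((ref - img) ** 2)
    return 10.0 * np.log10(ref.max() ** 2 / mse)

rng = np.random.default_rng(0)
truth = rng.random((64, 64))
results = {}
for n_p in (16, 64, 256):
    scaled = truth / truth.max() * n_p            # brightest pixel = N_p counts
    noisy = rng.poisson(scaled).astype(float)     # shot-noise-limited detection
    results[n_p] = psnr(scaled, noisy)
print({k: round(v, 1) for k, v in results.items()})
```

As expected, PSNR rises monotonically with the photon budget, mirroring the trend reported for the phantom reconstructions.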
To sum up, we have created a highly data-efficient 3D FLIM technique that relies on light field tomography and extended its capabilities to 3D sFLIM. We believe that LIFT-FLIM and -sFLIM will find broad applications in high-throughput and high-content imaging of biological cells and tissues, opening up new avenues for both fundamental and translational biomedical research.

Experimental setup
In a LIFT-FLIM and -sFLIM system, we used an epi-fluorescence microscope (IX83, Olympus) as the front-end optics and excited the sample with a pulsed laser source (SuperK FIANIUM, FIU-15, NKT Photonics, for LIFT-FLIM; SIRIUS GR-2, Spark Laser, for LIFT-sFLIM). The emitted fluorescence is collected by a microscope objective lens (UPLXAPO60XO, Olympus; UPLXAPO20X, Olympus), and an intermediate fluorescence image is formed at the side image port of the microscope.
To split the light, we employed a beam splitter (BSX16, Thorlabs), which transmits 10% of the light to a reference camera (CS2100M-USB, Thorlabs) and reflects 90% of the light to the LIFT-FLIM camera. We placed a scanning mirror (MR-10-30, Optotune) at the intermediate image plane to shift the pupil image.
The fluorescence is then directed through a 4f system, which consists of two lenses (ACT508-250-A and AC254-150-A, Thorlabs) with focal lengths of 250 mm and 150 mm, respectively. To rotate the perspective image, we mounted a Dove prism (PS990M, Thorlabs) on a motorized rotation stage (PRM1Z8, Thorlabs) and positioned the assembly at the Fourier plane of the 4f system. We also positioned a cylindrical lens (LJ1095L1-A, Thorlabs, invariant axis along the y-axis) 131 mm after the second lens in the 4f system, which generates a 1D en-face projection of a perspective image along the y-axis. To locate the projection line image, we identified the line with the smallest width. Additionally, we compensated for the focal shift and spherical aberration introduced by the cylindrical lens by defocusing.
The subsequent system is split into two arms, namely the LIFT-FLIM and LIFT-sFLIM arms. The former employs a linear SPAD array [34], while the latter utilizes a 2D ultrafast time-gated camera (High rate image intensifier, LaVision). To switch the light path between the two arms, we placed a flip mirror (TRF90, Thorlabs; PF10-03-G01, Thorlabs) at the line image plane.
When the mirror is in position 1, the fluorescence is directed towards the LIFT-FLIM sub-system through a camera lens (YN100mm F2, YONGNUO) and directly measured by the linear SPAD camera. The linear SPAD camera comprises 256 effective CMOS SPAD pixels with a pitch of 26.2 µm. Operating in the TCSPC mode, the SPAD camera provides a temporal resolution of 50 picoseconds [34]. It is connected to an FPGA (Spartan 6, Xilinx) with 64 time-to-digital converters (TDCs) and histogram engines, enabling it to process up to 8.5 giga-photons per second. By rotating the Dove prism through a set of assigned angles at each view, we sequentially acquired the 1D en-face projections and constructed a sinogram.
In position 2 of the flip mirror, the emitted fluorescence is directed to the LIFT-sFLIM sub-system. The line image is relayed to the image sensor plane by a pair of camera lenses (YN100mm F2, YONGNUO). To disperse the line image along the x-axis, we positioned a transmission diffraction grating (GT50-03, Thorlabs) at the Fourier plane of the relay system. The resultant dispersed projection image is then sampled in time by an ultrafast time gate and further relayed to a 2D camera (CS2100M-USB, Thorlabs) by a camera lens (YN100mm F2, YONGNUO). By varying the delay between the time gate and the laser reference signal, we acquired a series of time-resolved dispersed projection images. To synchronize the scanning mirror, the Dove prism rotation stage, the camera, and the laser, we employed a digital delay generator (DG645, Stanford Research Systems). To maximize the information content for image reconstruction, we chose the Dove prism rotation angles to be evenly spaced in the range of [0°, 90°].
To tune the illumination wavelength from the supercontinuum laser, we built a wavelength-selecting module (Fig. S5) using a digital micromirror device (DMD). The collimated white laser beam is first dispersed by a transmission diffraction grating (GT50-03, Thorlabs) and line focused onto the surface of the DMD (DLP LightCrafter 6500, Texas Instruments) through a cylindrical lens (LJ1125L1-A, Thorlabs). The broadband illumination has a line dispersion of 54.4 nm/mm on the DMD surface. The DMD has 1920×1080 micromirrors, each of which can be individually tilted ±12° relative to the normal. Each column of the DMD corresponds to a different wavelength with a 0.4 nm/column wavelength resolution. By adjusting the mirror pattern, we can select any desired illumination wavelengths. The laser light of the selected wavelengths is then spatially recombined by another identical set of cylindrical lens and diffraction grating and directed towards the LIFT-FLIM sub-system.
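As a quick sanity check of the module's spectral sampling, the 54.4 nm/mm line dispersion can be combined with the DMD micromirror pitch to recover the quoted ~0.4 nm/column figure. The sketch below assumes a 7.56 µm pitch (typical of this DMD family, but an assumption, not a value from the text); the helper names are ours, not part of any instrument API.

```python
# Sanity check of the DMD wavelength-selecting module geometry.
# ASSUMPTION: 7.56 um micromirror pitch (typical DLP6500-class value).
DISPERSION_NM_PER_MM = 54.4   # line dispersion on the DMD surface (from the text)
MIRROR_PITCH_MM = 7.56e-3     # assumed micromirror pitch, in mm

def nm_per_column(dispersion_nm_per_mm: float, pitch_mm: float) -> float:
    """Wavelength increment covered by one DMD column."""
    return dispersion_nm_per_mm * pitch_mm

def column_for_wavelength(wavelength_nm: float, start_nm: float,
                          dispersion_nm_per_mm: float, pitch_mm: float) -> int:
    """Map a target wavelength to the DMD column offset that selects it."""
    step = nm_per_column(dispersion_nm_per_mm, pitch_mm)
    return round((wavelength_nm - start_nm) / step)

print(nm_per_column(DISPERSION_NM_PER_MM, MIRROR_PITCH_MM))
```

With these numbers, `nm_per_column(54.4, 7.56e-3)` gives ≈ 0.41 nm/column, consistent with the quoted resolution.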
When imaging the mixed fluorescence beads and mouse kidney tissue section, we used multiband excitation with two different wavelengths (488 nm and 561 nm) and separated fluorescence from excitation using the combination of a multiband dichroic mirror (ZT405/488/551/647rpc, Chroma) and a multiband emission filter (ZET405/488/561/647m, Chroma). For imaging the human lung cancer pathology slide, we used 450 nm laser excitation, a 495 nm dichroic mirror (T495lpxr, Chroma), and a long-pass emission filter (ET500lp, Chroma). In the case of lung organoids, we used 532 nm laser excitation, a 532 nm dichroic mirror (ZT532rdc, Chroma), and a long-pass emission filter (ET542lp, Chroma). The laser fluence at the sample focal plane was approximately 9.9×10⁻⁷ J/cm², 1.1×10⁻⁷ J/cm², and 2.9×10⁻³ J/cm² for the mouse kidney section, lung cancer pathology slide, and lung organoid imaging experiments, respectively. These laser fluences were well below the cell damage threshold of 4 J/cm² [61,62].

Image reconstruction
To obtain an image of a monochromatic scene at a specific time point and depth from the measurement described by Eq. 3, we iteratively solve an optimization problem:

x̂ = argminₓ ½‖Ax − b‖₂² + ρΦ(x),

where A is the forward (projection) operator from Eq. 3, b is the measurement, ‖·‖₂ denotes the ℓ₂ norm, ‖·‖₁ denotes the ℓ₁ norm, Φ(·) is a data regularization term, and ρ is a hyperparameter that balances the data fidelity and regularization terms. In the framework of regularization by denoising [63], Φ(·) is not explicitly specified, and the regularization can be implemented by a state-of-the-art image denoising algorithm such as BM3D or a neural network. We adopted the BM3D and total variation (TV) denoisers for the regularization due to the availability of efficient algorithms [64]. Besides the iterative method [27,28], the inverse Radon transform is an alternative approach that could have been used for image reconstruction at a lower computational cost.
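The iterative scheme can be sketched as a plug-and-play loop that alternates a gradient step on the data-fidelity term with a denoising step. This is a minimal illustration under our own assumptions: `A` and `b` are a generic forward matrix and measurement vector (not the actual LIFT system operator), and a toy moving-average denoiser stands in for BM3D/TV.

```python
import numpy as np

def pnp_reconstruct(A, b, n_iter=50, step=1e-3, denoise=None):
    """Plug-and-play sketch of the iterative reconstruction: alternate a
    gradient step on the data-fidelity term 0.5*||Ax - b||_2^2 with a
    denoising step standing in for the BM3D/TV regularizer."""
    if denoise is None:
        # Toy denoiser: 3-tap moving average (stand-in for BM3D/TV).
        denoise = lambda x: np.convolve(x, np.ones(3) / 3, mode="same")
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = x - step * A.T @ (A @ x - b)   # data-fidelity gradient step
        x = denoise(x)                      # regularization by denoising
    return x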

Refocusing and extending the depth of field
Refocusing to a different depth amounts to shearing the light field, parameterized by a depth-dependent shearing parameter α (Supplementary Note 1). In conventional light field imaging, refocusing is performed by shifting and adding the sub-aperture images [65]. Unlike conventional light field cameras, LIFT-FLIM first rotates a perspective image and then transforms the rotated image into a line. Therefore, the depth-dependent shearing must be performed parallel to the projection axis.
For the sub-aperture (u, v) at the projection angle θ, the shearing of the 1D sub-aperture projection is given by the component of the depth-dependent shift along the projection axis (Supplementary Note 1). For numerical refocusing, we applied the corresponding shearing factor to each projection image and updated the sinogram for reconstructing the depth image.
Extending the depth of field can be achieved through a similar approach to conventional light field imaging, which involves refocusing onto different depths, extracting the sharpest feature for each pixel, and assembling an all-in-focus image [66].
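The all-in-focus assembly described above can be sketched as follows: given a refocused depth stack, pick for each pixel the depth whose local Laplacian response is largest. This is an illustrative simplification of the sharpest-feature selection in [66], with our own function names, not the exact implementation.

```python
import numpy as np

def all_in_focus(stack):
    """Assemble an all-in-focus image from a refocused depth stack of
    shape (depth, H, W) by picking, per pixel, the depth with the largest
    local contrast (a simple Laplacian focus measure)."""
    lap = np.abs(
        4 * stack
        - np.roll(stack, 1, axis=1) - np.roll(stack, -1, axis=1)
        - np.roll(stack, 1, axis=2) - np.roll(stack, -1, axis=2)
    )
    best = np.argmax(lap, axis=0)            # sharpest depth index per pixel
    h, w = np.indices(best.shape)
    return stack[best, h, w], best
```

A real implementation would smooth the focus-measure map before the per-pixel argmax to avoid speckled depth assignments.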

System calibration and resolution
Scanning mirror calibration
To ensure that the scanning range of sub-apertures fully utilizes the entire aperture of the objective lens, we calibrated the scanning mirror's horizontal and vertical tilt angles. Additionally, to optimize the light throughput of each sub-aperture, we employed a rectangular iris instead of a round one at the aperture stop. The rectangular shape reduces the gaps between adjacent scanned pupil positions and allows more light to pass through a sub-aperture.

Projection center calibration
To calibrate the central position of each projection line image at the sensor plane, we imaged a pinhole (P10D, Thorlabs) positioned at the center of the FOV on the sample stage. We captured images of the pinhole at every projection angle and view and directly localized the center of each line image, denoted x₀(θ, v) for projection angle θ and view v (Supplementary Note 1).
Subsequently, we extracted the projection data based on the center location to form a sinogram.

Spectrum calibration and resolution
To calibrate the spectral response, we positioned a pinhole (P10D, Thorlabs) at the sample stage and illuminated it with monochromatic light at varied wavelengths. The resulting pixel locations of the projections were recorded and fitted with a linear polynomial, as illustrated in Fig. S7a. The slope of the line determines the spectral sampling of the system, which was calculated to be 0.14 nm per pixel. The spectral resolution is defined as the full-width at half maximum (FWHM) of the spectral response. A 1 nm bandpass filter (FL532-1, Thorlabs) was used to limit the source wavelength for this measurement, and the raw spectral response is displayed in Fig. S7b, where the FWHM was approximately 9.2 nm. However, this width was a convolution of the geometrical image of the pinhole on the camera (approximately 7 pixels), the bandwidth of the light source (approximately 7 pixels), and the system spectral resolution. The width of a convolved function (in pixels) can be computed as [67]:

w(f₁ * f₂ * f₃) = w(f₁) + w(f₂) + w(f₃) − 2,

where w denotes the width of a function, * denotes the convolution operator, and fᵢ (i = 1, 2, 3) denotes the individual function in a discrete form. Based on this equation, the width of the spectral resolution on the camera was estimated to be 48 pixels. Given the 0.14 nm per pixel spectral sampling, the spectral resolution is 6.6 nm.

Spatial resolution and field-of-view(FOV)
To quantify the spatial resolution, we imaged a fluorescence bead with a diameter of 4 μm (F8858, Thermo Fisher), and the raw reconstruction results are presented in Fig. S2a-d, where the lateral and axial full-width at half maximum (FWHM) are approximately 4.6 μm and 5.7 μm, respectively. However, the lateral FWHM was a convolution of the geometric image of the bead on the camera (around 4 pixels) and the system lateral resolution. The width of a convolved function can be calculated as [67]:

w(f₁ * f₂) = w(f₁) + w(f₂) − 1,

where w denotes the width of a function, * denotes the convolution operator, and fᵢ (i = 1, 2) denotes the individual function in a discrete form. Using this formula, the width of the lateral resolution on the camera was calculated to be 2 pixels. Given a 0.9 μm spatial sampling (camera pixel pitch of 26.2 μm divided by the system magnification ratio of 29), the lateral resolution was estimated to be 1.8 μm. Similarly, the axial FWHM was a convolution of the bead size and the system axial resolution. As a result, the axial resolution was estimated to be 3 μm.
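The width bookkeeping in this and the previous subsection can be reproduced in a few lines, assuming the discrete relation w(f₁ * f₂) = w(f₁) + w(f₂) − 1 applied repeatedly; the helper name is ours.

```python
def deconvolved_width(measured_px: int, *component_px: int) -> int:
    """Width (in pixels) remaining after removing known components from a
    measured width, using the discrete convolution-width relation
    w(f1 * f2) = w(f1) + w(f2) - 1 applied repeatedly."""
    w = measured_px
    for c in component_px:
        w = w - c + 1
    return w

# Lateral case from the text: ~5 px measured FWHM minus a ~4 px bead image.
lateral_px = deconvolved_width(5, 4)   # 2 px
lateral_um = lateral_px * 0.9          # 0.9 um/px sampling -> 1.8 um
```

This reproduces the 1.8 μm lateral estimate; the spectral case follows the same arithmetic with two known components removed.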
We also imaged a group of bars from a USAF resolution target (Group 7, elements 3-6) along both horizontal and vertical directions and plotted the intensities along the dashed line. The image visibility, defined as V = (I_max − I_min)/(I_max + I_min), where I is the intensity, was calculated for each group of bars using the peaks and valleys of the intensity. With a visibility threshold of 0.2, the spatial resolution of the bars was determined to be 2.2 µm along both vertical and horizontal directions, indicating an isotropic resolution. The FOV of our system was measured to be 227 µm × 143 µm when using a 60x microscope objective lens (UPLXAPO60XO, Olympus).
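The visibility computation follows directly from its definition; this small helper (our own, for illustration) takes an intensity line profile across a bar group.

```python
import numpy as np

def visibility(profile):
    """Image visibility V = (I_max - I_min) / (I_max + I_min), computed
    from an intensity line profile across a bar group."""
    i_max, i_min = np.max(profile), np.min(profile)
    return (i_max - i_min) / (i_max + i_min)
```

For a profile alternating between intensities 1 and 3, for example, the visibility is (3 − 1)/(3 + 1) = 0.5, well above the 0.2 threshold.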

Depth calibration
To calibrate the depth, we utilized a fluorescence bead (C16509, Thermo Fisher) and translated it along the depth axis from -16 μm to 16 μm with a 2 μm step. At each depth, we captured an image and then performed digital refocusing by adjusting the shearing parameter, as described in Methods: Refocusing and extending the depth of field. The goal was to identify the shearing parameter that would bring each image into the sharpest focus, which was determined by maximizing a focus measure (e.g., the sum of modified Laplacian) for each pixel in the image [21]. The best-focus shearing parameter at each physical depth was then recorded. The resulting shearing parameter versus depth curve was fitted with a linear model. With this calibration curve, we can digitally refocus a 3D object to a specific depth using the corresponding shearing parameter. To validate the accuracy of our shearing parameters, we imaged 3D fluorescence beads in an agarose gel and compared LIFT refocusing against the ground-truth depth images captured by a reference camera, as shown in Fig. 3.
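A compact sketch of the best-focus search: score each refocused image with a focus measure and keep the shearing parameter with the highest score. For brevity this uses a global (image-summed) modified-Laplacian score rather than the per-pixel version described above; the function names are ours.

```python
import numpy as np

def sum_modified_laplacian(img):
    """Sum-modified-Laplacian focus measure: |2I - I_left - I_right| +
    |2I - I_up - I_down|, summed over the image. Larger means sharper."""
    sml_x = np.abs(2 * img - np.roll(img, 1, axis=1) - np.roll(img, -1, axis=1))
    sml_y = np.abs(2 * img - np.roll(img, 1, axis=0) - np.roll(img, -1, axis=0))
    return float(np.sum(sml_x + sml_y))

def best_shear(refocused_stack, shear_params):
    """Pick the shearing parameter giving the sharpest refocused image."""
    scores = [sum_modified_laplacian(im) for im in refocused_stack]
    return shear_params[int(np.argmax(scores))]
```

Repeating this search at each known bead depth yields the shearing-parameter-to-depth pairs used for the linear fit.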

Camera registration
To register the LIFT-FLIM and -sFLIM images with the reference camera, we imaged a 7 by 13 grid pattern. We then extracted the point locations from the reconstructed LIFT-FLIM and -sFLIM images and the reference image and calculated the homography matrix to establish a pixel-to-pixel correspondence between the two cameras. The reprojection error using the homography is less than one pixel, ensuring an accurate registration. This homography matrix is then used to register the LIFT-FLIM and -sFLIM images with reference images for deep learning reconstruction.
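The homography estimation step can be illustrated with a plain-numpy direct linear transform (DLT): given matched grid points from the two cameras, solve for the 3×3 matrix via SVD. This is a textbook sketch (no coordinate normalization or robust estimation), not the exact calibration code used in the work.

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate the 3x3 homography mapping src -> dst point pairs
    (N >= 4) with the direct linear transform (DLT) and SVD."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)          # null-space vector = homography entries
    return H / H[2, 2]

def apply_homography(H, pts):
    """Map 2D points through H, including homogeneous normalization."""
    p = np.c_[pts, np.ones(len(pts))] @ H.T
    return p[:, :2] / p[:, 2:3]
```

The sub-pixel reprojection error quoted in the text corresponds to checking `apply_homography(H, src)` against the matched reference points.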

Preparation of training data for deep learning
To generate the ground truth depth image stack, we translated the sample stage along the depth axis while capturing images using the reference camera. The resulting image stack was then transformed to the LIFT-FLIM and -sFLIM camera coordinates using a homography matrix obtained through camera registration. We subtracted a background image from the warped image stack.
The LIFT-FLIM and -sFLIM images were captured at depth z = 0, and a depth image stack was computed by numerical refocusing. We calculated a uniform digital propagation matrix (DPM) at each depth and appended them to the reconstructed image stack. We created a mask by setting a threshold on the reconstructed image to identify regions with sample fluorescence signals above the background. This mask was then applied to both the ground truth image stack and the LIFT-FLIM and -sFLIM image stack. For training, we constructed a total of 200 image stack pairs per task in the training dataset, each comprising an input image collection (LIFT-FLIM/sFLIM reconstruction stack, DPM stack, and a wide-field image at depth z = 0) and a ground-truth image stack.

Deep learning network architecture and training
The PixelCNN++ architecture [33] was adopted for LIFT-FLIM and -sFLIM refocusing. As illustrated in Fig. 2, our model consists of the upper down- and up-sampling streams and the lower down- and up-sampling streams. Each stream has 5 ResNet blocks [68] in both the down-sampling and up-sampling paths. Each ResNet block contains 4 ResNet layers, and each ResNet layer has two 3×3 convolutional layers and one 1×1 convolutional layer. The ResNet layer utilizes two activation functions a₁, a₂ defined below:

a₁(x₁, x₂) = ELU(x₁ ⊕ x₂),
a₂(x₁, x₂) = x₁ ⊙ σ(x₂).

Here ⊕ means concatenation along the channel axis and ⊙ is element-wise multiplication. The Exponential Linear Unit (ELU) [69] and the sigmoid function σ are defined as

ELU(x) = x if x > 0, and α(eˣ − 1) if x ≤ 0,
σ(x) = 1/(1 + e⁻ˣ),

where α is a hyperparameter that controls the value to which an ELU saturates for negative net inputs. Strided convolutional layers were added between two sequential ResNet blocks to halve the spatial dimensions in the down-sampling path, and conversely transposed strided convolutional layers were utilized to implement up-sampling in the up-sampling path. Skip connections connect each ResNet block in the down-sampling path with its counterpart block in the up-sampling path such that relatively higher-frequency image features can flow through the model. The training loss of our model is a linear combination of the Fourier-domain mean absolute error (FDMAE) [70,71], the mean square error (MSE), and the perceptual loss:

L = λ₁L_FDMAE + λ₂L_MSE + λ₃L_perc.

Here λ₁, λ₂, and λ₃ are the weights of each loss term, and were empirically set as 0.1, 0.1, and 1.0, respectively. y, ŷ ∈ ℝ^(N²) are the vectorized ground truth and predicted images, respectively. The FDMAE loss is defined as

L_FDMAE = ‖Fy − Fŷ‖₁,

where F ∈ ℂ^(N²×N²) is the Fourier transform matrix. The MSE loss is defined as

L_MSE = ‖y − ŷ‖₂².

The perceptual loss is defined as the sum of MSE losses between the feature maps of y and ŷ generated by a Visual Geometry Group 16 (VGG16) network [72]:

L_perc = Σₖ wₖ‖φₖ(y) − φₖ(ŷ)‖₂²,

where φₖ(·) represents the feature map of the input image after the k-th block of VGG16, and wₖ is the weight for the corresponding feature maps.
In this work we used the first three blocks of VGG16 for image feature extraction, i.e., K = 3, and empirically set w₁ = 0.5, w₂ = 0.15, w₃ = 0.1. An Adam optimizer with an exponentially decaying learning rate was utilized for parameter optimization. The initial learning rate was set to 10⁻⁴ and the decay rate was 0.999995 per epoch.
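The non-perceptual loss terms can be sketched in a few lines of numpy; the perceptual term, which requires a pretrained VGG16, is passed in here as a precomputed scalar. The function names and this simplified treatment are ours, not the training code.

```python
import numpy as np

def fdmae(y, y_hat):
    """Fourier-domain mean absolute error between two images."""
    return float(np.mean(np.abs(np.fft.fft2(y) - np.fft.fft2(y_hat))))

def mse(y, y_hat):
    """Pixel-domain mean square error."""
    return float(np.mean((y - y_hat) ** 2))

def combined_loss(y, y_hat, perceptual=0.0, w_fd=0.1, w_mse=0.1, w_p=1.0):
    """Weighted sum of FDMAE, MSE, and a (precomputed) perceptual term,
    using the 0.1 / 0.1 / 1.0 weights quoted in the text."""
    return w_fd * fdmae(y, y_hat) + w_mse * mse(y, y_hat) + w_p * perceptual
```

In a PyTorch training loop, the same combination would be built from differentiable tensor operations rather than numpy.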
Our models were implemented using the PyTorch framework [73] on a machine with an Intel Xeon W-2195 processor and four RTX 2080Ti graphics cards. All models converged after around 5000 epochs, which took approximately 2 to 3 days.

Image stitching
We used a feature-based image stitching algorithm to create a panorama view of the human lung cancer pathology slide from multiple scanned FOVs with overlapping regions. This process involved detecting and matching image features, estimating the geometric transformation between images, and computing the transformation that maps each image onto the panorama. Moreover, to correct the artifacts in the stitched image caused by connecting the individual images, we applied an intensity averaging technique to the neighboring pixels at the artifact's coordinates.

Phasor Analysis
To facilitate fast and accurate analysis of LIFT-FLIM and LIFT-sFLIM data, we utilized a phasor approach to unmix the underlying chromophores. In our experiments involving mouse kidney tissue sections and human lung cancer pathology slides, we reconstructed a multidimensional array I(x, y, τ) at each depth z, where x and y are the spatial dimensions and τ represents the fluorescence decay time. After the phasor transformation, we fed the resulting phasor coordinates into an unsupervised unmixing algorithm [36,37] to determine the probability of each pixel belonging to a specific cluster. It is important to note that the number of clusters present was assumed to be known a priori, which is generally the case as we label our samples with fluorescence probes or make assumptions regarding the sample composition [37]. For instance, we modeled our data using two clusters for the mouse kidney tissue sections (Alexa Fluor 488 WGA and Alexa Fluor 568 phalloidin) and the human lung cancer pathology slide (normal and tumor). Using the probabilities obtained from the unmixing algorithm, we assigned colors to the unmixed image pixels: each cluster was assigned a unique color, and the RGB coordinates of the colors were combined using the probabilities as coefficients. This process results in a color code for each pixel, as shown in Fig. 4f and Fig. 5i.
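The phasor transformation itself reduces to two weighted sums over the decay histogram. A minimal sketch, with the angular frequency ω and bin width left as user-chosen parameters (the helper name is ours):

```python
import numpy as np

def phasor(decay, omega=1.0, dt=1.0):
    """Phasor coordinates (g, s) of a fluorescence decay histogram:
    g = sum I(t) cos(w t) / sum I(t),  s = sum I(t) sin(w t) / sum I(t)."""
    t = np.arange(len(decay)) * dt
    total = np.sum(decay)
    g = np.sum(decay * np.cos(omega * t)) / total
    s = np.sum(decay * np.sin(omega * t)) / total
    return g, s
```

Applying this per pixel turns the (x, y, τ) array into a (g, s) map that the clustering step operates on; an instrument-response-function-limited decay (all counts in the first bin) lands at (1, 0) on the phasor plot.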
In LIFT-sFLIM, the image is represented by a multidimensional array I(x, y, τ, λ) at each depth z, where λ denotes the spectral dimension. To simplify the analysis, we focus on the m × n matrix I(τ, λ) at one pixel location, where the fluorescence decay is sampled at m points (τ₁, …, τₘ) and the spectrum is sampled at n points (λ₁, …, λₙ). We also consider the wavelength-integrated time decay I_Λ(τ) = Σ_λ I(τ, λ). For the data in Fig. 6a, the optimal number of clusters to model the data was set to two and three for temporal and spectral unmixing, respectively. Similar to the experiments conducted on the mouse kidney tissue sections and lung cancer pathology slides, we computed the probability estimates of a pixel belonging to a particular cluster as P_Λ(c = i) for the temporal unmixing and P_T(c = j) for the spectral unmixing, such that Σᵢ P_Λ = 1 and Σⱼ P_T = 1, where i and j are the indices of the classified clusters. Note that the following assignments are made regarding the probability estimates for the clusters: P_Λ(c = 1) for smad3; P_Λ(c = 2) for sma, collagen, and p16; P_T(c = 1) for sma; P_T(c = 2) for collagen and smad3; and P_T(c = 3) for p16. By combining the probability estimates with Eq. 18, we can obtain the unmixed biomarker maps.

SPAD histogram post-processing
Processing the raw histograms from the SPAD involves three steps: background subtraction, delay correction, and nonlinear correction (Fig. S8a). First, in the background subtraction step, we collected the background signal, which results from the dark counts of the SPAD camera and stray light from the environment, under the same conditions as the experiments. The resulting background histogram was then subtracted from the raw histogram. Figure S8b shows the histograms after subtraction at two representative pixel locations, indicating that the zero references of the pixel histograms are not aligned because the delays in the FPGA from the input to the delay line are not matched [74]. Delay calibration was then performed to correct for the misalignment of the histograms.
In the delay calibration step, to measure the zero-reference bin of each pixel, we shone a picosecond laser beam onto the pixels and registered the start of the event at each pixel as the zero-reference bin location. Figure S8c presents the histograms at the pixels in Fig. S8b, indicating the zero reference at the maximum bins. The aligned histograms are shown in Fig. S8d after shifting by the zero-reference bin values. After the delay correction, a nonlinear correction was conducted to smooth the non-linearities inherent in the delay chains [75]. Applying non-time-correlated uniform illumination to the linear SPAD array, we collected a sufficient number of histograms (~100) and processed the resulting averaged histogram using histogram equalization to create a uniform histogram [75]. A correction matrix was computed and stored for each pixel during the histogram equalization.
Multiplying the correction matrix with raw histograms yields smoothed histograms. Figure  S8e shows the raw histogram and smoothed histogram for a 2.5 ns period under uniform illumination, and Fig. S8f shows the smoothed histogram after multiplication with the correction matrix.
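The three-step pipeline can be summarized in one function. This is a schematic of the processing order only (subtract background, roll by the zero-reference bin, apply the per-bin correction), with our own names, not the FPGA-side implementation.

```python
import numpy as np

def correct_histogram(raw, background, zero_bin, correction=None):
    """Three-step SPAD histogram post-processing sketch: background
    subtraction, delay (zero-reference) alignment, and an optional
    per-bin nonlinearity-correction weight vector."""
    h = raw.astype(float) - background      # 1. background subtraction
    h = np.roll(h, -int(zero_bin))          # 2. align the zero reference
    if correction is not None:              # 3. DNL smoothing weights
        h = h * correction
    return h
```

In the real system, `correction` is the per-pixel matrix produced by the histogram-equalization calibration described above.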
Ground truth lung organoid imaging using a confocal fluorescence microscope
To show the ground truth locations of individual biomarkers in our lung organoid experiment, we cultured another four sets of lung organoids under the same conditions and labeled them with individual fluorophores. We then imaged the organoids using a standard confocal fluorescence microscope (Zeiss LSM 880 Confocal). Figure S9 presents the color-coded images, which display a similar appearance to our LIFT-sFLIM results.

Sample preparation
Mixed fluorescent beads
To create a mixed fluorescent bead sample, we embedded three types of fluorescent beads (F8858, C16509, F8831, Thermo Fisher) into agarose gels. First, we diluted the bead suspensions and sonicated them. Then, we pipetted 10 µL of the 4 µm bead suspension (∼5.7×10⁷ beads/mL), 10 µL of the 6 µm bead suspension (∼1.7×10⁷ beads/mL), and 10 µL of the 10 µm bead suspension (∼3.6×10⁶ beads/mL) into 10 mL of PBS (10010023, Thermo Fisher) for each type of bead. We mixed 100 µL of the diluted 4 µm bead solution, 100 µL of the diluted 6 µm bead solution, and 100 µL of the diluted 10 µm bead solution to create the final mixed bead solution, which contained approximately 1.9×10⁴ 4 µm beads/mL, 5.6×10³ 6 µm beads/mL, and 1.2×10³ 10 µm beads/mL. We prepared the agarose gel by making a 1% [weight/volume] solution of low melting point agarose (A6013, Sigma-Aldrich) in PBS, heating it until it completely dissolved, and cooling it down to approximately 40 °C. We added 2.5 µL of the mixed bead solution to 400 µL of the agarose solution. After sonication, we added a 50 µL drop of the mixture onto a glass bottom dish (P35G-1.5-14-C, Mattek) and allowed it to solidify for a few minutes. Finally, we imaged the ~1 mm thick gel, which contained immobilized fluorescent beads, using the methods described in the main text.

Distal lung organoid preparation
We used a hydrostatic droplet generator to fabricate alginate microbead scaffolds with an average diameter of 100 µm, which mimics the size of pulmonary alveoli. After generating the microbeads, we coated them with collagen I (354249, Corning) and dopamine (H8502, Sigma) in a two-step process to functionalize them for cell culture. The detailed protocol for alginate bead generation and functionalization can be found in [57].
Human primary adult normal lung fibroblasts were isolated from distal lung tissue from a de-identified healthy donor (65-year-old, male, Caucasian, non-smoker, nonalcoholic) procured from the International Institute for the Advancement of Medicine (IIAM). Human lung tissue was procured under the UCLA-approved IRB protocol #16-000742. The fibroblast (crawled out population) and epithelial (MACS sorted EpCAM + population) were isolated from the distal tissue and used in this study.
To develop the 3D model, we used a high aspect ratio vessel (HARV) bioreactor vessel (model: RCCS-4H; Synthecon, Houston, Texas) of 2 mL volume and added 0.5 mL of functionalized microbeads and 1.5 mL of media containing a total of 1 million cells (epithelial:fibroblast = 1:1). The vessel was screwed into the bioreactor base and rotated for 48 h to allow optimum cell-bead adherence. After 48 h, the cell-coated bead solution was aliquoted at 100 µL per well in a glass-bottom 96-well plate (P96-1.5H-N, Cellvis), and the plate was briefly centrifuged (1,000g × 2 min) to settle the cells/beads at the bottom of the plate. An additional 150 µL of media was added to each well. The plate was then kept inside an incubator (37 °C, 5% CO₂, 95% RH) and monitored for the formation of self-organized 3D structures. Within the next 72 h, fully formed 3D co-culture organoids with micro-alveolar structures were observed in each well.

Data availability
The data of this study is available at https://github.com/iOpticsLab/LIFT-FLIM

Code availability
The code of this study is available at https://github.com/iOpticsLab/LIFT-FLIM

Author contributions
contributed to the preparation of lung cancer slides and provided ground truth diagnosis. L.G. conceived and supervised the project. All authors analyzed the data and contributed to writing the manuscript.

Corresponding authors
Correspondence to Liang Gao.

Ethics declarations
Competing interests
Edoardo Charbon holds the position of Chief Scientific Officer at Fastree3D, a company that specializes in the manufacturing of LiDARs for the automotive market. Claudio Bruschini and Edoardo Charbon are also co-founders of Pi Imaging Technology. Additionally, Liang Gao has a financial interest in Lift Photonics, which commercializes the LIFT technology for FLIM applications. None of these companies were involved in the research presented in this paper.

Fig. 2 caption: The network consists of two down- and up-sampling streams. Each stream has five ResNet blocks in both the down-sampling and up-sampling paths. Each ResNet block contains four ResNet layers, and each ResNet layer has two 3×3 convolutional layers and one 1×1 convolutional layer, as indicated in the bottom right panel. Strided convolutional layers were added between two adjacent ResNet blocks to halve the spatial dimensions in the down-sampling path, and conversely transposed strided convolutional layers were utilized to implement up-sampling in the up-sampling path. The spatial dimensions of the ResNet blocks in the sampling streams from left to right are 256×256, 128×128, 64×64, 32×32, 16×16, 32×32, 64×64, 128×128, 256×256. The central 16×16 ResNet blocks are shared by the down- and up-sampling streams. Skip connections connect each ResNet block in the down-sampling path with its counterpart block in the up-sampling path. The inputs to the network include the LIFT refocused depth image stack computed by filtered back projection from depth −z₀ to depth z₀, the reference image captured at depth zero, and a DPM stack. The output is a high-resolution image stack at the corresponding depths. DPM: digital propagation matrix. a₁, a₂: activation functions. Conv2d: 2D convolution.