**Configuration of Experimental TFMPEM.** The detailed configuration of the DMD-based TFMPEM employed in the present study is described in a previous study by the present group.30 However, briefly, the DMD (DLP7000, Texas Instrument, USA) had a 0.7-inch illumination area and a 1,024 × 768 micromirror array with a pitch size of 13.68 µm and was orientated at an angle of 45° with respect to the incoming light. The structure of the DMD was equivalent to a blazed grating with a blazed angle of 12°. The 10th -order diffraction of the ultrafast laser, which is a Ti:Sapphire regenerative amplifier (Spitfire Pro XP, Spectra-Physics, USA) coupled with a Ti:Sapphire ultrafast oscillator (Tsunami, Spectra-Physics, USA) as seed beam, was adopted in order to approach the maximum power set through the diffraction equation. The dispersion efficiency of the DMD was verified as being equivalent to a 517 lines/mm blazed grating when using the 1st -order diffraction of the same wavelength. Through the use of a high definition multimedia interface (HDMI), the DMD served not only as a dispersion element for temporal focusing, but also a spatially-modulated illuminator with the ability to produce 8-bit patterns.

**Theoretical simulation of patterned-illumination TFMPEM.** The simulations conducted in the present study were based on an advanced theoretical model of TFMPEM founded in Fourier optics.39 In particular, the excitation source in TFMPEM can be viewed as the superposition of numerous monochromatic waves. In the time domain, the intensity distribution of the ultrashort pulse, *I**G*, on the DMD is related to the inverse Fourier transform of the complex amplitude of its spectral profile, *U**G*, i.e.,

$${I}_{G}\left({x}_{G},{y}_{G},t\right)={\left|\frac{1}{2\pi }\int {U}_{G}\left({x}_{G},{y}_{G},\omega \right){e}^{j\omega t}d\omega \right|}^{2}$$

,

where *ω* is the angular frequency of the laser beam. When the pulse is reflected from the grating, the superposed frequencies are separated into different diffraction angles, *θ**ω*, in accordance with the diffraction equation, i.e.,

$$\frac{\omega }{c}sin {\theta }_{\omega }=\frac{\omega }{c}sin {\theta }_{i}\pm m\frac{2\pi }{{\Lambda }}$$

,

where *θ**i* is the incident angle; *c* is the speed of light in vacuum; and *m* and Λ are the diffraction order and period of the DMD, respectively. The diffracted frequencies are focused independently at different points on the back focal plane of the objective lens and form a spectral line. The complex amplitude distribution of the frequency, *U**BF*, which is restricted by the circular aperture of the objective lens, can be calculated in the spatial domain via Fourier transformation as

$${U}_{BF}\left(u,v,\omega \right)=\left[\iint {U}_{G}\left({x}_{G},{y}_{G},\omega \right){e}^{-j\frac{\omega }{c}\left(\frac{u-{f}_{1}tan {\theta }_{\omega } }{{f}_{1}}{x}_{G}+\frac{v}{{f}_{1}}{y}_{G}\right)}d{x}_{G}d{y}_{G}\right]circ\left(\frac{\sqrt{{u}^{2}+{v}^{2}}}{D/2}\right)$$

,

where *f**1* and *D* are the focal length of the collimating lens and effective diameter of the back aperture of the objective lens, respectively. According to Fourier optics, the angular spectrum at the front focal plane of the objective lens, *A**TF*, which is also defined as temporal focusing plane, is equal to *U**BF*. However, this methodical approach in the spatial domain yields only an estimate of the complex amplitude of the spectral profile in the temporal focusing plane. Thus, in order to obtain a more accurate evaluation of the angular spectrum at the defocusing plane, *A**dTF*, the present study utilizes the Helmholtz equation to evaluate the propagation over a distance Δ*z* (i.e., the defocusing length away from the temporal focusing plane), with the relative phase term. In other words, the angular spectrum is modeled as

$${A}_{dTF}\left(u,v,\omega \right)={U}_{BF}\left(u,v,\omega \right){e}^{-j\frac{\omega }{c}\sqrt{{n}^{2}-{\left(\frac{u}{{f}_{2}}\right)}^{2}-{\left(\frac{v}{{f}_{2}}\right)}^{2}}\varDelta z}$$

,

where *n* and *f**2* are the refractive index of the propagating medium and the focal length of the objective lens, respectively. The complex amplitude of the overall spectral profile in the spatial domain can then be obtained using the Fourier transform as

$${U}_{dTF}\left(x,y,\varDelta z,\omega \right)=\iint {A}_{dTF}\left(u,v,\omega \right){e}^{-j\frac{\omega }{c}\left(\frac{x}{{f}_{2}}u+\frac{y}{{f}_{2}}v\right)}dudv$$

.

Finally, the intensity distribution of the temporal focusing region in the time domain, *I**dTF*, is obtained from the time domain inverse Fourier transform of the spectral profile as follow:

$${I}_{dTF}\left(x,y,\varDelta z,t\right)={\left|\frac{1}{2\pi }\int {U}_{dTF}\left(x,y,\varDelta z,\omega \right){e}^{j\omega t}d\omega \right|}^{2}$$

.

In other words, the time domain intensity distribution of the ultrashort pulse in TFMPEM can be calculated based on Fourier transformation in both the time domain and the spatial domain. The time-varying fluorescence intensity under a multiphoton excitation scheme, *I**MPEF*, is proportional to the order of magnitude of the intensity distribution in the temporal focusing region. That is,

$${I}_{MPEF}\left(x,y,\varDelta z,t\right)\propto {\left|{I}_{dTF}\left(x,y,\varDelta z,t\right)\right|}^{M}$$

,

where *M* is the order of magnitude of the *M*-photon excitation process. The AEC of TFMPEM, *I**AEC*, consists of the distribution of the overall, time-averaged fluorescence at different defocusing lengths, and can be estimated via the spatiotemporal integration of the time domain intensity distribution in the temporal focusing region as follow:

$${I}_{AEC}\left(\varDelta z\right)=\iiint {I}_{MPEF}\left(x,y,\varDelta z,t\right)dxdydt$$

.

Hence, the theoretical analysis model enables the computation of both the complex amplitude distributions at the back focal plane and the temporal focusing region, respectively, and the AEC of the structured lighting pattern.

**Hilbert-Huang transform.** The HT utilizes two patterned images with the same spatial frequency but a 180° difference in phase. The background noise can thus be eliminated through subtraction. The HT reconstruction process introduces a phase shift of 90°. Therefore, the subtracted image and its HT can be superimposed to create an image without background noise. However, for high spatial frequencies of the structured light produced by the DMD, the detection signal has a square-like characteristic, which results in the formation of pattern residuals in the reconstructed image. Thus, the present study adopts the HHT to process the detection signal. In particular, before demodulating the signal, the input is decomposed into multiple IMFs by enhanced fast empirical mode decomposition.38 A sifting process is then performed to remove the low-frequency components of the detection signal, which correspond to background noise and pattern residuals. Finally, the traditional HT algorithm is employed to reconstruct the image.