Navigator-Free Submillimeter Diffusion Imaging using Multishot-encoded Simultaneous Multi-slice (MUSIUM)

The ability to achieve submillimter isotropic resolution diffusion MR imaging (dMRI) is critically important to study fine-scale brain structures, particularly in the cortex. One of the major challenges in performing submillimeter dMRI is the inherently low signal-to-noise ratio (SNR). While approaches capable of mitigating the low SNR in high resolution dMRI have been proposed, namely simultaneous multi-slab (SMSlab) and generalized slice dithered enhanced resolution with simultaneous multislice (gSlider-SMS), limitations are associated with these approaches. The SMSlab sequences suffer from the slab boundary artifacts and require additional navigators for phase estimation. On the other hand, gSlider sequences require relatively high RF power and peak amplitude, which increase the SAR and complicate the RF excitation. In this work, we developed a navigator-free multishot-encoded simultaneous multi-slice (MUSIUM) imaging approach on a 3T MR scanner, achieving enhanced SNR, low RF power and peak amplitude, and being free from slab boundary artifacts. The dMRI with ultrahigh resolution (0.86 mm isotropic resolution), whole brain coverage and ~12.5 minute acquisition time were achieved, revealing detailed structures at cortical and white matter areas. The simulated and in vivo results also demonstrated that the MUSIUM imaging was minimally affected by the motion. Taken together, the MUSIUM imaging is a promising approach to achieve submillimeter diffusion imaging on 3T MR scanners within clinically feasible scan time.


INTRODUCTION
Diffusion MRI (dMRI) is a noninvasive imaging modality that enables characterization of tissue microstructure as well as structural connectivity of the human brain (Basser et al., 1994). While massive diffusion datasets have been collected with a spatial resolution of 1.25-mm isotropic in the Human Connectome Project (Sotiropoulos et al., 2013), the ability to acquire ultra-high isotropic resolution diffusion images is critically important to study fine-scale brain structures and empowers the assessments of early brain development. However, the major impediments enabling submillimeter dMRI include long acquisition time and inherently low signal-to-noise ratio (SNR). Over the past decade, the simultaneous multislice (SMS) imaging with blippedcontrolled aliasing (blipped-CAIPI SMS) has become one of the most widely-employed approaches to shorten acquisition time (Setsompop et al., 2012). While the SMS approach is capable of shortening acquisition time considerably, the low SNR remains an issue hindering our ability to achieve submillimeter dMRI.
Increasing both excitation volume per shot and number of shots are common strategies to enhance the SNR. Two different approaches have leveraged this concept, namely 3D multislab dMRI and RF-encoded multishot dMRI, respectively. The 3D multislab dMRI excites thick slabs sequentially in each TR, and Fourier encoding is performed along the slab direction for each slab over multiple TRs (Frank et al., 2010, Engstrom and Skare, 2013, Frost et al., 2015. Combining 3D multislab and SMS imaging, the simultaneous multi-slab (SMSlab) approaches enable rapid acquisition while preserving SNR (Chen and Feinberg, 2013, Frost et al., 2013, Bruce et al., 2017. Nevertheless, the aforementioned approach suffers from a few drawbacks. First, imperfect slab-selective profiles could lead to slab boundary artifacts. Second, the use of Fourier encoding makes it difficult to self-navigate the motion-induced phase variation across shots when the encoding frequency is high. Third, Gibbs artifacts could be present when a small slab encoding number is used. However, higher number of slab-encoding shots increases the slab thickness and therefore shortens the TR per shot, limiting its ability to employ a TR achieving the highest SNR efficiency (Chen and Feinberg, 2013). A number of innovative methods have been proposed to address these issues. For example, the nonlinear inversion for slab profile encoding was proposed to mitigate slab boundary artifacts but at the cost of a high computational load (Van et al., 2015, Wu et al., 2016. Besides, the 2D navigator was used to capture the phase variation (Bruce et al., 2017) of SMSlab. Nevertheless, the insertion of a 2D navigator increases the acquisition time and specific absorption rate (SAR).
In contrast to the 3D multislab approaches, the RF-encoded multishot approach, Generalized Slice Dithered Enhanced Resolution with Simultaneous Multislice (gSlider-SMS) (Setsompop et al., 2017), acquires multiple thin slabs simultaneously. Each thin slab consists of multiple thin slices that are RF-encoded, similar to the Hadamard encoding (Glover and Chang, 2012). Since RF-encoding is used, the gSlider approach offers two major advantages over the 3D mutlislab approaches, including a higher SNR to allow phase self-navigation and free from Gibbs ringing artifacts even if the number of within-slab slices is small. Nevertheless, the gSlider-SMS imposes several technical challenges including elevated SAR and RF peak amplitude. Although the RF peak amplitude can be reduced by variable-rate selective excitation (VERSE) (Hargreaves et al., 2004), it renders the gSlider imaging sensitive to B1 and B0 inhomogeneity. Secondly, in order to minimize slice crosstalk, it requires a long TR (≥ 4s) (Setsompop et al., 2017), potentially compromising SNR efficiency.
To this end, we developed a multishot-encoded simultaneous multi-slice (MUSIUM) imaging approach, capable of achieving submillimeter dMRI on 3T without specialized gradient coils and mitigating the aforementioned limitations. Similar to the gSlider, MUSIUM excites a multiple of SMS factor slices and employs multiple shots to maintain an effective slice acceleration rate. However, differing from the gSlider, the simultaneously-excited slices in the MUSIUM sequences are equally spaced instead of forming multiple slabs, mitigating the need of a high BWTP in order to reduce SAR. To unalias the simultaneously-excited slices, the MUSIUM sequences use Fourier encoding (Zahneisen et al., 2013) rather than RF encoding so that the phase-optimization schemes (Wong, 2012) can be applied and thereby reduce the RF peak amplitude considerably. Unlike the SMSlab sequences in which most of the Fourier encoding shots are related to high spatial frequency, the encoding shots of MUSIUM imaging correspond to relatively low spatial frequency. As a result, each encoding shot achieves a high SNR, enabling phase self-navigation. Our results showed that the MUSIUM approach achieved a higher SNR when compared to conventional SMS imaging and was capable of acquiring wholebrain dMRI with 0.86-mm isotropic resolution and 66 diffusion directions in ~12.5 minutes. The motion robustness of MUSIUM imaging was also demonstrated.

MUltishot-encoded SImUltaneous Multi-slice (MUSIUM)
The slice excitation and k-space sampling trajectory are illustrated in Figure 1a and 1b respectively. Let n sms denotes the effective SMS factor and n shot denotes the number of Fourier encoding shots. As shown in Figure 1a, n sms ×n shot slices will be excited simultaneously for each shot. The fully sampled k-space of SMS can be viewed as the Fourier encoding of a 3D volume comprising of n sms ×n shot contiguous slabs that are centered around the excited slices (dashed lines ,   Fig 1a) (Zahneisen et al., 2013). Hence, SMS is applicable to the existing 3D encoding. The fully-sampled k-space points of Figure 1a are denoted by the orange circles in Figure 1b.
Differing from the conventional 3D encoding, the SMS 3D encoding is virtually free from Gibbs ringing effect (Czervionke et al., 1988) regardless of the number of partition encoding (PAE). This is because the Gibbs artifact is associated with the truncation of Fourier series that represent continuous signal in the image domain. For the SMS 3D encoding, however, the image signal along the slice direction is discrete instead of continuous. Therefore, the Fourier transform of the image signal can be approximated as discrete Fourier transform in which Gibbs phenomenon is not applicable.
To enhance SNR and reduce geometry factor (g-factor) penalty, the SMS excitation will be repeated by n shot times with different k-space sampling trajectories across shots as shown in where d PAE is FOV z /n sms . The blip moment of y gradient (G y ) is n ipat /FOV y where n ipat denotes the in-plane acceleration rate and FOV y denotes the width of FOV along y direction. Different from the blip moment of G y , the blip moment of G z depends on the sampling location on the k y -k z plane. Let i trj = 0, 1, 2 … denotes the index of the sampled k x lines in a trajectory, k z (i trj ) denotes the k z position at the i trj -th sample, Δk z (i trj ) denotes the blip moment of G z , b z (i trj ) denotes the sign of the Δk z (i trj ) and p z (i trj ) denotes the index of k-space band. The shot index i shot = -n shot /2 , 0, 1, … (n shot  n shot /2 1) where • denotes the floor function. The generation of every sampling trajectory is elaborated with pseudo programming language in Table 1. 1: Setup the initial k z (0) = (i shot + 0.25)/d PAE if i shot is even, and (i shot -0.25)/d PAE if i shot is odd. 2. Setup the initial b z (0) = -1 if i shot is even, and b z (0) = 1 if i shot is odd. 3. Setup the initial p z (0) = i shot + n shot /2 4. Set i trj = 0 5. Repeat 6: if p z (i trj ) + b z (i trj ) < 0 Δk z (i trj ) = -0.5×b z (i trj )/d PAE , p z (i trj +1)=0 and b z (i trj +1)=1 else if p z (i trj ) + b z (i trj ) > (n shot 1) Δk z (i trj ) = -0.5×b z (i trj )/d PAE , p z (i trj +1) = n shot 1 and b z (i trj +1) = 1 else Δk z (i trj ) = b z (i trj )/d PAE , p z (i trj +1) = p z (i trj )+b z (i trj ) and b z (i trj +1) = b z (i trj ) 7: k z (i trj +1) = k z (i trj ) + Δk z (i trj ) 8: set i trj = i trj +1 9: Until i trj reach the end of sampling trajectory Table 1: The procedure for generating the blip moment of G z along the sampling trajectory.

MATERIALS AND METHODS
All experiments were performed using a 3T Prisma MR scanner (Siemens Healthcare, Erlangen, Germany). Data was acquired from three healthy male participants after obtaining informed consent using an institutionally approved protocol.

Simultaneous Multislice Excitation
The RF excitation and refocusing pulses were designed using the Shinnar-Le Roux (SLR) algorithm (Pauly et al., 1991). The limits of the ripple in the passband and stopband were set as 1%. To perform simultaneous multislice excitation, slice-selective RF pulses were frequency modulated and summed. In total, n sms × n shot slices were excited simultaneously. An optimized phase term for each of the summed RF pulses was introduced to reduce the peak RF amplitude by approximately square root of (n sms × n shot ) (Wong, 2012). The BWTP of excitation and refocusing pulses were 4 and 5.4, respectively. The pulse durations were 3.2 ms and 6.4 ms.

Bloch Simulation of RF Load and Slice Crosstalk
To compare between gSlider and our newly developed approach, the slice crosstalk, RF power and peak RF amplitude were obtained by simulating signal progression during spin-echo acquisition using Bloch simulation. The RF pulses of both gSlider and MUSIUM sequences were generated using SLR algorithm. T1 was set as 1400 ms (Lu et al., 2005, Stanisz et al., 2005, Wright et al., 2008 and TE as 80 ms. The T2 and T2* effects were ignored for simplicity. The pulse durations for excitation and refocusing were set as 6.4 and 8.0 ms, respectively. For the gSlider sequence, the BWTPs of excitation and refocusing pulses were 12 and 8, respectively as described in (Setsompop et al., 2017). For MUSIUM sequence, the BWTPs of excitation and refocusing pulses were 4.0 and 5.4, respectively.
The metrics of RF loading included the RF power and RF peak amplitude. The RF power was defined as the power integral of excitation and refocusing RF pulses divided by TR and the RF peak amplitude was the maximal magnitude of RF pulses across time. The slice cross talk measured the signal leakage from one of the simultaneously-acquired slices to the others, which was derived through the forward encoding matrix A. For gSlider imaging, the A contains the RFencoding information obtained from the Bloch simulation. The individual slice information can be reconstructed by where z, λ, and A inv and b denote the vector of reconstructed slices, Tikhonov regularization parameter, the inverse matrix of A and the vector of encoded data across shots, respectively. The λ was set as 0.4 as that in (Setsompop et al., 2017). The point spread function of the reconstruction can be characterized by impulse response analysis where r i denotes the vector of impulse response corresponding to slice i and δ i is the impulse vector. On the other hand, for MUSIUM imaging, the A is the approximately the Fourier encoding matrix and the A inv in (2) is the inverse Fourier matrix. Let r ij denotes the elements in r i .
The slice cross talk at slice i can be calculated as (3)

Data Acquisition and Reconstruction
Imaging studies were conducted to 1) compare the SNR between conventional SMS and MUSIUM sequences, and 2) to demonstrate the efficacy of ultrahigh resolution whole-brain dMRI. The SMS diffusion images were acquired using the following imaging parameters: matrix dimension=192×180×126 (R-L×H-F×A-P) mm 3 ; 1.0 mm isotropic resolution; axial slicing; SMS factor=2; excitation and refocusing RF pulse durations were 3.2 and 6.4 ms, respectively; BWTP was 4.0 and 5.4 for excitation and refocusing pulses, respectively; no in-plane acceleration; partial Fourier 5/8; echo spacing=0.93 ms; TE/TR=83 ms/11.43 s; b=2000 s/mm 2 . The imaging parameters for MUSIUM were similar as those of SMS except for the following modifications: number of shots = 3; TR per shot = 3.81 s and effective TR per volume = 11.43 s.
To demonstrate the efficacy of obtaining ultrahigh resolution dMRI using MUSIUM, the imaging protocols were as follows: matrix dimension=256×208×153 (R-L×H-F×A-P) mm 3 ; 0.86 mm isotropic resolution; axial slicing; SMS factor = 3; number of shots = 3, excitation and refocusing RF pulses durations were 3.2 and 6.4 ms, respectively; BWTP was 4.0 and 5.4 for excitation and refocusing pulses, respectively; no in-plane acceleration; partial Fourier 5/8; echo The data acquired by MUSIUM sequences were reconstructed in two steps, namely phase estimation and SENSE-based reconstruction. Assuming a slowly varying in-plane phase, the phase information can be accurately approximated using low k-space information. Therefore, the images were down-sampled, which in turn provided a better SNR and much shorter reconstruction time. To estimate the shot-to-shot phase variation, the image of each shot was reconstructed. Let y ds , ds , and x ds denote the down-sampled data, down-sampled forward matrix and down-sampled image to be reconstructed, respectively. y ds = [y 1 ds , y 2 ds , … y n shot ds ] T , where [•] T denotes the transpose operation. Let i ′ = i shot + n shot /2 + 1 denotes the shot index.
The i ′ ds = i ′ ds • S ds where S ds and i ′ ds are the down-sampled coil sensitivity profile and downsampled encoding matrix respectively for the i"-th shot, including k-space sampling trajectory, in-plane subsampling (if applicable), and slice summation. The coil sensitivity profile with original resolution was estimated from the central k-space lines of a reference image using ESPIRiT (Uecker et al., 2014). The total number of receiving coil is n ch . The 2D kernel size was 7x7. The x i ′ ds encompasses the spatial information of n sms × n shot slices but y i ′ ds contains only one collapsed slice from n ch receiving coils. The reconstruction of x ds can be formulated as follows: x ds = arg min x ds y ds − E ds • x ds 2 2 + λ reg D • x ds 2 2 , where D denotes the matrix of spatial differencing operation that promotes the in-plane smoothness. The first and second • 2 2 are the data consistency and the regularization terms respectively. λ reg denotes the regularization parameter.
where x ini ds is the initial solution in (8) without the regularization term. In this study, only the central 1/4 k x and 1/4 k y points, i.e. 1/16 k-space, were used for phase estimation.
Although the images x ds are susceptible to undesirable noise amplification due to a high acceleration rate per shot, the shot-to-shot phase inconsistencies, which are expected to be spatially smooth, can be reliably estimated with the denoising operation based on total variation (Rudin et al., 1992).
where TV(•) denotes the total variation operation. The estimated phase images e −jΦ i ′ ds were upsampled by interpolation using the matlab command "interpn" with the "spline" method. With the upsampled phase image e −jΦ , the image to be reconstructed can be expressed as where x mag and e −jΦ i ′ represent the magnitude image that is consistent across shots and the estimated phase image of the i-th shot respectively. The reconstruction of x mag is similar to but slightly different from eq. (7): x mag = arg min x mag y − • x mag 2 2 + λ mag x mag 2 2 , where y denotes the measured data with all the shots and E=[ 1 , 2 , … n shot ] T denotes the forward matrix. The i ′ = i ′ • e jΦ i ′ • S where S denotes the coil sensitivity profile and i ′ denotes the encoding matrix of the i-th shot. The regularization parameter λ mag can be calculated as where x ini is the initial solution in eq. (11) without the regularization term. The eq. (11) was solved by the conjugate gradient method using the MATLAB function "pcg". It can be seen from eq. (11) that the number of unknowns along z direction is n sms ×n shot and the number of equations is n shot . Since the effective slice acceleration remains at n sms , the g-factor penalty of MUSIUM is comparable to that of the conventional SMS approach. Yet, the MUSIUM imaging has the SNR advantage over the SMS approach owing to the use of multi-shot acquisition.

Data Denoising
To further boost the SNR, a denoising method (MPPCA) based on the principal component analysis (PCA) and Marchenko-Pastur distribution was used (Veraart et al., 2016). The MPPCA method applies PCA denoising to overlapping spatial patches and chooses the rank value automatically based on the Marchenko-Pastur distribution. The patch size used in this study was 5x5x5. The MPPCA method was implemented using an in-house MATLAB code.

Motion
The proposed MUSIUM sequences capable of acquiring a 1-mm or submillimeter diffusion volume in ~10 s. Nevertheless, significant head motion may still occur within 10 sec in some particular groups, such as pediatric population, Parkinson"s Disease patients, and so forth. Hence, it is important to assess the effects of such motion of acquired DTI images. We used a reference image with 1-mm isotropic resolution as the ground truth and tested the following two motion cases: i) linear translation of 1 mm in y, z axes and linear rotation of 1 in x, and ii) linear translation of 2 mm in y, z axes and linear rotation of 2 in x during the MUSIUM encoding acquisition. The x, y, z axes represent the readout, phase encoding, and slice directions. For axial slicing, the motion pattern is up-center-down as in "yes". The motion effects were simulated and the reconstruction was conducted as the proposed reconstruction framework without the knowledge of motion. The motion effects were also assessed in vivo. The two motion cases as used in the simulation were tested. The subject was prompted to move the head continuously center-up-down-center as in "yes" at 12-s interval during the scan. To facilitate the subject to produce small and large head rotations, three equally-spaced crosshairs were projected at the center, below the center and above the center of the screen. The participant was instructed to stare at the crosshair below the center during the "stay still" run. In the "small motion" run, the subject moved the head by changing the eye fixation between the crosshairs at and below the center of the screen. Likewise, the eye fixation was changed between the crosshairs below and above the center of the screen in the "large motion" run.

Comparisons of RF Load and Slice Crosstalk by Bloch Simulation
Using the same SMS factor and number of encoding shots, the RF-encoded approach as used in gSlider sequences and the Fourier-encoded approach as used in MUSIUM sequences excited an identical number of slices but in different ways (Figure 2a). For a fair comparison, none of the sequences used VERSE in the Bloch simulation. Figure 2b shows the comparisons of RF power, peak RF amplitude, G z strength and slice cross-talk between RF-encoded and Fourier-encoded approaches, respectively. The RF power (∝SAR) and peak amplitude of RF-encoded approach is 13.4 and 10.8 times higher than that of Fourier-encoded approach, respectively. The maximally achievable RF amplitude depends on the specific RF hardware. If the RF amplitude exceeds the hardware limit, the delivered RF pulse could be clipped and the excitation profile could be erroneous. The required slice gradient strength of RF-encoded approach is 2.9 times higher than that of Fourier-encoded approach. The slice crosstalk of RF-encoded approach increases as the TR decreases while that of Fourier-encoded approach remains constantly small regardless of the TR. The smallest slice crosstalk of RF-encoded approach is 12.3 times higher than that of Fourier-encoded approach.

SNR Enhancement by Multi-shot Diffusion Imaging
The SNR comparison between single-shot SMS and multi-shot MUSIUM acquisitions is demonstrated in Figure 3. The number of shots in MUSIUM sequences is 3. Figure 3a showed the SNR maps of SMS and MUSIUM and the SNR ratio of MUSIUM to SMS is shown in Figure 3b. In general, the SNR is enhanced by the MUSIUM sequences when compared to the SMS. The averaged SNR of MUSIUM imaging is 56.2% higher than that of SMS imaging.  Using the MPPCA denoising method, the signal quality can be further improved. Figure 5 demonstrated the denoised color-FA maps of three different subjects. The three zoom-in boxes highlighted by the red rectangles are centered around a similar location and the enlarged tensor plots are shown at the bottom row. In general, the tensor orientations are spatially stable and consistent across subjects.
The results demonstrate the ability of MUSIUM imaging to detect the fine-scale structures with the data acquisition time of ~12.5 minutes.

Motion
The effects of the motion along the MUSIUM encoding shots are shown in Figure 6. The upper row shows three different motion conditions. The middle and bottom rows show the simulated and in vivo results, respectively. The reconstructed images without motion are shown in Figure   6a serving as the reference. The linear translation of 1 mm and linear rotation of 1 during MUSIUM encoding lead to slight stripping artifacts in the reconstructed images as shown in Figure 6b. The artifacts in Figure 6c become more pronounced when the translation and rotation increase by two-fold. The stripping artifacts are also observed in the in vivo data and become more pronounced as the motion increased. Nevertheless, the general features of images are preserved despite the presence of such motions.

DISCUSSION
A novel approach capable of acquiring ultra-high isotropic resolution, whole brain DTI images within a clinically acceptable acquisition time was proposed. The proposed MUSIUM imaging approach demonstrated several major advantages over existing approaches. A large number of slices were excited simultaneously in the MUSIUM sequences and multi-shot acquisitions were employed to enhance the SNR and reduce the g-factor penalty. The k-space sampling trajectories were designed in a way that the SNR of each shot is sufficient for phase estimation. Comparing to gSlider with an identical number of simultaneously-excited thin slices, the MUSIUM imaging showed a lower RF load, gradient strength and slice crosstalk. That is, MUSIUM imaging can be implemented on MR scanners without increasing the RF pulse duration, dedicated high performance gradient coils and the use of complicated RF excitation schemes such as VERSE. Compared to the single-shot SMS acquisition, in vivo MUSIUM images showed higher SNR efficiency. The efficacy of submillimeter MUSIUM imaging was demonstrated successfully at 0.86 mm isotropic resolution, revealing detailed structures at cortical and white matter areas.

Acquisitions and reconstructions
The k-space sampling trajectories of MUSIUM travel between -k z and k z with generally a larger blip size than conventional SMS, which may affect the echo spacing and TE. In the case of n sms =3 and n shot =3, the maximal blip moment in MUSIUM sequences doubles the blip moment of the blip-CAIPI SMS sequences with FOV/2 shift. Nevertheless, such a blip moment leads to a negligible increase of TE. Moreover, even if a shorter echo spacing can be achieved in SMS imaging, it might not be applicable to human imaging due to potential peripheral nerve simulation. Therefore, the influences of the MUSIUM sampling trajectories on the TE and echo spacing are minimal.
The selection of reconstruction approach is commonly a trade-off between signal quality, robustness and computational load. This study employed the SENSE-based method and the two- Other reconstruction approaches such as low-rank matrix completion , Mani et al., 2017, Mani et al., 2020, SNR-enhancing joint reconstruction (Haldar et al., 2013), nonlinear phase estimation (Haldar et al., 2020, Ramos-Llorden et al., 2020 have been proposed to improve signal quality at the costs of a high computational load. Nevertheless, the low-rank approaches reported the successful reconstruction on either in-plane acceleration data or throughplane acceleration data with low SMS factor (Mani et al., 2017, Mani et al., 2020. We also examined the reconstruction quality using the low-rank reconstruction approach on throughplane acceleration data with a high SMS factor ( Figure S1). The results shown in Figure S2 suggested that the low-rank reconstructions of MUSIUM data are prone to introduce stripping artifacts in coronal/sagittal views for an axial acquisition and sensitive to the choices of regularization parameters. Nevertheless, other low-rank-based reconstruction, such as locally low-rank regularization (Hu et al., 2019), might improve the reconstruction quality over the lowrank approaches evaluated. However, the optimization of the reconstruction method is beyond the scope of our study.
The denoising techniques and multi-shot MR sequences are complementary approaches for SNR enhancement. The denoising techniques may fail if the SNR is too low as in highresolution diffusion MRI because the covariance matrix of the signal needs to have a sufficiently low rank. Therefore, the enhanced SNR per shot provided by MUSIUM sequences is beneficial to the subsequent denoising process. In this study, we used the MPPCA approach to boost up the SNR on the diffusion analysis. While the MPPCA method effectively suppresses the noise, many recently proposed denoising methods could further improve the signal quality. An extension of the MPPCA method was proposed to apply on the noise models other than additive white Gaussian noise (Cordero-Grande et al., 2019). In addition, joint reconstructions of multiple diffusion-encoding directions were proposed to improve the SNR significantly despite the requirement of lengthy reconstruction time and much more memory (Haldar et al., 2020, Ramos-Llorden et al., 2020. Future work will explore the optimal trade-off between the SNR enhancement and computational load.

Motion
The motion effects during the MUSIUM acquisitions were tested by simulation and in vivo studies. Both simulated and in vivo results showed the motion introduced stripping artifacts, but the general image features were preserved. The robustness of MUSIUM sequences against motion can be expected because the sampling trajectories of each shot travel through the central part of k-space. Future work will perform the retrospective motion correction by using the reconstructed single-shot images for motion estimation and incorporating the motion information into the multi-shot reconstruction. Meanwhile, the markerless prospective motion correction will also be explored (Frost et al., 2019, Berglund et al., 2020.

Limitations
The accuracy of phase estimation is critical for the two-step reconstruction used in this study. The challenges for the phase estimation of MUSIUM imaging are the high acceleration rate per shot. The combined acceleration rate (n sms ×n shot ) per shot in MUSIUM imaging is on the order of 10 if the in-plane acceleration is not employed, which is around the limit of parallelimaging reconstruction capability. However, the combined acceleration rate could easily exceed such a limit if the in-plane acceleration is used. This imposes a major obstacle for MUSIUM imaging to achieve higher resolution. Ongoing efforts in our group have been carried out to tackle this issue.

CONCLUSION
The navigator-free MUSIUM sequences demonstrate enhanced SNR efficiency of dMRI compared to conventional SMS imaging. With the same SMS factor and number of shots, MUSIUM sequences demonstrate lower RF power, RF peak amplitude and slice crosstalk than the RF-encoded multishot approach. The in vivo MUSIUM images demonstrate the efficacy of submillimeter dMRI, revealing fine-scale structures at the cortex. In addition, the sophisticated design of k-space sampling trajectory renders the MUSIUM imaging less sensitive to motion.
Taken together, the proposed MUSIUM imaging is a promising approach to achieve submillimeter diffusion imaging on general-purposed 3T MR scanners within clinically feasible scan time.

ACKNOWLEGEMENTS
We thank Kim-Han Tung and Ye Wu for providing helpful comments. This work was supported in part by NIH grants, R21AG060324 and U01MH110274.

Low-rank Reconstructions
The low-rank matrix completion exploits the redundancy between the neighboring k-space data points. The linear dependency between the k-space points comes from the images with limited spatial support and/or slowly varying phase . The procedure of low-rank reconstruction is illustrated in Figure S1. The reconstructed image is the solution of the equation in Figure S1. The equation is similar to the eq. (11) except that no estimated phase is involved and the solution includes the images of all the shots. The regularization parameter λ is calculate as λ = Y 2 2 / H(m ini ) * , where m ini is the initial solution in eq. (S1) without the regularization term. Figure S1: The illustration of low-rank reconstruction of MUSIUM data. Y denotes the measured data, E denotes the MUSIUM encoding matrix, m denotes the unknown image, λ denotes the regularization paramter and H(•) denotes the Block-Hankel operation. Figure S2a demonstrates the reconstruction results of eq. (11), which is SENSE-based. Figure   S2b and S2c are both the results of low-rank reconstruction but with different regularization parameters. The image resolution is 0.86 mm isotropic and b value is 500. The regularization parameter λ in Figure S2b is calculated by eq. (S1) and the parameter in Figure S2c is 3λ. The SENSE-based images demonstrate less striping artifacts and better image contrast than those of low-rank images. Besides, Figure S2b and S2c demonstrate that the low-rank reconstruction is sensitive to the regularization parameter. The low-rank reconstruction with regularization parameter = λ. (c) The low-rank reconstruction with regularization parameter = 3λ.