Overview of ShinyCardinal Pipeline
ShinyCardinal is a comprehensive, vendor-neutral software tool for MSI data analysis. It is built on the R package Cardinal 19,20 and follows the golem framework for production-grade app development 22. ShinyCardinal is available as an R package, a web application (https://gincpm.shinyapps.io/ShinyCardinal/) and standalone software (https://shinycardinal.sourceforge.io). The workflow covers all the steps necessary for MSI data analysis: data import, preprocessing, data cleaning, image visualization, region-of-interest analysis, statistical analysis, absolute quantification, image segmentation, network analysis, metabolite identification, and data export (Fig. 1). A series of tutorial videos (https://www.youtube.com/@MSI_WIS/videos) and a built-in user guide are provided to streamline data analysis. Because MSI datasets are computationally demanding, particularly during preprocessing, ShinyCardinal allows users to save preprocessed data in RDS format so that preprocessing does not have to be repeated. Notably, each step in ShinyCardinal functions as a module, allowing users to analyze data selectively with their preferred modules or collectively and iteratively.
Data import and Preprocessing
The purpose of preprocessing is to reduce experimental variance within an MSI dataset and to prepare it for subsequent statistical analysis 23. ShinyCardinal allows multiple MSI datasets to be uploaded and processed simultaneously. It supports both processed and continuous imzML formats 24, in profile or centroid mode (Fig. 1). The preprocessing steps in ShinyCardinal include normalization, peak picking, peak alignment, and, optionally, spectral smoothing and baseline reduction. These steps are identical to those described for the R package Cardinal 20, except that ShinyCardinal provides interactive MS spectra that help users choose, evaluate, and optimize the parameters of each preprocessing step. The overall preprocessing time ranges from several minutes to a few hours, depending on the computational resources, the size of the MSI data (e.g., the number of mass features and pixels), and, notably, the imzML format (i.e., processed or continuous). Further details on how preprocessing time relates to MSI data size, format, and parallel computation are given in Supplementary Table 1.
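To make the order of these steps concrete, the following is a minimal Python sketch of a normalize, smooth, pick, and align pipeline. It is not Cardinal's or ShinyCardinal's implementation: TIC normalization, moving-average smoothing, local-maximum peak picking against a median-based noise level, and ppm-tolerance peak grouping are simplifying assumptions chosen for brevity, and baseline reduction is omitted.

```python
import numpy as np

def tic_normalize(spectra):
    """Scale each pixel spectrum (row) so that its total ion current (TIC) equals 1."""
    tic = spectra.sum(axis=1, keepdims=True)
    tic[tic == 0] = 1.0  # leave empty pixels unchanged instead of dividing by zero
    return spectra / tic

def moving_average(spectra, window=3):
    """Optional smoothing: moving average along the m/z axis of every pixel spectrum."""
    kernel = np.ones(window) / window
    return np.apply_along_axis(np.convolve, 1, spectra, kernel, mode="same")

def pick_peaks(mz, spectrum, snr=3.0):
    """Local maxima whose intensity exceeds `snr` times the median intensity (crude noise level)."""
    noise = max(float(np.median(spectrum)), 1e-12)
    inner = spectrum[1:-1]
    is_max = (inner > spectrum[:-2]) & (inner >= spectrum[2:])
    idx = np.where(is_max & (inner > snr * noise))[0] + 1
    return mz[idx]

def align_peaks(peak_lists, tol_ppm=10.0):
    """Merge peaks from all pixels into reference m/z values: after sorting, a peak within
    tol_ppm of the previous one joins its group; each group is replaced by its mean m/z."""
    all_mz = np.sort(np.concatenate(peak_lists))
    if all_mz.size == 0:
        return all_mz
    groups, current = [], [all_mz[0]]
    for m in all_mz[1:]:
        if (m - current[-1]) / m * 1e6 <= tol_ppm:
            current.append(m)
        else:
            groups.append(current)
            current = [m]
    groups.append(current)
    return np.array([np.mean(g) for g in groups])

def preprocess(mz, spectra, snr=3.0, tol_ppm=10.0):
    """Normalize, smooth, pick and align peaks; return reference m/z and a pixels x peaks matrix."""
    spectra = moving_average(tic_normalize(np.asarray(spectra, dtype=float)))
    ref = align_peaks([pick_peaks(mz, s, snr) for s in spectra], tol_ppm)
    # For each reference peak, read the intensity at the nearest m/z bin of the profile spectra
    cols = np.abs(mz[None, :] - ref[:, None]).argmin(axis=1)
    return ref, spectra[:, cols]
```

The tolerance-based grouping is what turns slightly shifted per-pixel peaks into one shared mass feature, which is the purpose of peak alignment.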
Data cleaning
MSI data is intrinsically complex owing to the presence of multiple ‘redundant’ ion species. These include background noise originating from various contaminant sources, MALDI matrix peaks (unique to MALDI-MSI datasets), and isotopic peaks. This complexity impedes downstream data analysis and leads to false hits in metabolite identification 25,26. To tackle this challenge, ShinyCardinal provides two core modules for MSI data cleaning: background noise and matrix peak removal, and deisotoping.
Several software tools have been developed to detect and remove MALDI matrix peaks, such as OffSampleAI 26, rMSIcleanup 27,28, and mass2adduct 29. These methods either require a predefined list of matrix adducts or assume a matrix-like spatial distribution, which limits their use in untargeted MSI applications. By contrast, ShinyCardinal is based on the principle that all MALDI matrix-derived peaks share a very similar spatial distribution; they can therefore be extracted through colocalization analysis with a reference matrix peak (Fig. 2a). Similarly, background noise from different sources can be detected with reference noise peaks. In a test case, matrix removal was performed on a mouse testis sample analyzed in negative ion mode with 1,5-diaminonaphthalene (DAN) as the matrix. Using m/z 313.1458 ([2M-H-H2]−), a known DAN matrix peak, as the reference peak, 83 mass peaks were extracted in total. Of these, 70 were putatively identified as DAN-derived peaks (Fig. 2b-d, Supplementary Table S2), and all 83 peaks showed very similar spatial distributions (Pearson correlation coefficient ≥ 0.9; Supplementary Fig. 1).
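The colocalization principle can be illustrated with a short Python sketch, assuming the data are held as a features x pixels intensity matrix and that Pearson correlation with a 0.9 cutoff is the similarity measure (as in the test case). The function names are hypothetical and this is not ShinyCardinal's code.

```python
import numpy as np

def colocalized_with(images, ref_index, threshold=0.9):
    """Indices of features whose ion image correlates with the reference image
    (Pearson r >= threshold). images: array of shape (n_features, n_pixels)."""
    centered = images - images.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centered, axis=1)
    norms[norms == 0] = np.inf  # constant images get r = 0 instead of NaN
    r = centered @ centered[ref_index] / (norms * norms[ref_index])
    return np.where(r >= threshold)[0], r

def remove_features(mz, images, drop):
    """Remove the given feature indices (e.g., matrix-derived peaks) from the dataset."""
    keep = np.setdiff1d(np.arange(len(mz)), drop)
    return mz[keep], images[keep]
```

The same two calls apply to background noise: pass the index of a reference noise peak instead of a matrix peak.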
Deisotoping is another challenge in MSI data analysis, because ions generated on the tissue are not physicochemically separated before detection 30. Currently, three software tools can deisotope MSI data: massPix 31, imShot 30, and METASPACE (Palmer et al., 2017; Alexandrov et al., 2019). Among them, METASPACE provides the most accurate and specific approach. For isotopic peak detection, it takes into account the mass accuracy, the spatial distribution similarity, and the spectral similarity between the predicted isotope pattern and the measured intensities. Nevertheless, the primary aim of deisotoping in METASPACE is to reduce false positives in metabolite identification, and the detected isotopic peaks are displayed only for identified mass features. Consequently, the deisotoped MSI data is not available for downstream analysis. ShinyCardinal uses a similar algorithm, except that it does not consider intensity proportions and instead applies a more stringent spatial similarity score (default 0.9) for isotopic peak detection (Fig. 3a). The performance of ShinyCardinal was evaluated on a mouse brain MALDI-MSI dataset 31 by manually inspecting the results and comparing them with those obtained by METASPACE. Of the 698 mass features, 260 were manually identified as isotopic peaks, 146 of which were detected by ShinyCardinal (Supplementary Table S3). Remarkably, the polymer polyethylene glycol (PEG)-1450 was found in the mouse brain section, and all of its isotopic peaks were accurately detected by ShinyCardinal (Supplementary Fig. S2).
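A minimal Python sketch of this kind of deisotoping, under stated assumptions: a peak is flagged as an isotope when it lies one 13C spacing (1.003355 Da divided by charge) above another peak within a ppm tolerance and their ion images correlate at Pearson r ≥ 0.9, while intensity ratios are ignored, as described above. The 5 ppm tolerance and the function names are hypothetical choices, not ShinyCardinal's parameters.

```python
import numpy as np

C13_SHIFT = 1.003355  # 13C - 12C mass difference (Da)

def find_isotopes(mz, images, tol_ppm=5.0, min_corr=0.9, charge=1):
    """Return {isotope_index: parent_index} for peaks that sit one 13C spacing above another
    peak (within tol_ppm) AND whose ion images correlate with it (Pearson r >= min_corr).
    Intensity ratios are deliberately ignored."""
    mz = np.asarray(mz, dtype=float)
    r = np.corrcoef(images)
    isotope_of = {}
    for i in range(len(mz)):
        expected = mz[i] + C13_SHIFT / charge
        for j in range(len(mz)):
            if abs(mz[j] - expected) / expected * 1e6 <= tol_ppm and r[i, j] >= min_corr:
                isotope_of[j] = i
    return isotope_of

def deisotope(mz, images, **kwargs):
    """Drop flagged isotopic peaks, keeping monoisotopic (or unmatched) features."""
    drop = set(find_isotopes(mz, images, **kwargs))
    keep = [k for k in range(len(mz)) if k not in drop]
    return np.asarray(mz)[keep], np.asarray(images)[keep]
```

Because an M+2 peak is matched to the M+1 peak in the same way, isotope chains are removed step by step without modeling the isotope envelope.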
Image visualization
The visualization of ion intensity distributions is a key component of MSI data analysis 15. ShinyCardinal allows users to visualize both ion images and segmentation maps. A range of parameters, such as contrast enhancement methods, smoothing techniques, and color scales, are provided for users to customize and fine-tune ion images (Fig. 3b). Users can plot either a single ion image or multiple ion images, and multiple images can also be overlaid to compare the spatial localizations of different ions (Fig. 3c). Ion images can be exported individually or collectively as high-quality, publication-ready figures in PDF or PNG format. When the user clicks on an ion image, an interactive mass spectrum is generated for each selected pixel, allowing ion intensities to be compared across pixels (Fig. 3c).
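The kinds of image adjustments mentioned above can be sketched in Python with NumPy. The sketch assumes three generic techniques: percentile clipping of hotspot pixels, histogram equalization, and a separable Gaussian blur. It illustrates what such options do and is not necessarily how ShinyCardinal implements its contrast and smoothing settings.

```python
import numpy as np

def suppress_hotspots(img, upper_pct=99.0):
    """Clip intensities above a percentile so a few hotspot pixels do not dominate the color scale."""
    cap = np.percentile(img, upper_pct)
    return np.minimum(img, cap)

def equalize_histogram(img, n_bins=256):
    """Map intensities through their empirical CDF so the color scale is used evenly (range 0 to 1)."""
    flat = img.ravel()
    hist, edges = np.histogram(flat, bins=n_bins)
    cdf = np.cumsum(hist) / flat.size
    return np.interp(flat, edges[1:], cdf).reshape(img.shape)

def gaussian_smooth(img, sigma=1.0):
    """Separable Gaussian blur; edges are padded by repeating border pixels."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")  # blur along x
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")    # blur along y
```

Clipping and equalization change only the mapping from intensity to color, whereas smoothing changes the pixel values themselves, which is why smoothed images are best kept for display rather than quantitative comparison.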
Region of interest analysis
Region of interest (ROI) analysis holds great potential for identifying molecular differences in small tissue regions whose signals would easily be overlooked by non-imaging MS-based techniques, such as liquid chromatography (LC)-MS and gas chromatography (GC)-MS 32,33. Accurately defined ROIs enable the extraction of molecular abundances specific to each tissue type. This is crucial for the statistical discovery of molecular alterations either among different ROIs within the same sample (e.g., different tissue types and structural features) or between different samples at the same ROI (e.g., molecules differentially expressed in the same anatomical area of healthy and diseased samples) 34. ShinyCardinal provides an interface for manually selecting ROIs based on either ion images or spatial segmentation maps. Apart from biomarker discovery, ROI analysis in ShinyCardinal enables intensity profiling and MSI data cropping (Fig. 4).
The usage and efficacy of ShinyCardinal for ROI analysis were demonstrated on a MALDI-MSI dataset collected from a purple tomato fruit section 35,36. The purple, anthocyanin-rich tomato fruit was generated by ectopic expression of the snapdragon transcription factors ROSEA1 (ROS1, MYB-type) and DELILA (DEL, bHLH-type). Anthocyanin production in the ROS1/DEL tomato fruit was locally reduced by virus-induced gene silencing (VIGS), which led to irregular accumulation of anthocyanins in the fruit at the red, ripe stage (Fig. 4a1). To compare the metabolic profiles of anthocyanin-free (A) and anthocyanin-rich (B) regions, two ROIs were selected from each area for statistical analysis and biomarker discovery (Fig. 4a2). The ROIs were defined based on the ion image of m/z 317.0656, which corresponds to the radical ion of petunidin ([M]+.), a known purple pigment in the tomato fruit. Alternatively, ROIs can be selected on a segmentation map (Fig. 4a2), e.g., a spatial shrunken centroids segmentation with 9 segments representing different anatomical features of the tomato fruit (see the Image Spatial Segmentation section for details). The defined ROIs can also be visually inspected to verify their accuracy (Fig. 4a3). A statistical test was then performed on all mass features, comparing the ROIs of the two regions. The result is a table containing the mean ion intensity of each ROI and the fold change and adjusted p-value of each mass feature (Fig. 4a4, Supplementary Table S4), intended to help users identify potential biomarkers. In this case study, only two groups were considered, with two ROIs per group, all derived from a single sample. However, users can define many groups and multiple ROIs originating from multiple samples in their own analyses.
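The kind of table described above can be approximated with the Python sketch below. The source does not state which test ShinyCardinal applies, so this sketch assumes a Welch t-test on pooled pixel intensities of each group, a fold change defined as group B over group A, and Benjamini-Hochberg adjustment; per-ROI means would be computed the same way with one mask per ROI. Function names are hypothetical.

```python
import numpy as np
from scipy import stats

def bh_adjust(p):
    """Benjamini-Hochberg adjusted p-values (false discovery rate)."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

def compare_groups(intensity, group_a, group_b):
    """intensity: (n_pixels, n_features); group_a / group_b: boolean pixel masks
    formed by the union of the ROIs assigned to each group (assumed pixel-level test)."""
    a, b = intensity[group_a], intensity[group_b]
    mean_a, mean_b = a.mean(axis=0), b.mean(axis=0)
    fold_change = (mean_b + 1e-12) / (mean_a + 1e-12)  # B relative to A
    _, p = stats.ttest_ind(a, b, axis=0, equal_var=False)  # Welch t-test per feature
    return mean_a, mean_b, fold_change, bh_adjust(p)
```

Multiple-testing adjustment matters here because every mass feature is tested at once, so unadjusted p-values would yield many false biomarker candidates.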
ShinyCardinal also enables intensity profiling along a user-defined ROI line (Fig. 4b). Users can draw a line over an ion image or segmentation map to capture the series of pixels along it (Fig. 4b2), and then visualize and compare the intensities of different mass features across these pixels as a line plot (Fig. 4b3). In addition, ShinyCardinal supports MSI data cropping, a feature akin to the scan scrubber tool of MSiReader 14. Users can select ROIs (Fig. 4c2) and decide whether to keep or discard them in the MSI data (Fig. 4c3). This functionality is particularly useful for eliminating unwanted pixels and off-tissue data.
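Both operations reduce to pixel selection on a data cube. The Python sketch below assumes a cube of shape (n_features, height, width), uses Bresenham's line algorithm to choose the pixels along a drawn line, and uses a boolean mask for cropping. This is an illustrative choice, not a description of ShinyCardinal's internals, and the function names are hypothetical.

```python
import numpy as np

def line_pixels(x0, y0, x1, y1):
    """Integer (x, y) pixel coordinates along a straight line (Bresenham's algorithm)."""
    points = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy
    return points

def intensity_profile(cube, x0, y0, x1, y1):
    """Intensities of every feature along the line: shape (n_features, n_points)."""
    pts = line_pixels(x0, y0, x1, y1)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return cube[:, ys, xs]

def crop(cube, mask, keep=True):
    """Keep (or discard) pixels inside a boolean ROI mask of shape (height, width).
    Returns the (n_features, n_selected_pixels) data and the retained (y, x) coordinates."""
    selected = mask if keep else ~mask
    ys, xs = np.nonzero(selected)
    return cube[:, ys, xs], np.column_stack([ys, xs])
```

Each row of the profile can be drawn as one line in the line plot, with the point index along the line on the x-axis.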
Image Spatial Segmentation
Image spatial segmentation is a powerful tool for exploring the characteristics of MSI data and identifying ROIs, helping to reveal tissue structure and metabolic patterns 37,38. The most widely used approaches are hierarchical clustering and K-means. However, these methods treat each pixel independently and ignore spatial information, which can produce spatially discontinuous clusters 39. Cardinal introduced a segmentation method tailored to MSI data, named spatial shrunken centroids (SSC) 40,20. SSC uses a spatially aware distance that regularizes the distance between pixel spectra using their spatial neighborhoods, and thus produces improved segmentations. SSC also determines the number of segments automatically, which guides the choice of an appropriate number of segments 40.
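The idea of a spatially aware distance can be sketched in Python as a Gaussian-weighted sum of squared spectral differences between corresponding neighbors of two pixels. The Gaussian width sigma = (2r + 1)/4, the normalization by the weight sum, and the skipping of out-of-image neighbors are assumptions made for this illustration; Cardinal's SSC implementation may differ in these details.

```python
import numpy as np

def spatially_aware_distance(cube, p, q, r=1):
    """Squared spatially aware distance between pixels p=(y, x) and q=(y, x) of a
    (height, width, n_features) cube: Gaussian-weighted mean of squared spectral
    differences between corresponding neighbors within radius r."""
    h, w, _ = cube.shape
    sigma = (2 * r + 1) / 4  # assumed Gaussian width
    total, weight_sum = 0.0, 0.0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            py, px, qy, qx = p[0] + dy, p[1] + dx, q[0] + dy, q[1] + dx
            if not (0 <= py < h and 0 <= px < w and 0 <= qy < h and 0 <= qx < w):
                continue  # skip offsets that fall outside the image for either pixel
            a = np.exp(-(dx * dx + dy * dy) / (2 * sigma ** 2))
            total += a * np.sum((cube[py, px] - cube[qy, qx]) ** 2)
            weight_sum += a
    return total / weight_sum
```

With r = 0 this reduces to the ordinary squared Euclidean distance; with r > 0 a single noisy pixel contributes less, because its neighborhood largely agrees with that of the pixel it is compared to.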
ShinyCardinal provides full support for SSC through a user-friendly interface. The image segmentation functionality was demonstrated on a bovine lens MALDI-MSI dataset 41. To help select an appropriate segmentation, ShinyCardinal allows users to initialize SSC with multiple numbers of starting segments k (e.g., 3 to 6), spatial smoothing radii r (e.g., 2, 4, and 6), and shrinkage parameters s (e.g., 1 to 3) (Fig. 5a). SSC was then run for every combination of the three parameters, and a segmentation map was generated for each combination. For instance, in line with the bovine lens anatomy, the dataset was computationally segmented into up to 6 clusters (Fig. 5b). Users can explore each segmentation map by specifying its k, r, and s parameters (Fig. 5c). In addition, the shrunken t-statistics of the spectral features were plotted for each cluster of the segmentation map. These interactive plots help users examine and select the most informative mass features defining each cluster; higher t-statistics indicate a greater contribution of the mass feature to that cluster (Fig. 5d). Furthermore, a table summarizing the shrunken t-statistics of the spectral features is provided for in-depth investigation (Fig. 5e).
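The parameter sweep and the shrunken t-statistics can be illustrated in Python. The sketch assumes the classic nearest-shrunken-centroid formulation, t_kj = (segment mean − overall mean) / (m_k (s_j + s_0)) with m_k = sqrt(1/n_k − 1/n), followed by soft-thresholding by s. It is a hedged illustration of the concept, not Cardinal's exact computation, and the function names are hypothetical.

```python
import itertools
import numpy as np

def parameter_grid(ks=range(3, 7), rs=(2, 4, 6), ss=(1, 2, 3)):
    """Every (k, r, s) combination to run, e.g. 4 x 3 x 3 = 36 segmentations."""
    return list(itertools.product(ks, rs, ss))

def shrunken_t(X, labels, s):
    """X: (n_pixels, n_features); labels: segment of each pixel; s: shrinkage threshold.
    Returns an (n_segments, n_features) array of soft-thresholded t-statistics."""
    classes = np.unique(labels)
    n, p = X.shape
    overall = X.mean(axis=0)
    within = np.zeros(p)
    for k in classes:
        Xk = X[labels == k]
        within += ((Xk - Xk.mean(axis=0)) ** 2).sum(axis=0)
    sj = np.sqrt(within / (n - len(classes)))  # pooled within-segment standard deviation
    s0 = np.median(sj)                          # guards against tiny variances
    t = np.empty((len(classes), p))
    for i, k in enumerate(classes):
        nk = np.sum(labels == k)
        mk = np.sqrt(1 / nk - 1 / n)
        t[i] = (X[labels == k].mean(axis=0) - overall) / (mk * (sj + s0))
    # Soft thresholding: features with |t| <= s contribute nothing to that segment
    return np.sign(t) * np.maximum(np.abs(t) - s, 0.0)
```

Larger s sets more t-statistics to zero, so the features that survive at high shrinkage are the ones that most clearly define a segment.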
Absolute Quantification for MSI
Quantitative mass spectrometry imaging (qMSI) is an emerging field that enables accurate measurement of local molecular concentrations within complex samples. qMSI typically relies on standards of known concentration, which are predominantly applied either by spotting them on a reference tissue or by using a mimetic tissue model 42,43. In both approaches, several concentrations of a standard (or of an isotopically labeled analogue of the target molecule) are used to construct a calibration curve, from which the absolute concentration of the target analyte within the ROIs is calculated.
The quantification module of ShinyCardinal supports both qMSI approaches. It enables users to extract ion signals from each calibration standard, build a calibration curve by linear least-squares regression, and then calculate the analyte concentration across the selected tissue ROIs (Fig. 6a). Users can also recalculate the results by dynamically updating the calibration curve through the addition or removal of calibration points. When the ion intensity of an ROI falls outside the intensity range of the calibration curve, the corresponding result is highlighted in red in the plot, reminding users that the result, being an extrapolation, might not be accurate. As a case study, we analyzed a published DESI imaging dataset aimed at the spatial quantitation of drugs in rat liver (Fig. 6b) 44. Using ShinyCardinal, we reproduced the original results for two selected drugs, olanzapine (Fig. 6c) and erlotinib (Fig. 6d). The linearity of the response was satisfactory for both drugs, with R² values of 0.995 and 0.998 for olanzapine and erlotinib, respectively (Fig. 6c-d).
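The calibration logic can be sketched in Python with `numpy.polyfit`. The sketch assumes a straight-line model (intensity = slope × concentration + intercept), an optional boolean mask for dropping calibration points, and an out-of-range flag that mirrors the red highlighting described above. Function names are hypothetical, not ShinyCardinal's API.

```python
import numpy as np

def fit_calibration(conc, intensity, use=None):
    """Least-squares line through the calibration points; `use` optionally masks points out.
    Returns slope, intercept and R^2."""
    conc, intensity = np.asarray(conc, dtype=float), np.asarray(intensity, dtype=float)
    if use is not None:
        conc, intensity = conc[use], intensity[use]
    slope, intercept = np.polyfit(conc, intensity, 1)
    pred = slope * conc + intercept
    ss_res = np.sum((intensity - pred) ** 2)
    ss_tot = np.sum((intensity - intensity.mean()) ** 2)
    return slope, intercept, 1 - ss_res / ss_tot

def quantify(roi_intensity, slope, intercept, cal_intensity):
    """Invert the calibration line; flag ROIs whose intensity lies outside the calibrated range."""
    roi_intensity = np.asarray(roi_intensity, dtype=float)
    conc = (roi_intensity - intercept) / slope
    lo, hi = np.min(cal_intensity), np.max(cal_intensity)
    out_of_range = (roi_intensity < lo) | (roi_intensity > hi)
    return conc, out_of_range
```

Flagging rather than discarding out-of-range ROIs keeps the result visible while signaling that it relies on extrapolating the fitted line.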
Molecular Networking and Database Search Based Metabolite Identification
Metabolite identification remains a significant challenge in MSI owing to inherent limitations such as the absence of chromatographic separation and the difficulty of implementing tandem MS 45,21. ShinyCardinal offers three key modules designed to facilitate metabolite identification: network analysis, database query, and data export to METASPACE.
Molecular networking (MN) is a computational strategy that has been widely used for metabolomics data analysis. It calculates spectral similarity based on the principle that structurally related molecules tend to yield similar fragmentation (MS/MS) patterns. The MS/MS spectra are then organized into graph-based spectral networks, in which each node corresponds to an ion with an associated fragmentation spectrum and the links between nodes denote spectral similarity 46–48. Analogous or structurally similar molecules are grouped together in the network, enabling unknown molecules to be identified through their known neighbors. However, because most current MSI studies are performed at the MS1 level without MS/MS fragmentation 6, MS/MS-based MN cannot be applied to such MSI datasets. A co-localization-based MN, named PICA (pixel intensity correlation analysis), has been proposed for metabolite identification in MSI 36. This approach assumes that ions with similar spatial distributions are also structurally related. Indeed, ions that originate from the same molecule, such as its in-source fragments, natural isotopic peaks, adduct ions, multiply charged ions, or multimers, should in theory be perfectly co-localized. The efficacy of PICA has been demonstrated on three MSI datasets, underscoring its value for metabolite identification.
The network analysis module of ShinyCardinal improves on PICA in both speed and versatility. It supports both global and single network analysis (Fig. 7a). Global network analysis calculates the spatial similarity between all mass features in a pairwise manner and constructs a graph-based network according to a user-defined spatial similarity cutoff, which ranges from 0 (dissimilar) to 1 (identical spatial distribution). It provides an overview of ion clustering and of the number of ion clusters within an MSI dataset. By contrast, single network analysis searches for ions whose spatial distributions resemble that of a user-defined ion of interest, then builds a network from these ions and produces a pseudo-MS/MS spectrum. Single network analysis is particularly valuable for metabolite identification.
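Both modes can be sketched in Python under two assumptions: Pearson correlation of ion images serves as the spatial similarity score, and an ion cluster is a connected component of the thresholded similarity graph (found here with union-find). The pseudo-MS/MS spectrum is taken as the m/z values of the ions co-localized with the query, with their mean intensities as peak heights. These are illustrative choices with hypothetical names, not ShinyCardinal's code.

```python
import numpy as np

def global_network(images, cutoff=0.9):
    """Connect features whose ion images have Pearson r >= cutoff; return the edge list
    and the connected components (ion clusters) as sorted lists of feature indices."""
    r = np.corrcoef(images)
    n = len(images)
    edges = [(i, j) for i in range(n) for j in range(i + 1, n) if r[i, j] >= cutoff]
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in edges:
        parent[find(i)] = find(j)
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return edges, sorted(clusters.values())

def pseudo_msms(mz, images, query, cutoff=0.9):
    """Ions co-localized with the query ion, sorted by m/z, with mean intensities as heights."""
    r = np.corrcoef(images)[query]
    hits = np.where(r >= cutoff)[0]
    order = hits[np.argsort(mz[hits])]
    return mz[order], images[order].mean(axis=1)
```

The global mode answers "how many co-localized groups are there", whereas the single mode starts from one ion and returns only its neighbors, which is the view needed to reason about fragments, isotopes, and adducts of one molecule.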
To showcase the process and effectiveness of ShinyCardinal in metabolite identification, we used a MALDI-MSI dataset acquired from a mouse brain section 31. The dataset was preprocessed with ShinyCardinal without deisotoping or matrix removal. From the 231 detected mass features, a total of 17 ion clusters were generated using a spatial similarity cutoff of 0.9 (Fig. 7b). Among them, clusters 11 and 12 were identified as the Na+ and K+ adducts of polyethylene glycol (PEG)-1450, respectively, each exhibiting a repeating unit of 44 Da (Supplementary Fig. S2). The remaining ion clusters were subjected to single network analysis and to the identification module for metabolite annotation. The results for three representative clusters (clusters 1, 9, and 15; Fig. 7c) are shown in Fig. 7d. Manual inspection of the resulting pseudo-MS/MS spectra of these three clusters confirmed that the ions within each cluster indeed stemmed from the same molecule. For example, four ions, with m/z values of 772.5267 (ion 1), 773.5292 (ion 2), 713.4532 (ion 3), and 714.4565 (ion 4), were highly colocalized (cluster 1). Notably, m/z 773.5292 (ion 2) is the 13C isotopic peak of m/z 772.5267 (ion 1), and m/z 714.4565 (ion 4) is likewise the 13C isotopic peak of m/z 713.4532 (ion 3). The MALDI images confirmed that these ions share a similar spatial distribution (Fig. 7e). A database search with the identification module of ShinyCardinal, using the HMDB database 49 with a 5 ppm mass accuracy window, revealed that m/z 772.5267 (ion 1) corresponded to either a Na+ adduct of a phosphatidylethanolamine (PE) lipid ([M + Na]+, C43H76NO7P, mass accuracy: -1.99 ppm), a K+ adduct of a monomethyl phosphatidylethanolamine (PE-NMe) lipid ([M + K]+, C40H80NO8P, mass accuracy: -1.79 ppm), or a K+ adduct of a phosphatidylcholine (PC) lipid ([M + K]+, C40H80NO8P, mass accuracy: -1.79 ppm).
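The accurate-mass step can be reproduced with a short Python sketch that computes monoisotopic [M + adduct]+ m/z values from elemental formulas and keeps candidates within a ppm window. The candidate list is hand-typed for illustration (it is not an HMDB query), and the error is reported as (measured − theoretical)/theoretical × 10⁶. With this convention the magnitudes match the values quoted above (about 1.99 and 1.79 ppm) but the sign is opposite, because sign conventions for mass error differ between tools.

```python
import re

# Monoisotopic masses (Da) of the elements and adduct atoms used below
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462,
        "P": 30.97376200, "Na": 22.98976928, "K": 38.96370668}
ELECTRON = 0.00054858

def mono_mass(formula):
    """Monoisotopic mass of a neutral formula such as 'C43H76NO7P'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONO[element] * (int(count) if count else 1)
    return mass

def adduct_mz(formula, adduct):
    """m/z of a singly charged [M + adduct]+ ion (adduct cation = atom minus one electron)."""
    return mono_mass(formula) + MONO[adduct] - ELECTRON

def search(measured, candidates, tol_ppm=5.0):
    """Candidates (name, formula, adduct) whose theoretical m/z is within tol_ppm of measured."""
    hits = []
    for name, formula, adduct in candidates:
        theo = adduct_mz(formula, adduct)
        err = (measured - theo) / theo * 1e6
        if abs(err) <= tol_ppm:
            hits.append((name, adduct, round(theo, 4), round(err, 2)))
    return hits
```

Because PC and PE-NMe share the formula C40H80NO8P, no ppm window, however narrow, can separate them; this is why the additional evidence from co-localized fragment ions is needed.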
However, accurate mass search alone provides weak evidence for metabolite identification and cannot distinguish lipid class isomers such as PC and PE. The detection of m/z 713.4532 (ion 3) in the pseudo-MS/MS spectrum of cluster 1 indicates a neutral loss of 59 Da, which corresponds to trimethylamine from the PC head group, confirming that m/z 772.5267 (ion 1) is a PC class lipid rather than a PE or PE-NMe class lipid (Fig. 7d). This example highlights the power of ShinyCardinal in metabolite identification. Using the same approach, we identified all the remaining ion clusters with high confidence (Supplementary Table S5).
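The reasoning applied to cluster 1 can be expressed as a small Python check over all ion pairs in a cluster: a mass difference matching the 13C spacing marks an isotope pair, and a difference matching a known neutral loss marks an in-source fragment. The loss table below holds only the monoisotopic masses of trimethylamine (C3H9N, 59.0735 Da) and phosphocholine (C5H14NO4P, 183.0660 Da), both mentioned in this section; the 5 ppm tolerance and function name are assumptions.

```python
from itertools import combinations

C13_SHIFT = 1.003355  # 13C - 12C mass difference (Da)
NEUTRAL_LOSSES = {"trimethylamine": 59.073499,   # C3H9N, PC head-group loss
                  "phosphocholine": 183.066045}  # C5H14NO4P

def annotate_cluster(mzs, tol_ppm=5.0):
    """Return (higher m/z, lower m/z, label) for every ion pair in a co-localized cluster
    whose mass difference matches a 13C spacing or a listed neutral loss."""
    notes = []
    for lo, hi in combinations(sorted(mzs), 2):
        diff = hi - lo
        if abs(diff - C13_SHIFT) / hi * 1e6 <= tol_ppm:
            notes.append((hi, lo, "13C isotope"))
        for name, loss in NEUTRAL_LOSSES.items():
            if abs(diff - loss) / hi * 1e6 <= tol_ppm:
                notes.append((hi, lo, f"neutral loss of {name}"))
    return notes
```

For the four ions of cluster 1 this pairs 772.5267 with 773.5292 and 713.4532 with 714.4565 as isotopes, and links 772.5267 to 713.4532 (and 773.5292 to 714.4565) by the 59 Da trimethylamine loss that singles out the PC assignment.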
To assess the accuracy of the metabolite identification results, we submitted the same dataset (https://metaspace2020.eu/datasets?ds=2023-08-15_15h59m29s) to METASPACE 16,17. For the ion clusters, the ShinyCardinal results are consistent with or superior to those from METASPACE (Supplementary Table S5). For instance, m/z 772.5267 (cluster 1) was annotated by METASPACE as a PE ([M + Na]+), PE-NMe ([M + K]+), or PC ([M + K]+) class lipid at a 10% false discovery rate (FDR), without differentiating between the three lipid class isomers. Using ShinyCardinal, this ion was confidently identified as a PC ([M + K]+) class lipid owing to the detection of the 59 Da neutral loss. Another example is m/z 835.6685 (cluster 8), which was not annotated by METASPACE at a 10% FDR but was identified by ShinyCardinal as sphingomyelin (SM 42:2) based on two characteristic neutral losses of 183 and 59 Da (Supplementary Table S5).
The export module of ShinyCardinal allows the processed MSI data to be exported in the centroid imzML format required by METASPACE and other MSI software tools. Users can also export the MSI dataset for METASPACE after deisotoping and removal of background and MALDI matrix peaks, which further reduces the number of false positives. In addition to RDS and imzML, preprocessed MSI data can be exported as a comma-separated values (CSV) file. The CSV file is structured as a table with m/z values in rows and pixels in columns, and can be read into R, Python, or other programming languages for machine learning.
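A minimal Python sketch of loading such a CSV export for machine learning, assuming a header row of pixel identifiers and m/z values in the first column (the exact header layout is an assumption, and the file name in the usage comment is hypothetical). The table is transposed so that pixels become samples and m/z values become features, which is the orientation most machine learning libraries expect.

```python
import csv
import io
import numpy as np

def load_feature_table(handle):
    """Read an m/z-by-pixel CSV table; return m/z values, pixel ids and a pixels x features matrix."""
    reader = csv.reader(handle)
    header = next(reader)
    pixel_ids = header[1:]  # assumed: first header cell labels the m/z column
    mz, rows = [], []
    for row in reader:
        mz.append(float(row[0]))
        rows.append([float(v) for v in row[1:]])
    X = np.array(rows).T  # transpose: pixels become samples (rows), m/z values become features
    return np.array(mz), pixel_ids, X

if __name__ == "__main__":
    # Usage with a real export: with open("exported_msi.csv", newline="") as fh: load_feature_table(fh)
    demo = io.StringIO("mz,p1,p2,p3\n100.1,1,2,3\n200.2,4,5,6\n")
    mz, ids, X = load_feature_table(demo)
    print(mz, ids, X.shape)
```

For large datasets a chunked or columnar reader would be preferable, since a dense CSV with many pixels and mass features can be several gigabytes.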