Experimental design and dual-mode SRS microscope
The goal of our study was to verify the feasibility of rapid histologic imaging of fresh gastroscopic biopsies using a femto-SRS microscope (Fig. 1A). To fully exploit the advantages of femto-SRS (high speed and SNR) while compensating for its lack of chemical selectivity, we constructed a dual-mode SRS microscope capable of imaging the same FOVs with femto- and pico-SRS (Fig. S1). The femto-SRS optical path simply uses transform-limited femtosecond pulses (~120 fs) to generate single-channel SRS images at zero time delay, without chemical resolution. In contrast, spectral-focusing-based pico-SRS requires chirping the femtosecond pulses to ~2-4 ps through dispersive optics, yielding SRS images at different Raman frequencies by varying the inter-pulse time delay (Fig. 1B). Traditional pico-SRS takes raw images at two Raman frequencies (ω₁ = 2845 cm⁻¹ for CH₂, ω₂ = 2930 cm⁻¹ for CH₃) to extract lipid/protein distributions and generate histological images. The two SRS modes share most of the optical components except for the pulse-stretching parts, and can be switched by shutters to image the same FOVs without noticeable sample shifts. Therefore, large datasets of corresponding femto- and pico-SRS images could be harvested to train deep neural networks, so that single-shot femto-SRS images could be converted to dual-channel pico-SRS images without complex optical engineering or physical tuning of detection frequencies. Moreover, the imaging speed could be doubled with about half the laser power. We also included second harmonic generation (SHG) imaging of collagen fibers to generate composite multi-color SRS images, as well as false-colored SRH images to reveal histological features for further analysis (Fig. 1C).
Chemical imaging via U-Net based Femto-SRS
The chemical resolution of SRS essentially arises from its spectral resolution, which distinguishes molecules with different vibrational spectra. Femto-SRS integrates the spectral-domain information into a single image, whereas pico-SRS generates hyperspectral images dispersed in the spectral or time domain (Fig. 1B). Recovering discrete spectral images from the summed image is therefore an inverse problem, which we address with deep learning.
We applied a U-shaped fully convolutional network (U-Net) to map single-shot femto-SRS images to dual-channel pico-SRS images while preserving spatio-chemical information. Live HeLa cells and fresh gastric tissues were imaged with both femto- and pico-SRS on the same FOVs. The differences between the two imaging modes could be clearly visualized. For instance, in live-cell images pico-SRS revealed high CH₃ (2930 cm⁻¹) and low CH₂ (2845 cm⁻¹) intensities in the protein-rich cell nucleus, while lipid-rich organelles and droplets showed high intensities at both Raman frequencies (Fig. 2A). These patterns are consistent with the known spectral differences between lipids and protein14. In strong contrast, femto-SRS offered a single image with the spectral intensity integrated over the CH stretch window, which includes the two Raman channels imaged with pico-SRS.
For the training of U-Net, 50 FOVs of HeLa cells and 100 FOVs of gastric tissues were used, with the ratio of training and test data size set to 4:1. After training and optimization, the U-Net was able to convert the raw femto-SRS image to dual-channel images that resembled the pico-SRS at the corresponding Raman frequencies. The high conversion accuracy could be seen by the similarity between the measured pico-SRS (ground truth) and U-Net predicted femto-SRS images of the same Raman frequencies (Fig. 2A). The intensity profiles along the line-cuts through the live cell images further confirmed the accuracy of U-Net conversion, as well as the recovered chemical contrast as highlighted in the nucleus regions (Fig. 2B). The dual-channel SRS images were processed to form colored composite images of cells and tissues, demonstrating almost identical results between pico-SRS and deep learned femto-SRS (Figs. 2C-D and Fig. S2).
Processing of femto-SRH images
After U-Net conversion into dual-channel SRS images, chemical decomposition was applied to yield the lipid and protein images by simple linear algebra14. Collagen fibers were represented directly by the SHG signal. In traditional multi-color SRS imaging, collagen, lipid and protein are false-colored red, green and blue, respectively (Figs. 2C-D).
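The linear decomposition described above can be sketched as a per-pixel 2 × 2 unmixing. This is a minimal illustration, not the authors' implementation: the function name `unmix` and the calibration matrix `M` are hypothetical, and the coefficients are placeholder values that would in practice come from reference spectra of pure lipid and protein.

```python
import numpy as np

# Hypothetical calibration matrix (placeholder values, NOT from the paper).
# Rows: Raman channels (2845 cm-1, 2930 cm-1); columns: (lipid, protein).
M = np.array([[1.0, 0.3],
              [0.6, 1.0]])

def unmix(ch2_image, ch3_image, mixing=M):
    """Solve [I_2845, I_2930]^T = M @ [lipid, protein]^T at every pixel."""
    stack = np.stack([ch2_image, ch3_image], axis=0).astype(float)  # (2, H, W)
    # Contract the inverse mixing matrix with the channel axis.
    conc = np.tensordot(np.linalg.inv(mixing), stack, axes=1)       # (2, H, W)
    conc = np.clip(conc, 0.0, None)  # negative concentrations are unphysical
    return conc[0], conc[1]          # lipid map, protein map

# Round trip on synthetic data: mix known maps, then recover them.
lipid_true = np.array([[1.0, 0.0], [0.5, 2.0]])
protein_true = np.array([[0.0, 1.0], [0.5, 1.0]])
ch2 = M[0, 0] * lipid_true + M[0, 1] * protein_true
ch3 = M[1, 0] * lipid_true + M[1, 1] * protein_true
lipid_est, protein_est = unmix(ch2, ch3)
```

Because the system is only 2 × 2, the inverse is computed once and applied to the whole image, which keeps the decomposition cheap compared with the imaging and U-Net steps.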
To simulate the coloring scheme of H&E, we applied a pseudo-color mapping to the SRS images to generate SRH images that are more familiar to pathologists18. The lipid, protein and collagen contents were mapped to light pink, dark purple and magenta, respectively, forming SRH images more akin to traditional histopathology (Fig. 1C and Fig. 3). Although the lipid/protein-based contrast of SRH is not identical to that of traditional H&E, it revealed important pathological features highly similar to those seen in H&E.
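One simple way to realize such a pseudo-color mapping is to start from a white background and let each component pull the pixel toward its assigned color. The sketch below assumes component maps normalized to [0, 1]; the RGB triplets in `SRH_COLORS` and the blending rule are illustrative assumptions, not the look-up table used in the study.

```python
import numpy as np

# Hypothetical RGB targets in [0, 1] (placeholders, NOT the authors' values).
SRH_COLORS = {
    "lipid":    (1.00, 0.75, 0.85),  # light pink
    "protein":  (0.30, 0.10, 0.50),  # dark purple
    "collagen": (0.85, 0.10, 0.60),  # magenta
}

def srh_pseudocolor(lipid, protein, collagen):
    """Blend three normalized component maps into an H&E-like RGB image."""
    lipid = np.asarray(lipid, dtype=float)
    rgb = np.ones(lipid.shape + (3,))  # white background, like a bright-field slide
    for name, chan in (("lipid", lipid), ("protein", protein), ("collagen", collagen)):
        weight = np.clip(np.asarray(chan, dtype=float), 0.0, 1.0)[..., None]
        # Each component removes light in proportion to its content.
        rgb -= weight * (1.0 - np.array(SRH_COLORS[name]))
    return np.clip(rgb, 0.0, 1.0)

# 1 x 2 test image: an empty pixel and a pure-protein pixel.
demo = srh_pseudocolor(lipid=[[0.0, 0.0]], protein=[[0.0, 1.0]], collagen=[[0.0, 0.0]])
```

With this rule an empty pixel stays white and a pixel containing only one component takes exactly that component's color, which mimics the subtractive appearance of stained sections.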
Femto-SRH reveals key diagnostic features of fresh gastric tissues
Rapid femto-SRS imaging was achieved on fresh gastroscopic biopsies (~2 × 2 mm²) within 1 minute, followed by real-time U-Net conversion and pseudo-coloring to generate multi-color femto-SRS and femto-SRH images (Videos S1-S2). All the important normal and neoplastic gastric histoarchitectures were clearly shown, highly consistent with the findings of standard H&E. The results for typical non-cancerous tissues are shown in Figure 3. SRS readily showed the regular arrangement of gastric epithelial cells and the surrounding basement membrane in normal glands (Fig. 3A). In adenomas, SRS clearly showed mild to moderate dysplasia of the glandular epithelium, with cigar-shaped nuclei located at the base and maintaining polarity (Fig. 3B). For intestinal metaplasia, in which intestinal-type epithelium replaces normal gastric mucosa, SRS could identify the goblet (cup-shaped) cells within the metaplastic epithelium, whose secreted mucus appeared as large round compartments with moderate protein content (Fig. 3C, arrows). In addition, SRS provided a clear view of the extraductal and surrounding mesenchyme. In the case of inflammation, SRS identified infiltration of inflammatory cells within the lamina propria, characterized by small, dense, irregularly arranged nuclei (Fig. 3D).
For neoplastic lesions, femto-SRH showed distinct histopathologic features of adenocarcinomas with different degrees of differentiation. In well-differentiated adenocarcinomas, SRS depicted abnormal glands composed of cells with prominent atypia, enlarged nuclei, loss of polarity, and extension toward the luminal surface (Fig. 4A). In moderately differentiated adenocarcinomas, SRS clearly showed disorganized neoplastic cells and distorted ducts, such as irregular, extended, angular, and abnormally fused glandular ducts in tubular adenocarcinomas (Fig. 4B), and elongated finger-like processes lined by columnar or cuboidal cells and supported by fibrovascular connective tissue cores in papillary adenocarcinomas (Fig. 4C). In poorly differentiated adenocarcinomas, SRS showed malignant epithelial cells distributed singly or in nests without forming tubular structures (Fig. 4D). In addition, SRS was able to identify signet ring cell components much as H&E staining does, showing abundant intracellular mucus pushing the nucleus to one side (Fig. 4E).
Despite the overall high agreement between SRS and H&E, a few discrepancies remain and are worth emphasizing. First, SRS maps the distributions of lipids and protein, whereas H&E mainly stains nucleic acids and protein. Second, we acquired SRS images on fresh tissues, whereas H&E staining was performed on thin sections. Although SRS creates thin optical sections of ~1 μm thickness owing to the nonlinear nature of the optical signal5,34, the images showed slightly different structural features from those of H&E. For instance, the cores of the glandular ducts usually appeared hollow in H&E but filled in SRS (Figs. 3A-B and 4A), which is likely due to the loss and dissolution of contents during the sectioning and deparaffinization steps of H&E preparation. Therefore, SRS might reveal more intact and better preserved histoarchitectures of fresh gastric tissues (Fig. S3). Furthermore, label-free SRS is free from the large staining variation of H&E, as seen in the different colorings among tissues (Figs. 3 and 4). The more consistent appearance of SRS images may help reduce the uncertainty and improve the accuracy of machine-learning-based diagnosis.
Deep-learning assisted diagnosis and classification
To reduce the workload of pathologists and provide a potentially more objective diagnosis for intraoperative histopathology, we evaluated the performance of deep-learning-based diagnosis on femto-SRH images from 279 patients (Table S1). The convolutional neural network (CNN) of choice was Inception-ResNet-V2, which combines Inception modules with residual (ResNet) connections. The CNN was trained to classify the images into three categories: non-cancer, differentiated cancer and undifferentiated cancer. Labeling of the image data for supervised learning was based on the pathology of each gastric biopsy: normal, adenoma, intestinal metaplasia and gastritis cases were labeled as “non-cancer”; among cancerous tissues, well or moderately differentiated tubular and papillary adenocarcinomas were labeled as “differentiated cancer”; poorly differentiated adenocarcinomas, mucinous adenocarcinomas and signet ring cell carcinomas were labeled as “undifferentiated cancer”35,36. Two neural networks with binary outputs were used for the classification: the first separated the dataset into non-cancerous and cancerous lesions, and the second further classified the cancer group into differentiated and undifferentiated subgroups.
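The two-stage scheme can be sketched as a simple cascade in which the second network is consulted only when the first one predicts cancer. The function name `classify_case`, its arguments and the default threshold of 0.5 are illustrative assumptions; the scores stand in for the outputs of the two trained CNNs.

```python
def classify_case(p_cancer, p_undifferentiated, threshold=0.5):
    """Combine two binary scores into one of the three diagnostic categories.

    p_cancer           -- score from the first network (non-cancer vs. cancer)
    p_undifferentiated -- score from the second network (differentiated vs.
                          undifferentiated), only used for cancer cases
    threshold          -- hypothetical decision threshold shared by both stages
    """
    if p_cancer < threshold:
        return "non-cancer"
    if p_undifferentiated >= threshold:
        return "undifferentiated cancer"
    return "differentiated cancer"

# Hypothetical scores for three cases.
labels = [classify_case(0.2, 0.9), classify_case(0.8, 0.3), classify_case(0.8, 0.7)]
```

Splitting the task this way lets each network be trained and evaluated on a binary problem, at the cost that an error in the first stage cannot be corrected by the second.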
The femto-SRH image dataset was divided into a training/validation set and a test set. The training/validation images were sliced into standard tiles of 300 × 300 pixels, which were further divided into training and validation sets at a ratio of 4:1 using 5-fold cross-validation. The numbers of cases and tiles in all data subsets are shown in Table S2. The input to each CNN during training was a batch of tiles, and the output was a prediction value for each tile (Fig. 5A and Fig. S4). The test images were kept independent of the training process. For each patient case in the test set, the prediction was made from the classification percentage over the whole SRH image, using a threshold of ~0.5 (refs. 16, 18) determined by Youden's index from the training results (Fig. S5).
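The tiling, threshold selection and case-level decision described above can be sketched as follows. This is an illustration under stated assumptions rather than the study's pipeline: the helper names are hypothetical, tiles are taken without overlap and incomplete edge tiles are dropped, and Youden's index (J = sensitivity + specificity - 1) is assumed to be maximized over case-level scores.

```python
import numpy as np

TILE = 300  # tile edge length in pixels, as stated in the text

def tile_image(image, tile=TILE):
    """Cut an image into non-overlapping tile x tile patches (edges dropped)."""
    h, w = image.shape[:2]
    return [image[r:r + tile, c:c + tile]
            for r in range(0, h - tile + 1, tile)
            for c in range(0, w - tile + 1, tile)]

def youden_threshold(scores, labels):
    """Return the threshold maximizing J = sensitivity + specificity - 1."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_t, best_j = None, -1.0
    for t in np.unique(scores):            # candidate thresholds, ascending
        pred = scores >= t
        sensitivity = (pred & labels).sum() / labels.sum()
        specificity = (~pred & ~labels).sum() / (~labels).sum()
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_t, best_j = float(t), float(j)
    return best_t, best_j

def case_prediction(tile_probs, case_threshold=0.5, tile_threshold=0.5):
    """Call a case positive if enough of its tiles are predicted positive."""
    fraction = float(np.mean(np.asarray(tile_probs) >= tile_threshold))
    return fraction >= case_threshold, fraction

tiles = tile_image(np.zeros((650, 900)))
threshold, j_value = youden_threshold([0.1, 0.2, 0.7, 0.9], [0, 0, 1, 1])
is_cancer, fraction = case_prediction([0.9, 0.8, 0.2, 0.7])
```

In the toy example, a 650 × 900 image yields six complete tiles, and three of the four hypothetical tile scores exceed 0.5, so the case would be called positive at a case-level threshold of 0.5.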
For the CNN that classified non-cancer/cancer, the training curves, test results and receiver operating characteristic (ROC) curves demonstrated high performance (Figs. 5B-C and Fig. S5), with an area under the curve (AUC) of 98.8%. Among the 55 test cases, 2 non-cancer cases were misclassified as cancer (Fig. 5B and Fig. S6). The network that classified differentiated/undifferentiated cancer showed slightly lower performance (Figs. 5D-E and Fig. S5), with an AUC of 94.2%. Of the 17 test cases, 1 differentiated case was misclassified as undifferentiated (Fig. 5D and Fig. S6). The overall confusion matrix of the CNN-based femto-SRH in differentiating the three diagnostic categories is shown in Figure 5F. We then quantitatively compared the CNN predictions on femto-SRH images with the ratings of four professional pathologists on the H&E sections of the corresponding specimens, taking the results of clinical pathology as the ground truth (Table 1). Our results indicated that CNN-based SRH has near-perfect diagnostic performance in determining cancerous lesions (96.4% accuracy) and a high level of concordance with the true pathology (κ = 0.899). The statistical differences between CNN-SRH and the pathologists were subtle in the diagnosis of cancerous lesions (Table S3). For the subtyping of differentiated and undifferentiated cancers, CNN-SRH demonstrated better performance (94.1% accuracy) than the pathologists (85.3% mean accuracy), and maintained a high diagnostic concordance with the true pathology (κ = 0.85). Despite the few errors, our trained CNN models exhibited excellent capability for providing automated and accurate diagnosis on femto-SRH images of fresh gastric tissues.
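For readers less familiar with these metrics, the sketch below shows how accuracy and Cohen's κ follow from a confusion matrix. The stated error counts are consistent with the reported accuracies (53 of 55 and 16 of 17 cases correct). The 2 × 2 matrix used for κ is a toy example with made-up counts, since the per-class case numbers are not given in this section; the function name is likewise hypothetical.

```python
import numpy as np

def accuracy_and_kappa(confusion):
    """Accuracy and Cohen's kappa from a square confusion matrix.

    Rows are the true classes, columns the predicted classes.
    """
    cm = np.asarray(confusion, dtype=float)
    n = cm.sum()
    p_observed = np.trace(cm) / n
    # Agreement expected by chance from the row and column marginals.
    p_expected = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / n ** 2
    kappa = (p_observed - p_expected) / (1.0 - p_expected)
    return p_observed, kappa

# Accuracies implied by the error counts stated in the text.
acc_cancer = round(100 * 53 / 55, 1)   # 2 of 55 cases misclassified -> 96.4
acc_subtype = round(100 * 16 / 17, 1)  # 1 of 17 cases misclassified -> 94.1

# Toy confusion matrix with made-up counts (NOT data from the study).
toy_accuracy, toy_kappa = accuracy_and_kappa([[20, 5],
                                              [10, 15]])
```

Unlike raw accuracy, κ discounts the agreement expected by chance, which matters when the classes are imbalanced.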
Semantic segmentation of intratumoral heterogeneity
The highly heterogeneous development of tumors results in uneven intratumoral histoarchitectures, which carry critical information on tumor malignancy. To visualize the heterogeneous distribution of tumor subtypes and levels of differentiation, we developed a semantic segmentation algorithm based on the above CNNs, using the operations of flip-expansion, FOV shifting and scoring (see Methods and Fig. S7). High-resolution predictions were made to generate diagnostic probability maps of non-cancer, non-diagnostic, differentiated cancer and undifferentiated cancer (Fig. 6A). The prediction probabilities were color-coded and superimposed onto the SRH images, showing the spatial characteristics of the diagnostic histological heterogeneity within a gastroscopic biopsy (Fig. 6B).
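A common way to turn a tile classifier into a dense probability map is to slide overlapping windows over a mirror-padded image and average the class scores each pixel receives. The sketch below follows that pattern as one plausible reading of the flip-expansion, FOV-shifting and scoring steps; the actual procedure is the one given in the Methods. The function `probability_map`, its parameters and the `classify_tile` callback are all hypothetical.

```python
import numpy as np

def probability_map(image, classify_tile, tile=300, stride=100, n_classes=4):
    """Average overlapping tile predictions into a per-pixel probability map.

    classify_tile -- hypothetical callback returning n_classes probabilities
                     for one tile (stands in for the trained CNNs)
    """
    h, w = image.shape[:2]
    pad = tile - stride
    # "Flip-expansion" is interpreted here as mirror padding, so that border
    # pixels are covered by as many windows as interior pixels.
    pad_width = [(pad, pad), (pad, pad)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad_width, mode="reflect")
    ph, pw = padded.shape[:2]
    score = np.zeros((ph, pw, n_classes))
    count = np.zeros((ph, pw, 1))
    for r in range(0, ph - tile + 1, stride):      # shift the window in steps
        for c in range(0, pw - tile + 1, stride):
            probs = np.asarray(classify_tile(padded[r:r + tile, c:c + tile]))
            score[r:r + tile, c:c + tile] += probs  # accumulate class scores
            count[r:r + tile, c:c + tile] += 1
    averaged = score / np.maximum(count, 1)        # guard uncovered pixels
    return averaged[pad:pad + h, pad:pad + w]      # crop back to original size

def toy_classifier(patch):
    """Two-class stand-in: bright patches -> class 1, dark patches -> class 0."""
    return np.array([0.0, 1.0]) if patch.mean() >= 0.5 else np.array([1.0, 0.0])

image = np.zeros((8, 8))
image[:, 4:] = 1.0  # left half dark, right half bright
pmap = probability_map(image, toy_classifier, tile=4, stride=2, n_classes=2)
```

The stride sets the trade-off between spatial resolution and compute: a smaller stride gives smoother maps but requires proportionally more CNN evaluations.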
To illustrate the performance of the algorithm, we show the results for a typical tumorous tissue imaged with femto-SRH (Fig. 6B). Red represents regions with a high predicted probability of undifferentiated cancer, with low differentiation, poor maturity and high malignancy. Blue represents differentiated tissue regions with well or moderately differentiated carcinoma and a lower degree of malignancy. Green regions are predicted to be non-cancer, showing signs of gastritis and regularly patterned histology. Gray areas, consisting mainly of empty regions, do not provide useful diagnostic information. From the segmentation results, we could clearly see that this particular tissue was mainly composed of poorly differentiated tumor, with very few sites that remained moderately differentiated, in agreement with the pathological result of the patient. Moreover, representative FOVs of these subtyped areas are magnified (Fig. 6C), confirming the accuracy of the semantic segmentation algorithm in generating diagnoses with high spatial resolution.
Intraoperative evaluation of resection margins for ESD
In addition to applying deep-learned femto-SRH to the rapid diagnosis of fresh gastroscopic biopsies, we also demonstrated its potential in evaluating resection margins for endoscopic submucosal dissection (ESD), the goal of which is to achieve maximal removal of tumor tissue while preserving healthy tissue. We simulated intraoperative diagnosis on ESD specimens taken at three locations: the tumor core, the visual margin and the normal tissue ~8 mm away from the margin (Fig. 7A). After resection and non-invasive imaging with femto-SRH, the fresh tissues were sent to the pathology department for standard histopathological examination, which served as the ground truth.
The femto-SRH image of each tissue was evaluated based on the diagnostic histological features and on semantic segmentation using the deep learning models we developed (Fig. 7B), which highlighted the areas predicted as non-cancer (blue) and cancer (red). For the intra-tumor tissue, SRH revealed well-differentiated tubular adenocarcinoma within most of the tissue area, in good agreement with traditional pathology. At the tumor margin, a clear boundary between the normal and tumorous tissues could be identified, with differentiated tubular adenocarcinoma in the predicted tumor region. As for the adjacent normal tissue, the overall histoarchitecture appeared non-cancerous with only a small number of infiltrating goblet (cup-shaped) cells, which agreed with the diagnosis of chronic atrophic gastritis with intestinal metaplasia. A diagnostic accuracy of ~93.3% was reached among the 5 studied ESD cases (Fig. 7C). Rapid evaluation of resection margins with femto-SRH may assist decision-making during ESD and help improve patient care.