Surface Enhanced Raman Scattering (SERS) Spectroscopy Combined With Chemical Imaging Analysis For The Early Detection of Apple Valsa Canker

Background: Apple Valsa Canker (AVC) is a severe apple tree disease with a long early incubation period, so early detection of infected trees is necessary to prevent its rapid development. Surface enhanced Raman scattering (SERS) spectroscopy is a promising technique that simplifies detection procedures and reduces detection time. In addition, SERS enhances signals at low laser powers and suppresses biological fluorescence. In this study, early detection of AVC was carried out by combining SERS spectroscopy with chemometric methods and machine learning algorithms, and chemical distribution imaging was then applied to the analysis of disease dynamics.
Results: First, the microstructure, UV-Vis spectrum, and Raman spectrum of the SERS metallic nano-substrates were examined to investigate the enhancement effects of the synthesized AgNPs. Second, multiple spectral baseline correction (MSBC), asymmetric least squares (AsLS), and adaptive iterative reweighted penalized least squares (air-PLS) were adopted to eliminate the disturbance of the baseline offset. Correlation analysis was employed to identify the best baseline correction algorithm, which was air-PLS. Principal component analysis (PCA) was then used to perform clustering analysis on the healthy, early disease, and late disease sample datasets, and it showed obvious clustering effects. After that, optimal spectral variables were selected to build machine learning models for detecting AVC, using the BP-ANN, ELM, RForest, and LS-SVM algorithms. The accuracy of these models was above 90%, showing excellent discriminant performance. Finally, SERS chemical imaging provided the spatiotemporal dynamic characteristics of changes in the cellulose and lignin of the phloem disease-health junction under AVC stress. The results suggested that cellulose and lignin in the cell walls of infected tissues were reduced significantly.
Conclusions: SERS spectroscopy combined with chemical imaging analysis was considered feasible and promising for early detection of AVC. This study provides a practical method for the rapid diagnosis of apple orchard diseases.


Introduction
Apple Valsa Canker (AVC), caused by the fungus Valsa mali, is a severe apple tree disease that results in serious economic losses in Southeast Asia and China [1]. In a survey by Zang et al. [2], AVC was found in symptomless tissues in more than 50% of the apple orchards examined, and it caused a significant reduction in fruit yield. At the early infected stage, AVC is mainly characterized by canker, softening of the infected tissue, outflow of light brown water stains, and sunken or cracked bark on the trunk. The fungal pathogen mainly infects the subcutaneous phloem through wounded bark tissue at the initial infected stage. After successful infection, fungal hyphae colonize the phloem tissues, leading to severe tissue cell death [3]. Research by plant protection experts has shown that Valsa mali can survive in weak and dead tissues of apple trees for more than one year before producing visible symptoms [4]. In other words, the pathogen is characterized by early incubation. Once visible symptoms appear, it is difficult to contain the spread of AVC through conventional measures such as spraying fungicides, manually removing the diseased areas, and pruning dead branches. So far, there are no effective methods for AVC treatment because of the complicated pathogenic mechanism. Thus, early detection of infected trees is necessary to prevent the rapid development of the disease.
In addition to direct isolation of pathogens, molecular biology methods for detecting AVC include Enzyme-Linked ImmunoSorbent Assay (ELISA) and Polymerase Chain Reaction (PCR) [5,6]. ELISA kits have been widely used because of their low cost but are ineffective at detecting the pathogen in symptomless tissue [7]. PCR technology is an effective detection method. Zang et al. [2] developed a nested PCR assay to detect the presence of Valsa mali in apple trees and achieved an accuracy of 64.7%. However, DNA derived from woody plant tissues contains PCR-inhibiting compounds that can interfere with the PCR reaction and result in false negatives [8]. In addition, a well-equipped laboratory and experienced personnel are required, which is not feasible for on-site detection [9]. Because of these limitations, it is necessary to search for a fast, non-destructive, and economical method that enables highly accurate detection of AVC.
Recent studies have demonstrated that advanced non-invasive measuring technologies, such as RGB image processing [10], dielectric spectrum [11], laser scanning [12], and spectroscopic methods, have the potential for diagnosing tree diseases. The advantages of spectroscopy over the other techniques are its simplicity, rapidity, and affordability, which make it indispensable in tree disease detection. Raman spectroscopy (RS) is a non-invasive, rapid, and high throughput spectroscopic technique [13]. The Raman shift is related only to the vibration frequency of the molecular functional group and is independent of the frequency of the incident light. Therefore, the Raman "fingerprint" of each sample is unique [14]. Importantly, RS is a well-established chemical analysis method that provides essential information about the biochemical composition of tree tissue cells, such as protein, polysaccharide, and lipid. In both symptomatic and asymptomatic tissue, these biochemical compositions differ significantly between diseased and healthy tissue. The compositional changes are reflected in Raman shifts or intensity changes of the specific Raman bands assigned to those molecules. Therefore, RS provides an accessible way to identify subtle changes in molecular compounds, which offers theoretical evidence for detecting tree diseases. Perez et al. [15] investigated the application of RS combined with statistical analysis for detecting citrus Huanglongbing (HLB) disease in the field and obtained an overall classification accuracy of about 89.2%. Sanches et al. [16] readily distinguished between healthy and early-HLB citrus trees using a handheld Raman system and achieved an accuracy of 94%. In a subsequent study, Sanches et al. [17] also demonstrated that a handheld Raman spectrometer in combination with chemometric analyses enables detection and identification of secondary disease on HLB-infected orange trees. These studies indicate that the RS technique combined with chemometric methods has great potential for detecting diseased trees.
However, RS is frequently disturbed by fluorescence caused by chromophores in plant tissue, and compositional changes under disease stress may lead to Raman band broadening or drift [18,19]. This drawback may lead to significant deviations in the biochemical composition analysis of RS data. Surface enhanced Raman scattering (SERS) spectroscopy, an improvement on traditional RS, uses metallic nano-substrates such as gold or silver nanoparticles (AgNPs) to enhance signals under low laser powers, which maximizes fluorescence suppression. Meanwhile, a Raman system combined with micro-imaging technology allows scanning of micron-scale Raman collection points (e.g., one-micron pixels) [20], which offers chemical information on the constituents at high spatial resolution in situ. Qin et al. [21] developed a Raman chemical imaging system to visualize the internal spatial distribution of lycopene in postharvest tomato at different stages of maturity. Yang et al. [22] used a Raman imaging system to detect the spatial distribution of chemical components in maize seeds.
Therefore, this study aimed to develop a fast, non-invasive, and in situ diagnosis method for detecting AVC at early infection stages using SERS combined with micro-imaging technology. The main objectives were as follows: 1. Optimizing experimental conditions for obtaining valid SERS micro-imaging data, including the synthesis and characterization of SERS AgNPs; 2. Establishing optimal discriminative models for detecting AVC in early infection stages based on machine learning algorithms; 3. Generating micro-distribution maps of cellulose and lignin at the disease-health junction of the tree phloem tissues.

Fungal culture and sample inoculation
The fungus Valsa mali, stored at -80°C in an ultra-low temperature refrigerator, was inoculated onto potato dextrose agar (PDA) medium. Two-year-old apple branches (Malus domestica cv. Fuji) were collected from the Economic Tree Garden of Northwest A&F University. The selected branches were pruned into 15 cm segments, and their surfaces were disinfected with 75% alcohol for 15 min and then rinsed with sterile water three times until there was no odor. The ends of the branches were sealed with wet skimmed cotton to keep them fresh, and holes were then punched in the branches with a hole puncher (hole diameter 5 mm). The activated Valsa mali fungus was inoculated on the wounds of the apple branches at 2 points on each branch. After inoculation, the branches were transferred to a 25°C incubator for further incubation.

Synthesis and SERS AgNPs characterization
In the present research, AgNPs were synthesized using the Lee-Meisel method. The synthesis steps were as follows: AgNO3 (36 mg) was dissolved in 200 mL of ultrapure water and brought quickly to a boil. A solution of 1 wt.% trisodium citrate (6 mL) was added to the reaction solution, which was kept boiling for 25 min with stirring at 200 rpm. After cooling to room temperature, the AgNP solution was poured into a centrifuge tube and stored away from light. The chemical reaction equation is as follows:

$$4\mathrm{Ag}^{+} + \mathrm{C_6H_5O_7Na_3} + 2\mathrm{H_2O} = 4\mathrm{Ag} + \mathrm{C_6H_5O_7Na_3} + 3\mathrm{Na}^{+} + \mathrm{O_2}$$

Subsequently, the prepared AgNPs were characterized to verify their validity. The morphology of the AgNPs was measured with a Tecnai G2 transmission electron microscope (FEI Inc., Hillsboro, OR, USA). The UV-Vis absorption spectra of the AgNPs were measured using a Lambda 35 spectrophotometer (PerkinElmer Inc., Waltham, MA, USA). The Raman spectra of the AgNPs were collected with a DXR3xi Raman micro-imaging spectrometer (ThermoFisher Inc., Waltham, MA, USA).

SERS spectroscopy acquisition
First, the AgNPs were dripped onto each sample placed on a glass slide. Then, each sample was placed on the automatic stage and aligned with the Raman laser using a 10×/0.25 NA objective lens for SERS imaging collection with a DXR3xi Raman micro-imaging system (ThermoFisher Inc., Waltham, MA, USA). The specific parameters were as follows: the excitation wavelength was 785 nm; the collected spectral range was 300-3000 cm⁻¹; the laser power was 2.6 mW; the exposure time was 0.00357 s (280 Hz); and the number of scans was 30.
For spectral imaging in the x and y directions, the samples were scanned point by point in 2 µm steps. It should be noted that no destructive effects of the laser on the samples were observed. Routinely, before starting the Raman measurements, automatic calibration of the instrument was performed. The data acquisition software OMNICxi v1.6 was used to adjust the acquisition parameters.

Spectra preprocessing
Background noise and baselines were generated during the acquisition of the SERS spectra, which seriously impaired the interpretability of the spectra. This noise and these baselines would also reduce the simplicity and robustness of a calibration model built on the spectra. Therefore, selecting the optimal pretreatment method was necessary for improving the spectral quality. In the present research, a spectral curve was first extracted for each pixel of the imaging data. Then, the spectral data were preprocessed with three algorithms to eliminate noise and correct the baseline background: multiple spectral baseline correction (MSBC), asymmetric least squares (AsLS), and adaptive iterative reweighted penalized least squares (air-PLS). Subsequently, the advantages and disadvantages of the three algorithms were compared using correlation analysis.
The AsLS method, proposed by Eilers et al. [23,24] in 2003, is a classical baseline correction algorithm that combines a smoother with asymmetric weighting of deviations from the smoothed trend to form an effective baseline estimator. The MSBC method, proposed by Peng et al. [25] in 2010, is an improved approach based on the AsLS algorithm. It learns baselines that perform well on the corresponding spectra and then "co-regularizes" the selection by correcting inconsistencies between the spectra. Air-PLS is an improved approach based on penalized weighted least squares in which the weights are adapted at each iteration. Through this iterative regression, the background is estimated and subtracted automatically [26].
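To make the AsLS idea concrete, the following is a minimal sketch rather than the implementation used in this study (which was written in Matlab). It assumes a dense second-difference penalty, which is adequate for spectra of about a thousand points, and the function name `asls_baseline`, the synthetic spectrum, and the demo values `lam=1e5`, `p=1e-3` are illustrative assumptions. The defaults mirror the AsLS parameters reported later (λ = 5×10³, ρ = 1×10⁻⁴, with ρ playing the role of the asymmetry parameter `p`).

```python
import numpy as np

def asls_baseline(y, lam=5e3, p=1e-4, n_iter=15):
    """Estimate a baseline by asymmetric least squares (AsLS) smoothing."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-order difference matrix D with shape (n - 2, n)
    D = np.diff(np.eye(n), n=2, axis=0)
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    for _ in range(n_iter):
        # Weighted penalized least squares: (W + lam * D'D) z = W y
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        # Points above the current baseline (peaks) get the small weight p
        w = np.where(y > z, p, 1.0 - p)
    return z

# Synthetic spectrum (assumption): sloping baseline plus one narrow peak
x = np.arange(200.0)
true_baseline = 0.02 * x + 5.0
peak = 10.0 * np.exp(-0.5 * ((x - 100.0) / 3.0) ** 2)
spectrum = true_baseline + peak

baseline = asls_baseline(spectrum, lam=1e5, p=1e-3)
corrected = spectrum - baseline
```

Larger values of `lam` give a stiffer baseline, and smaller values of `p` make the fit ignore peaks more strongly; both need tuning to the peak widths of the actual spectra.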

Optimal variables selection and dimension reduction
Multivariate calibration methods in chemometrics aim to construct relationships between variables and properties of interest in order to build a classification model. However, redundant spectral variables usually carry noise and unnecessary information, which makes predictions unreliable. Two kinds of methods address this problem: optimal variable selection and dimension reduction.
In the present research, principal component analysis (PCA) was used to replace the original variables with a few principal components that capture the largest variance, thereby reducing the original high-dimensional variable space [27]. In addition, the competitive adaptive reweighted sampling (CARS) and random frog (RFrog) algorithms were used in combination to select the optimal variables associated with the predicted properties while excluding the interference of unrelated variables. The CARS algorithm uses an exponentially decreasing function (EDF) as a selection strategy and then selects critical variables competitively through adaptive reweighted sampling [28,29]. The RFrog algorithm calculates the selection probability of each variable by moving across models of different dimensions, enabling the search for the optimal variables [30].
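As an illustration of the EDF step in CARS, the sketch below computes how many variables survive each sampling run. It uses the form of the EDF commonly given for CARS, r_i = a·exp(-k·i), with a and k chosen so that all variables are kept in the first run and two remain in the last. The function names and the choice of 50 sampling runs are assumptions for illustration; only the total of 1401 variables comes from this study.

```python
import math

def edf_ratio(i, n_runs, n_vars):
    """Fraction of variables retained at sampling run i (1-based) under the EDF."""
    a = (n_vars / 2) ** (1 / (n_runs - 1))
    k = math.log(n_vars / 2) / (n_runs - 1)
    return a * math.exp(-k * i)

def edf_schedule(n_runs, n_vars):
    """Number of variables kept after the EDF step of each sampling run."""
    return [round(edf_ratio(i, n_runs, n_vars) * n_vars)
            for i in range(1, n_runs + 1)]

# 1401 spectral variables, 50 sampling runs (the run count is hypothetical)
schedule = edf_schedule(n_runs=50, n_vars=1401)
```

The schedule removes variables quickly in early runs and slowly in later ones, which is why the EDF acts as a fast coarse filter before the finer adaptive reweighted sampling.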

Classification models
The BP artificial neural network (BP-ANN) [31] is the most classical and successful neural network commonly utilized for nonlinear fitting and pattern recognition. BP-ANN is a one-way multi-layer feedforward network composed of an input layer, a hidden layer, and an output layer. The learning process consists of forward propagation of signals and back-propagation of errors.
The random forest (RForest) is a widely used machine learning algorithm that has been successfully applied to pattern recognition [32]. In essence, RForest is a classifier containing multiple decision trees. When test data enter the classifier, each decision tree classifies the data. Finally, the class that receives the most votes from all decision trees is taken as the result.
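The voting step described above can be sketched in a few lines. This is only an illustration of majority voting over per-tree predictions, not a random forest implementation; the function name `majority_vote` is hypothetical, and the labels 1, 2, and 3 follow the coding used in this study for healthy, disease-1, and disease-2 samples.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the class label predicted by the largest number of trees."""
    counts = Counter(tree_predictions)
    # most_common(1) returns [(label, count)] for the most frequent label
    return counts.most_common(1)[0][0]

# Five hypothetical trees voting on one sample
label = majority_vote([1, 2, 2, 3, 2])
```

In this sketch a tie is resolved in favour of the label that appears first in the list; a full implementation would normally define tie handling explicitly.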
The least squares support vector machine (LS-SVM) is a machine learning method that emerged from statistical learning theory. LS-SVM divides the data samples into multiple classes by determining a hyperplane in the input space that maximizes the separation between the classes [33]. Its key parameters are the kernel function and the corresponding parameters of that function.
The extreme learning machine (ELM) is an effective training algorithm for single-hidden-layer feedforward neural networks [34]. ELM trains faster and generalizes better than traditional machine learning algorithms and can overcome issues such as local minima, inappropriate learning rates, and overfitting [35]. Therefore, it is widely used in problems such as classification and regression.
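A minimal sketch of the ELM idea follows, assuming a sigmoid hidden layer and one-hot class targets. It is not the Matlab implementation used in this study: the function names, the synthetic three-class data, and the zero-based labels are illustrative assumptions. The default of 34 hidden neurons matches the value selected later for the ELM model.

```python
import numpy as np

def elm_train(X, y, n_hidden=34, n_classes=3, seed=0):
    """Train an ELM: random hidden layer, output weights by least squares."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights, never tuned
    b = rng.normal(size=n_hidden)                # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))       # sigmoid hidden-layer output
    T = np.eye(n_classes)[y]                     # one-hot targets
    beta = np.linalg.pinv(H) @ T                 # analytic output weights
    return W, b, beta

def elm_predict(X, model):
    W, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return np.argmax(H @ beta, axis=1)

# Synthetic, well-separated three-class data (assumption, labels 0..2)
rng = np.random.default_rng(1)
centers = np.array([[-2.0, -2.0], [0.0, 0.0], [2.0, 2.0]])
y_train = np.repeat(np.arange(3), 40)
X_train = centers[y_train] + 0.3 * rng.normal(size=(120, 2))
y_test = np.repeat(np.arange(3), 20)
X_test = centers[y_test] + 0.3 * rng.normal(size=(60, 2))

model = elm_train(X_train, y_train)
accuracy = np.mean(elm_predict(X_test, model) == y_test)
```

Because the only fitted quantity is `beta`, obtained in one pseudo-inverse step, training involves no iterative tuning, which is the source of the speed advantage attributed to ELM.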
All procedures were written in Matlab R2018b (The MathWorks, Natick, MA, USA) and ran on a personal computer with an Intel Core i5-9400F CPU, 16 GB RAM, and the Windows 10 operating system.

Results And Discussion

Phenotypic development of healthy and inoculated branch
Figure 1(a) shows the strains of the fungus Valsa mali on PDA medium. The junctions of diseased and healthy tissues in the inoculated branch samples were assessed visually in the early stage of AVC disease. The bark surface of the inoculated branch samples showed no visible symptoms during the first 7 days. However, the phloem inside the bark showed early infection symptoms. Figure 1(b) shows the dynamic process of the diseased phloem in the first 7 days. The healthy phloem (first 3 days) had a smooth surface and a tender green color. The diseased phloem became rough and pale brown when the symptoms of mild infection became visible on the 5th day. Subsequently, the diseased phloem turned dark brown, and the tissue was rotten on the 7th day. The infected area of the diseased phloem, centered on the inoculation site, extended continuously outward with time. Most notably, the infection symptoms remained in the phloem and did not appear on the bark surface in the first 7 days. The phloem regions were manually labelled as healthy, disease-1 (the disease-health junctions), and disease-2 (late disease) according to the infection progression of the pathogen. In Fig. 1(c), the disease-health junction of the diseased phloem is presented using optical microscopy. It can be observed that the healthy tissue appeared green with an intact cellular tissue structure; the disease-1 tissue appeared dark brown, with an outflow of light brown water stains from the infected tissue; and the disease-2 tissue was mainly characterized by canker and softened tissue.

SERS AgNPs and its characterization
The microstructure, UV-Vis spectrum, and Raman spectrum of the AgNPs were analyzed to investigate the enhancement effects of the synthesized AgNPs. Figure 2(a) is the transmission electron microscopy (TEM) image of the AgNPs, Fig. 2(b) displays the UV-Vis spectrum, and Fig. 2(c) shows the Raman spectrum.
In Fig. 2(a), it can be seen that the AgNPs were very uniform in morphology, with a monodisperse spherical shape and an average diameter of about 50 nm. As shown in Fig. 2(b), only one characteristic UV-Vis absorption peak (at 410 nm), corresponding to a single plasmon resonance mode, was observed, and its half-peak breadth was only 90 nm. These features further indicated that the shape and size of the synthesized AgNPs were very uniform. In Fig. 2(c), the Raman spectrum had only a faint signal, suggesting that the synthesized AgNPs themselves had no strong characteristic Raman peaks and did not interfere with the experimental results. Therefore, the synthesized AgNPs were suitable as a SERS substrate for detecting branch samples in this research.

Overview of SERS spectra
Spectral imaging acquires the spectrum from a specified point on the sample surface. By adjusting the x,y position, spectra can be acquired from multiple points on the sample surface and assembled into a spectral image of the sample. Figure 3 shows the micro-spectral image of diseased phloem obtained through pointwise scanning with the Raman micro-imaging system. The spectral data were obtained by extracting the spectrum of each pixel of the spectral image. All the original SERS spectra are also shown in Fig. 3. There was an obvious baseline offset in the disease-1 and disease-2 spectra even after dropwise addition of the AgNPs to suppress fluorescence. Therefore, the MSBC, AsLS, and air-PLS algorithms were adopted to eliminate the disturbance of the baseline offset. The parameters for these methods were set manually to obtain the best result. For the MSBC algorithm, the parameters were set to λ = 150, µ = 8×10⁷, ρ = 0. For the AsLS algorithm, the parameters were set to λ = 5×10³, ρ = 1×10⁻⁴. For the air-PLS algorithm, the parameters were set to λ = 150, ρ = 0.01. The corrected spectra and the predicted fluorescence baselines are plotted in Fig. 4(a)-(c). As shown in Fig. 4, the curved baselines were well fitted and subtracted by the three algorithms. In the corrected spectra, the baselines were pulled back to zero intensity, the peak locations remained unchanged, and the peak shapes were more obvious, which indicated the effectiveness of the baseline correction methods.
As shown in Fig. 4, many SERS peaks can be clearly observed. In detail, the peaks at 319, 957, 1026, 1165, 1242, and 1325 cm⁻¹ were indicators of cellulose, corresponding to C-C-C or C-O-C skeletal bending [36], C-C or C-O stretching vibration [37], C-C or C-O stretching vibration [37], H-C-C or H-C-O skeletal bending [38], C=O stretching vibration [37], and C-H bending vibration [38], respectively. The peaks at 625, 731, 1599, and 2939 cm⁻¹ were indicators of lignin, corresponding to skeletal bending [39], skeletal bending [39], the C-C aromatic ring [40], and C-H asymmetric stretching vibration [41], respectively. The assignment of characteristic wavenumbers is presented in Table 1.

Table 1. Assignment of characteristic wavenumbers.
Wavenumber (cm⁻¹)    Component    Assignment
319                  Cellulose    C-C-C or C-O-C skeletal bending [36]
957                  Cellulose    C-C or C-O stretching vibration [37]
1026                 Cellulose    C-C or C-O stretching vibration [37]
1165                 Cellulose    H-C-C or H-C-O skeletal bending [38]
1242                 Cellulose    C=O stretching vibration [37]
1325                 Cellulose    C-H bending vibration [38]
625                  Lignin       Skeletal bending [39]
731                  Lignin       Skeletal bending [39]
1599                 Lignin       C-C aromatic ring [40]
2939                 Lignin       C-H asymmetric stretching vibration [41]

Selecting optimal preprocessing method
The correlation analysis method was adopted to select the best preprocessing algorithm. The correlation between the spectral variables is plotted in Fig. 5. For the original spectra (Fig. 5(a)), broad regions around the line y = x had correlation coefficients close to 1, indicating that the original spectra were greatly disturbed by the baseline offset. This high degree of collinearity would have adverse effects on classification analysis. Comparing Fig. 5(b)-(d) with Fig. 5(a), the regions with a high degree of collinearity declined noticeably, and most of the spectral variables had low correlation with the others except in the spectral ranges of 300-400, 640-880, and 1490-1970 cm⁻¹. In addition, the proportion of pixel points with values greater than 0.6 among all pixel points was calculated for each correlation map, and the proportions were 0.35, 0.09, 0.24, and 0.07 for the original, MSBC-corrected, AsLS-corrected, and air-PLS-corrected spectra, respectively. The AsLS method failed to fit the baseline effectively at 1200-1600 cm⁻¹, resulting in a relatively poor baseline correction. These results indicated that the MSBC and air-PLS baseline correction strategies can greatly reduce the high correlation levels among spectral variables, and that the air-PLS algorithm had the best effect. Therefore, the spectra corrected by the air-PLS algorithm were used for further analysis.
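The proportion used to compare the correlation maps can be sketched as below. This assumes that the map is the Pearson correlation matrix between spectral variables, that the raw (signed) coefficients are thresholded at 0.6, and that the diagonal is counted; the source does not state these details, and the function name and toy data are hypothetical.

```python
import numpy as np

def high_correlation_fraction(spectra, threshold=0.6):
    """Fraction of entries in the variable-by-variable correlation map above threshold."""
    # Rows are samples (pixels); columns are spectral variables
    corr = np.corrcoef(spectra, rowvar=False)
    return float(np.mean(corr > threshold))

# Toy data: three "variables", two perfectly correlated and one anti-correlated
a = np.array([1.0, 2.0, 4.0, 7.0])
demo = np.column_stack([a, 2.0 * a, -a])
fraction = high_correlation_fraction(demo)
```

In the toy example, five of the nine entries of the 3×3 correlation map exceed 0.6 (the three diagonal entries and the one correlated pair counted twice), so the fraction is 5/9. A lower fraction after baseline correction indicates less collinearity among spectral variables.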

Clustering visualization by PCA
As an unsupervised learning strategy, PCA is often used to reveal clustering based on the similarity of samples in the feature space. In the present research, PCA was performed on the preprocessed spectra of the total sample set to visualize the distribution of healthy, disease-1, and disease-2 samples. The score scatter plot of the clustering analysis is shown in Fig. 6. PC1, PC2, and PC3 explained 50.85%, 14.73%, and 10.73% of the variation among samples, respectively. The cumulative contribution of the first three PCs reached 76.31%. Figure 6 shows that the healthy, disease-1, and disease-2 samples formed obvious clusters. Therefore, the three types of samples had distinct spectral characteristics.
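For readers who want to reproduce this kind of score plot, the following sketch computes PCA scores and explained-variance ratios through a singular value decomposition of the mean-centered data. The random matrix stands in for the preprocessed spectra and is purely an assumption; the function name `pca_scores` is hypothetical. The last lines only check the arithmetic of the reported contributions.

```python
import numpy as np

def pca_scores(X, n_components=3):
    """Return PCA scores and explained-variance ratios of the leading components."""
    Xc = X - X.mean(axis=0)                     # center each variable
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)             # variance fraction per component
    scores = Xc @ Vt[:n_components].T           # project onto leading loadings
    return scores, explained[:n_components]

# Placeholder data (assumption): 60 samples, 20 variables with unequal spread
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20)) * np.linspace(3.0, 0.5, 20)
scores, ratios = pca_scores(X)

# Cumulative contribution of the three reported PCs, in percent
reported = [50.85, 14.73, 10.73]
cumulative = sum(reported)
```

Plotting the three columns of `scores` against each other, colored by class label, gives a score scatter plot of the kind shown in Fig. 6.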

Optimal variables selection
There were 1401 variables in the SERS spectra. However, spectral data contained many non-critical variables, which might reduce the accuracy and stability of subsequent discriminant models. Therefore, the selection of optimal variables was important for better models. In the present research, two strategies were used to select characteristic variables: algorithm selection (CARS combined with RFrog) and manual selection.
Then, the 10 most important variables were extracted from the total 1401 spectral variables in the full range of 300-3000 cm⁻¹, as shown in Fig. 7. The selected optimal variable subsets were set to subset-1 and subset-2, respectively. In detail, the characteristic variables selected by the two methods were listed in

Discriminant models establishment
Before establishing the discriminant models, the SERS spectral data were divided into a calibration set and a prediction set at a ratio of 3:1. The independent variable (x) represented the spectral matrix of the samples, and the labeled grades (y) stood for the AVC infection severities, with the labels 1, 2, and 3 assigned to healthy, disease-1, and disease-2, respectively. BP-ANN, ELM, RForest, and LS-SVM models were established using four variable matrices (x) to classify the healthy, disease-1, and disease-2 samples. These four variable matrices (x) were the full SERS spectra, subset-1, subset-2, and the predicted fluorescence baselines.
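A 3:1 division of this kind can be sketched as follows. The source does not say how the samples were assigned to the two sets, so the random shuffle with a fixed seed, the function name `split_3_to_1`, and the sample count of 100 are all assumptions made for illustration.

```python
import random

def split_3_to_1(n_samples, seed=0):
    """Split sample indices into calibration and prediction sets at a 3:1 ratio."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)    # reproducible shuffle
    n_cal = (3 * n_samples) // 4            # three quarters go to calibration
    return indices[:n_cal], indices[n_cal:]

calibration_idx, prediction_idx = split_3_to_1(100)
```

A stratified split, which keeps the class proportions equal in both sets, is a common alternative when the healthy, disease-1, and disease-2 classes are unbalanced.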
After formula calculation and empirical screening, the learning rate of the BP-ANN model was set uniformly to 0.1, and the numbers of neurons in the hidden layer were 10, 3, 3, and 10, respectively. The number of neurons in the hidden layer of the ELM model was determined by comparing the performance of the ELM model with different numbers of neurons from 1 to 100 in steps of 1. The ELM with 34 neurons was selected as the optimal model. The number of decision trees in the RForest model was determined by comparing the model performance with different numbers of decision trees from 1 to 500 in steps of 1. The RForest with 100 decision trees was selected as the optimal model. The LS-SVM model used RBF as the kernel function, and the optimal penalty coefficient (c) and the kernel function parameter gamma (g) were obtained by a grid search procedure. The resulting best-c was 379, and the best-g was 45.
The discriminant accuracy of the models is presented in Table 2. There were significant differences in the classification results of the four models on the full spectra dataset. The classical BP-ANN model learns complex relationships between data, which can improve the analytical performance (such as sensitivity and specificity) of classification. However, the BP-ANN model has a regrettable tendency to train toward a local optimum rather than the global optimum [32]. This explains why the BP-ANN model had the lowest classification accuracy on the full spectra dataset among the four models. In contrast to the BP-ANN model, the LS-SVM model is deterministic, and its solution is global and unique. As a result, the classification accuracy of the LS-SVM model improved significantly compared with the BP-ANN model. In the RForest model, each tree selected features that maximized the separation of the dataset into three classes. The outputs of the decision trees were then pooled, leading to the final classification result. Therefore, the RForest model also exhibited excellent analytical performance comparable to that of the LS-SVM model. Compared with the full spectra dataset, over 99% of the input variables (10 vs. 1401) were removed as non-critical in subset-1 and subset-2. Meanwhile, the classification accuracy of the subset models did not decrease significantly, which demonstrated the superiority of the optimal variable selection strategies. Generally, fluorescence baselines reduce the simplicity and robustness of a calibration model built on raw spectra. Therefore, existing studies by other scholars have removed the fluorescence baseline from the raw data. However, the classification accuracy of the models based on the fluorescence dataset was surprisingly excellent in the present research. The fungus Valsa mali produces various chemical substances such as protocatechuic acid, isocoumarin, and phlorizin when infesting the phloem tissue. Although these chemical substances produced fluorescence interference, the baseline also reflected information about chemical composition and content. Thus, the fluorescence baseline became available as valid information. This finding will guide our subsequent research.
However, the above three models mainly focused on feature extraction, optimal parameters, and optimal variable selection without considering the model runtime, which is also important for intelligent online detection, an important research direction in the field of plant disease detection. The ELM model randomly generates the hidden node parameters and then determines the output weights analytically instead of by iterative tuning [42]. Thus, the ELM model runs very fast and lends itself to real application scenarios.
As seen in Table 2, the ELM model ran in as little as 0.01 seconds, far faster than the other three methods. The LS-SVM model first used the grid search method to select the best-c and best-g, which severely reduced its discrimination efficiency and raised the run time to 0.91 seconds. Therefore, the ELM algorithm can be considered as the detection model in subsequent online detection studies.

Chemical imaging analysis of the disease-health junction
The SERS microspectral image data cube of each phloem sample was processed by the air-PLS algorithm to eliminate the fluorescence baseline, with the same parameter values as in the section "Overview of SERS spectra".
The processed microspectral cube was then used pixel by pixel to generate the chemical distribution images in Fig. 8. The symmetric stretching vibration at 1600 cm⁻¹ was identified as the characteristic peak of the lignin components, while the bands at 300-550 cm⁻¹ were contributed by cellulose. Therefore, these images were constructed based on the cellulose signature band at 300-550 cm⁻¹ and the lignin signature peak at 1600 cm⁻¹.
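The pixel-wise construction of a chemical image can be sketched as a band integration over the spectral axis of the data cube. The sketch below is an assumption-laden illustration, not the procedure used to produce Fig. 8: it assumes an evenly spaced 1401-point axis over 300-3000 cm⁻¹, a hypothetical ±10 cm⁻¹ window around the 1600 cm⁻¹ lignin peak, a simple sum as the intensity measure, and a tiny synthetic cube.

```python
import numpy as np

def band_image(cube, wavenumbers, low, high):
    """Sum intensity over [low, high] cm^-1 for each pixel of a (rows, cols, bands) cube."""
    mask = (wavenumbers >= low) & (wavenumbers <= high)
    return cube[:, :, mask].sum(axis=2)

# Assumed spectral axis: 1401 evenly spaced points from 300 to 3000 cm^-1
wavenumbers = np.linspace(300.0, 3000.0, 1401)

# Synthetic 4 x 5 pixel cube with one cellulose-rich and one lignin-rich pixel
cube = np.zeros((4, 5, wavenumbers.size))
cube[1, 2, (wavenumbers >= 300.0) & (wavenumbers <= 550.0)] = 1.0
cube[3, 0, np.argmin(np.abs(wavenumbers - 1600.0))] = 2.0

cellulose_map = band_image(cube, wavenumbers, 300.0, 550.0)
lignin_map = band_image(cube, wavenumbers, 1590.0, 1610.0)  # window is an assumption
```

Each resulting map has one value per pixel, so it can be displayed directly as a false-color image of the disease-health junction.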
Because only the cell walls in the phloem tissues were probed, the collected spectra do not contain any intracellular signals.

Conclusion
SERS spectroscopy combined with chemometric methods was considered feasible and promising for early detection of AVC. First, three spectral preprocessing algorithms were compared, and the air-PLS algorithm was considered effective in removing the fluorescence background of the spectra. PCA provided a good clustering effect for visualizing the distribution of samples in the three classes. Two strategies were used to select optimal variables for developing machine learning models to detect AVC, and these models exhibited excellent analytical performance. Meanwhile, the classification accuracy of the models based on the fluorescence dataset was surprisingly excellent, which points to a direction for further study. In addition, this study proposed a new strategy for SERS chemical imaging of diseased apple phloem tissues using a non-destructive, label-free method. This chemical imaging provided the spatiotemporal dynamic characteristics of changes in the cellulose and lignin of the phloem disease-health junction under fungal stress, which would be helpful in early AVC detection and in the analysis of disease dynamics.

Declarations
Authors' contributions

Figure 3
The sketch represents the basic principle of the spectral data cube and shows the raw spectra and spectral imaging of three types of samples.

Figure 5
The correlation between the corrected variables was plotted. (a) High correlations were found among the original spectral variables, (b) correlations declined noticeably using MSBC, (c) correlations declined noticeably using AsLS, (d) correlations declined noticeably using air-PLS. The air-PLS algorithm had the best elimination effect.

Figure 6
Score plots of the first three PCs from PCA on spectral data of the three types of samples.

Figure 7
The characteristic variables for early detection of the AVC disease. Two strategies were used to select characteristic variables: algorithm selection (subset-1) and manual selection (subset-2).

Figure 8