Plant and Inoculum Preparation
Tomato (Solanum lycopersicum) seeds (cv. Early Girl, Mountain Valley Seed Co.) were sown in a 48-cell tray and grown in a greenhouse at 25-38°C for three weeks. The seedlings germinated seven to ten days after seeding and were fertilized once (Osmocote Smart-Release, ICL), two weeks after seeding. Plants were irrigated for two minutes twice a day, at 8:00 a.m. and 5:00 p.m. Three-week-old tomato plants were used for the inoculation experiments. The plants were sprayed with sterilized water and kept bagged for 24 hours before and 24 hours after inoculation. All Virginia Tech institutional guidelines for handling tomato plants were followed. The tomato plants used in this research were grown from seed obtained from a licensed seed company and contain no transgenic material; under Virginia Tech biosafety guidelines, no additional permission is required to handle non-transgenic plants.
Inoculum Preparation
A Xanthomonas perforans strain previously isolated from a tomato field in Virginia was grown overnight in R2 broth (Teknova) in a shaker incubator at 28°C and 200 rpm. Two-hundred-microliter aliquots of the X. perforans liquid culture were spread on R2A medium amended with 20 mg/L of the antibiotic rifampicin (Sigma-Aldrich). Colonies that survived on 20 mg/L rifampicin-R2A medium were transferred to R2A agar medium containing 50 mg/L rifampicin, and this step was repeated at 100, 150, and 200 mg/L rifampicin. Colonies that survived at 200 mg/L were subcultured on 200 mg/L rifampicin-R2A agar medium again to confirm their resistance to rifampicin. Six final survivors were tested for pathogenicity on healthy three-week-old tomato plants, and the one showing the highest virulence was used for the inoculation experiments. The inoculum consisted of bacteria resuspended in 10 mM MgSO4 at a concentration of 1-5 × 10^7 CFU/ml.
Inoculation and Experimental Design
Tomato plants were inoculated by dipping the whole plant (stems and leaves) for one minute into a sterilized plastic cup containing 300 ml of X. perforans inoculum amended with 0.02% Silwet. Control plants were dipped for one minute into a sterilized plastic cup containing 300 ml of 10 mM MgSO4 amended with 0.02% Silwet (Hert et al. 2009). The inoculation experiment was performed twice. In the first experiment, there were eight experimental units (4-cell inserts), each containing two tomato plants. Units 1, 2, 5, and 6 contained uninoculated control plants, while units 3, 4, 7, and 8 contained plants inoculated with X. perforans. The second experiment used a similar design, except that it had 10 experimental units, with units 1-5 uninoculated and units 6-10 inoculated.
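The volumes for one dipping cup follow from the dilution relation C1V1 = C2V2. The sketch below is illustrative only: the helper name `inoculum_recipe`, the stock concentration, the target of 2.5 × 10^7 CFU/ml (a point inside the stated 1-5 × 10^7 range), and the reading of 0.02% Silwet as volume/volume are all assumptions rather than values from the protocol.

```python
def inoculum_recipe(stock_cfu_per_ml: float, target_cfu_per_ml: float = 2.5e7,
                    final_volume_ml: float = 300.0, silwet_percent: float = 0.02) -> dict:
    """Volumes (ml) of bacterial stock, Silwet and 10 mM MgSO4 for one dipping cup.

    Hypothetical helper: uses C1 * V1 = C2 * V2 and treats the Silwet
    percentage as volume/volume (an assumption).
    """
    if not 0 < target_cfu_per_ml <= stock_cfu_per_ml:
        raise ValueError("stock must be at least as concentrated as the target")
    stock_ml = target_cfu_per_ml * final_volume_ml / stock_cfu_per_ml  # V1 = C2 * V2 / C1
    silwet_ml = final_volume_ml * silwet_percent / 100.0
    mgso4_ml = final_volume_ml - stock_ml - silwet_ml  # buffer makes up the rest
    return {"stock_ml": stock_ml, "silwet_ml": silwet_ml, "mgso4_ml": mgso4_ml}


# Hypothetical 1 x 10^9 CFU/ml stock diluted to 2.5 x 10^7 CFU/ml in 300 ml
recipe = inoculum_recipe(1e9)
print(recipe)  # about 7.5 ml stock, 0.06 ml Silwet, 292.44 ml MgSO4
```

The control cup corresponds to the same recipe with no bacterial stock, i.e. 300 ml of 10 mM MgSO4 plus Silwet.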
Data Collection
Data were collected nine times in each experiment: before inoculation (bi), 2 hours after inoculation (0 days after inoculation, dai), and every 24 hours from 1 to 7 dai. Three types of data were collected: disease ratings, bacterial population sizes, and hyperspectral images (Figure 1). Disease data were first recorded by visually observing the tomato plants in each experimental unit. One to six leaves were then removed from each experimental unit. After the disease severity of the detached leaves was rated, a benchtop hyperspectral imaging system (Resonon PIKA-L and PIKA-IR) was used to capture hyperspectral images of each detached leaf. Finally, the leaves were processed to estimate the bacterial population size in the leaf tissue.
Disease Rating
Two types of disease data were collected in this study to characterize the progress of bacterial leaf spot (BLS). First, disease incidence was recorded as the estimated percentage of leaves showing BLS symptoms in each experimental unit. Then five tomato leaves were randomly selected and the disease severity of each leaf was rated (Chester 1950) on a 0-5 scale, where 0 indicates a healthy leaf; 1, spot coverage of 0-10%; 2, spot coverage of 10-30%; 3, spot coverage of 30-50% without apparent dead tissue; 4, spot coverage of more than 50%, with some of the diseased tissue dark and curled or decayed; and 5, completely dry and curled or decayed tissue with numerous bacterial spots. A disease severity index was then calculated as:
where NRi indicates the number of seedlings showing the corresponding disease level i; i ranges from 0 to 5 (Chester 1950). After rating the disease incidence and severity in each experimental unit, one to six leaves were removed from each experimental unit, and disease severity was rated for each detached leaf with the same 0-5 scale.
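As a hedged illustration, the sketch below assumes a widely used form of a 0-5 severity index, DSI (%) = 100 × Σ(i × NR_i) / (N × 5), where N is the total number of rated units; this form is an assumption and may differ from the exact formula applied in the study. The function name `disease_severity_index` is hypothetical.

```python
def disease_severity_index(counts_by_level: dict[int, int], max_level: int = 5) -> float:
    """Percentage severity index from counts of units rated at each level 0..max_level.

    Assumed form: DSI = 100 * sum(i * NR_i) / (N * max_level).
    """
    if any(level < 0 or level > max_level for level in counts_by_level):
        raise ValueError("disease level outside the rating scale")
    total = sum(counts_by_level.values())  # N, number of rated units
    if total == 0:
        raise ValueError("no rated units")
    weighted = sum(level * n for level, n in counts_by_level.items())  # sum of i * NR_i
    return 100.0 * weighted / (total * max_level)


# Five leaves rated: one at level 0, two at level 1, one at level 3, one at level 5
example_dsi = disease_severity_index({0: 1, 1: 2, 3: 1, 5: 1})
print(example_dsi)  # 40.0
```

Dividing by N × 5 scales the index so that 0 means every rated unit is healthy and 100 means every unit is at the maximum level.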
Hyperspectral Image Acquisition
Hyperspectral images of the detached leaves were collected with a Resonon benchtop system equipped with a Pika L camera (visible and near-infrared, VISNIR: 400-1000 nm) and a Pika NIR320 camera (shortwave infrared, SWIR: 900-1700 nm). All leaf samples were scanned with both cameras to obtain hyperspectral data from 400 to 1700 nm. For scanning, one camera (Pika L or Pika NIR320) was mounted on a tower, and the leaf samples were laid on a stage below the camera and four high-intensity halogen spotlights (Figure S4). The system was set to reflectance mode, in which it measured the absolute reflectance of the leaf samples by applying a dark correction and a response correction based on a reference (calibration) tile provided by Resonon. Room lights were turned off so that only the halogen imaging lights were on. A piece of black cotton fabric or a white calibration tile was placed on the stage as background, and the leaf samples were laid on top of it. During data acquisition, the stage moved in auto speed mode while the line-scan camera recorded successive lines, which were assembled into three-dimensional image cubes containing two spatial dimensions and one spectral dimension. Each scan yielded a BIL (band interleaved by line) data cube and a header file, containing either 561 spatial pixels (5.86 µm pixel size) at 300 VISNIR wavelength bands (385.63-1027.18 nm) or 164 spatial pixels (30 µm pixel size) at 168 SWIR wavelength bands (890.68-1719.81 nm).
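Spectronon applies the reflectance conversion and writes the BIL file itself, so the sketch below only illustrates the arithmetic involved. It assumes the raw data are already available as a flat array and that dark and white reference spectra are available per spatial pixel and band; the function names are hypothetical, and the flat-field form shown is a common one that may differ in detail from Resonon's implementation (for example, in how the tile's certified reflectance is applied).

```python
import numpy as np


def bil_to_cube(raw: np.ndarray, lines: int, bands: int, samples: int) -> np.ndarray:
    """Reorder a flat BIL buffer into a (lines, samples, bands) cube.

    In BIL order, each scan line stores all samples of band 0, then all
    samples of band 1, and so on.
    """
    return raw.reshape(lines, bands, samples).transpose(0, 2, 1)


def to_reflectance(raw_cube: np.ndarray, dark: np.ndarray, white: np.ndarray,
                   tile_reflectance: float = 1.0, eps: float = 1e-9) -> np.ndarray:
    """Flat-field correction: (raw - dark) / (white - dark), scaled by the tile reflectance.

    dark and white are (samples, bands) reference spectra averaged along the
    scan direction, so they broadcast across every line of raw_cube.
    """
    denominator = np.maximum(white - dark, eps)  # guard against dead or saturated pixels
    return tile_reflectance * (raw_cube - dark) / denominator


cube = bil_to_cube(np.arange(24, dtype=float), lines=2, bands=3, samples=4)
print(cube.shape)     # (2, 4, 3)
print(cube[0, 1, 2])  # 9.0: line 0, band 2, sample 1
refl = to_reflectance(np.full((2, 4, 3), 60.0), np.full((4, 3), 10.0), np.full((4, 3), 110.0))
```

Reordering to (lines, samples, bands) gives the row, column, band layout that per-pixel spectral analysis usually expects.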
Bacterial Growth Curve
After the hyperspectral data were acquired, each detached leaf was processed to estimate the bacterial pathogen population size (CFU per cm² of leaf area). First, five round leaf discs (1 cm in diameter) were randomly cut from each leaf sample. The discs were surface-sterilized in 75% ethanol for 45 s, rinsed three times with sterilized distilled water (SDW), sterilized in 10% bleach (0.5% sodium hypochlorite) for 30 s, and rinsed three times with SDW. The five surface-sterilized leaf discs were placed in a 1.5 ml Eppendorf tube and crushed with a disposable pellet pestle (Fisherbrand) in 500 µL SDW. Once the fine leaf tissue was evenly suspended in the SDW, the leaf sap was collected and serially diluted to 10^-1 through 10^-8 of the original concentration. Two 20 µL aliquots of each dilution were evenly spread on 200 mg/L rifampicin-R2A medium. The plates were incubated at 28 °C for two days, and the bacterial colonies grown from each dilution were counted and averaged. Finally, the concentration of each original leaf sap sample was calculated, and the bacterial pathogen population size per cm² of leaf area was estimated for each detached leaf sample.
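The back-calculation from plate counts to CFU per cm² can be sketched as follows, using the volumes and disc sizes given above. The helper name `cfu_per_cm2` and the example colony counts are hypothetical, and the sketch assumes counts come from a single countable dilution and ignores the small volume contributed by the leaf tissue itself.

```python
import math


def cfu_per_cm2(colony_counts, dilution_exponent, plated_volume_ul=20.0,
                suspension_volume_ul=500.0, n_discs=5, disc_diameter_cm=1.0):
    """Estimate CFU per cm^2 of leaf from replicate plate counts at one dilution.

    colony_counts: colonies on each replicate plate spread from the
    10**-dilution_exponent dilution of the leaf sap.
    """
    mean_colonies = sum(colony_counts) / len(colony_counts)
    # CFU per microliter of undiluted leaf sap
    cfu_per_ul = mean_colonies / plated_volume_ul * 10 ** dilution_exponent
    # Total CFU recovered from the discs (tissue volume ignored)
    total_cfu = cfu_per_ul * suspension_volume_ul
    leaf_area_cm2 = n_discs * math.pi * (disc_diameter_cm / 2) ** 2  # about 3.93 cm^2
    return total_cfu / leaf_area_cm2


# Hypothetical counts: 48 and 52 colonies on the two plates from the 10^-3 dilution
example = cfu_per_cm2([48, 52], 3)
print(round(example))  # 318310
```

Population sizes of this kind are often log10-transformed before plotting growth curves, since they span several orders of magnitude.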
Potentially Confounding Factors
In order to investigate the influence of potential confounding factors (including background materials and colors, scan precision, leaf size, and leaf structure) on leaf reflectance, additional leaves and corresponding hyperspectral images were collected.
Image data of 18 tomato leaves scanned on the dark fabric background were used to test scan precision. These included 8 leaves (4 uninoculated and 4 inoculated) from the first inoculation experiment and 10 leaves (5 uninoculated and 5 inoculated) from the second experiment. To test whether repeated scans produce consistent reflectance, each leaf was scanned twice at different positions on the same stage and background. The leaf-level full-spectrum reflectance of each leaf was compared between the two scans, together with that of 18 randomly selected background pixels from each scan, using principal component analysis (PCA) and linear discriminant analysis (LDA) with scikit-learn 64 in Python.
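A minimal sketch of a PCA plus LDA comparison with scikit-learn is shown below. It assumes each row is one leaf-level (or background-pixel) spectrum and that separability is judged by cross-validated LDA accuracy; the function name `compare_groups`, the 3-fold split, and the synthetic data are hypothetical, and the study may have assessed the PCA and LDA projections visually instead.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_val_score


def compare_groups(spectra: np.ndarray, labels: np.ndarray, n_components: int = 2,
                   n_splits: int = 3, seed: int = 0):
    """Return PCA scores for plotting and the mean cross-validated LDA accuracy.

    LDA accuracy near chance level suggests the groups (e.g. scan 1 vs scan 2)
    are spectrally indistinguishable; accuracy near 1 means they separate.
    """
    scores = PCA(n_components=n_components).fit_transform(spectra)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    accuracy = cross_val_score(LinearDiscriminantAnalysis(), spectra, labels, cv=cv)
    return scores, accuracy.mean()


# Synthetic stand-in: 36 leaf spectra (18 leaves x 2 scans) and 36 background pixels
rng = np.random.default_rng(0)
leaf = rng.normal(0.40, 0.01, size=(36, 20))
background = rng.normal(0.05, 0.01, size=(36, 20))
spectra = np.vstack([leaf, background])
is_leaf = np.array([1] * 36 + [0] * 36)
scores, acc = compare_groups(spectra, is_leaf)
```

For the scan-precision question itself, the labels would encode scan 1 versus scan 2 instead of leaf versus background; the background pixels act as a reference that should separate cleanly from leaf tissue.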
All leaf samples were scanned on two backgrounds of different material and color placed on the stage: a piece of black cotton fabric and a white calibration tile. Each leaf therefore yielded two sets of data: VISNIR (400-1000 nm) and SWIR (900-1700 nm) images on black cloth, and VISNIR and SWIR images on the white tile. Image data of 16 tomato leaves, eight uninoculated and eight inoculated, collected at the end of each experiment, were used to test the effect of background. The leaf-level, spatially averaged full-spectrum reflectance of each tomato leaf was compared across backgrounds, together with that of 16 randomly selected background pixels, using PCA and LDA.
Image data of 18 uninoculated tomato leaves scanned on the dark fabric background were used to test the effect of leaf size. Two leaves were collected from each of nine uninoculated plants, one longer than 2.0 inches (big) and one shorter than 1.5 inches (small). Each leaf was therefore labeled by size (“big” or “small”) and by plant group ID. Leaf-level full-spectrum data were compared between size classes and among the nine plant groups, together with those of 18 randomly selected background pixels, using PCA and LDA.
Image data of nine uninoculated tomato leaves scanned on the dark fabric background were used to test the effect of leaf structure. Two leaves were collected from each of nine uninoculated plants. Five pixels were randomly selected from each of five structural areas of each leaf: apex, margin, midrib, veins, and interveinal tissue. The full-spectrum data of each pixel were compared among leaf structures, together with those of 45 randomly selected background pixels, using PCA and LDA.
Hyperspectral Image Analysis
To differentiate infected tomato leaves from uninoculated healthy leaves and to observe physiological changes in the leaves during disease progression, hyperspectral imaging (HSI) data collected at the nine time points were analyzed with machine learning (ML) methods (Figure 1). HSI data were analyzed at three levels: vegetation index (VI), pixel, and whole image. Seven algorithms, namely LDA, support vector machines (SVM), k-nearest neighbors (KNN), random forest (RF), gradient boosting machines (GBM), multilayer perceptron (MLP), and extreme gradient boosting (XGB or XGBoost), were implemented with scikit-learn 64 and xgboost 65 in Python to perform feature selection and to classify tomato leaf samples from HSI data. Leaf samples from two classes (uninoculated healthy and infected) at the nine data collection time points were used for ML training and testing, with 10-fold cross-validation. Model performance was then compared using accuracy and F1 scores.
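The comparison of several classifiers under 10-fold cross-validation can be sketched with scikit-learn as below. This is a minimal sketch with default hyperparameters and synthetic spectra; XGBoost is omitted so the block runs with scikit-learn alone (an `xgboost.XGBClassifier` could be added to the dictionary if installed), and fitting the 0-1 scaling inside each training fold is a design assumption, not necessarily the study's procedure.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC


def compare_classifiers(X, y, n_splits=10, seed=0):
    """Mean accuracy and weighted F1 across stratified folds for several classifiers."""
    models = {
        "LDA": LinearDiscriminantAnalysis(),
        "SVM": SVC(),
        "KNN": KNeighborsClassifier(),
        "RF": RandomForestClassifier(random_state=seed),
        "GBM": GradientBoostingClassifier(random_state=seed),
        "MLP": MLPClassifier(max_iter=2000, random_state=seed),
    }
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for name, model in models.items():
        # 0-1 scaling is fitted inside each training fold to avoid leakage
        pipe = make_pipeline(MinMaxScaler(), model)
        scores = cross_validate(pipe, X, y, cv=cv, scoring=("accuracy", "f1_weighted"))
        results[name] = (scores["test_accuracy"].mean(), scores["test_f1_weighted"].mean())
    return results


# Synthetic stand-in for 80 leaf spectra with 30 bands
rng = np.random.default_rng(0)
X = rng.normal(0.3, 0.05, size=(80, 30))
y = np.repeat([0, 1], 40)     # 0 = uninoculated, 1 = inoculated
X[y == 1, :5] += 0.1          # shift a few bands in the "infected" class
results = compare_classifiers(X, y)
for name, (acc, f1) in results.items():
    print(f"{name}: accuracy={acc:.2f}, weighted F1={f1:.2f}")
```

Weighted F1 averages the per-class F1 scores by class support, which matters when the healthy and infected classes are unbalanced.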
Hyperspectral images were cropped with Spectronon software (Spectronon Pro, Resonon, Bozeman, MT) to retain the leaf and a minimum of background pixels. The leaves were labeled by treatment (uninoculated or inoculated) and by data collection time point. The time points were further grouped into four stages: beginning (bi and 2hr ai), early (pre-symptomatic: 1-3 dai), mid (4-5 dai), and late (6-7 dai). Thirty-six images containing 228 leaves were used for leaf-level analysis. For each leaf, a mean spectrum was computed by averaging the pixel reflectance at every wavelength band. There were 36 VISNIR and 36 SWIR leaf-level spectra from the ‘beginning’ stage, 54 of each from ‘early’, 46 of each from ‘mid’, and 92 of each from ‘late’. The ‘beginning’ dataset included four classes: uninoculated bi, inoculated bi, uninoculated 2hr ai, and inoculated 2hr ai. The ‘early’ dataset included six classes: uninoculated and inoculated at 1, 2, and 3 dai. The ‘mid’ dataset included four classes: uninoculated and inoculated at 4 and 5 dai. The ‘late’ dataset included four classes: uninoculated and inoculated at 6 and 7 dai.
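Building one leaf-level record (mean spectrum, class label, and stage) might look like the sketch below. The stage mapping follows the grouping in the text; the leaf mask is an assumption (the text describes cropping in Spectronon, not how leaf pixels were separated from the remaining background), and the helper names are hypothetical.

```python
import numpy as np

# Time point -> disease stage, following the grouping described in the text
STAGE_BY_TIMEPOINT = {
    "bi": "beginning", "2hr ai": "beginning",
    "1 dai": "early", "2 dai": "early", "3 dai": "early",
    "4 dai": "mid", "5 dai": "mid",
    "6 dai": "late", "7 dai": "late",
}


def leaf_mean_spectrum(cube: np.ndarray, leaf_mask: np.ndarray) -> np.ndarray:
    """Average reflectance of the leaf pixels at every band.

    cube: (rows, cols, bands) reflectance; leaf_mask: boolean (rows, cols).
    """
    if not leaf_mask.any():
        raise ValueError("mask contains no leaf pixels")
    return cube[leaf_mask].mean(axis=0)  # (n_leaf_pixels, bands) -> (bands,)


def leaf_record(cube, leaf_mask, treatment, timepoint):
    """Bundle one leaf's mean spectrum with its class label and stage."""
    return {
        "spectrum": leaf_mean_spectrum(cube, leaf_mask),
        "label": f"{treatment} {timepoint}",  # e.g. "inoculated 3 dai"
        "stage": STAGE_BY_TIMEPOINT[timepoint],
    }


cube = np.zeros((4, 4, 3))
cube[1:3, 1:3, :] = [0.2, 0.4, 0.6]   # a 2 x 2 "leaf" on a zero background
mask = cube[..., 0] > 0               # stand-in for a real leaf mask
rec = leaf_record(cube, mask, "inoculated", "3 dai")
print(rec["label"], rec["stage"], rec["spectrum"])
```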
For pixel-level analysis, 72 pixels were randomly selected for each of nine classes (648 VISNIR and 648 SWIR pixels in total) from 27 cropped image files containing 72 leaves collected at the end (7 dai) of each experiment. The classes were: class 0, green areas on the edge of uninoculated healthy leaves (GH-e); class 1, green interveinal areas on uninoculated healthy leaves (GH-iv); class 2, bacterial spots on the edge of infected leaves (BS-e); class 3, bacterial spots in interveinal areas of infected leaves (BS-iv); class 4, abiotic spots on the edge of uninfected leaves (AS-e); class 5, abiotic spots in interveinal areas of uninfected leaves (AS-iv); class 6, green areas on the edge of symptomatic tomato leaves (GS-e); class 7, green interveinal areas on symptomatic tomato leaves (GS-iv); and class 8, background (bg). The presence of abiotic and bacterial spots was verified by visual observation and culture isolation. Pixels were selected manually so that the same number of leaf-margin and interveinal pixels (four to six) was taken from each leaf. The spectra were saved as .txt files with Spectronon Pro, and the raw data files were converted to .csv files before data analysis.
For leaf-level full-spectrum analysis, the first and last wavelength bands were removed to exclude potential noise. VISNIR spectra therefore contained 298 bands (387.65-1024.90 nm, at approximately 2 nm intervals) and SWIR spectra contained 166 bands (895.52-1714.71 nm, at approximately 5 nm intervals). Data were normalized to a 0-1 scale. After preprocessing, each sample in each class contained 298 bands in its VISNIR file and 166 bands in its SWIR file. LDA, SVM, KNN, RF, GBM, XGBoost, and MLP were used to train the classification models. Hyperparameters (Figure S5) for LDA, SVM, KNN, RF, and MLP were tuned with grid search, and those for GBM and XGBoost with randomized search 64. The best hyperparameter combinations were retained to train the classification models, using stratified 3-fold cross-validation repeated three times; 3-fold cross-validation was chosen because of the small sample size. The accuracy and weighted F1 scores of the models were compared, and the best model was applied to predict leaf health on whole hyperspectral images. Important wavelength bands were extracted based on Gini importance (mean decrease in impurity) from whichever of the RF, GBM, or XGBoost models performed best.
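The preprocessing, tuning, repeated stratified 3-fold evaluation, and Gini-importance ranking can be sketched for the RF case as below. The parameter grid, helper names, per-band (rather than global) 0-1 scaling, and the synthetic spectra are assumptions; for GBM or XGBoost the study used randomized search, which would swap `GridSearchCV` for scikit-learn's `RandomizedSearchCV`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler


def trim_edge_bands(spectra, wavelengths):
    """Drop the first and last band (e.g. 300 -> 298 VISNIR bands)."""
    return spectra[:, 1:-1], wavelengths[1:-1]


def tune_and_rank_bands(spectra, labels, wavelengths, top_k=10, seed=0):
    X, wl = trim_edge_bands(spectra, wavelengths)
    pipe = Pipeline([
        ("scale", MinMaxScaler()),  # per-band 0-1 scaling, fitted within each fold
        ("rf", RandomForestClassifier(random_state=seed)),
    ])
    grid = {"rf__n_estimators": [50, 100], "rf__max_depth": [None, 10]}  # hypothetical grid
    cv = RepeatedStratifiedKFold(n_splits=3, n_repeats=3, random_state=seed)
    search = GridSearchCV(pipe, grid, scoring="f1_weighted", cv=cv).fit(X, labels)
    scores = cross_validate(search.best_estimator_, X, labels, cv=cv,
                            scoring=("accuracy", "f1_weighted"))
    # Gini importance (mean decrease in impurity) of the refitted best forest
    importances = search.best_estimator_.named_steps["rf"].feature_importances_
    top = wl[np.argsort(importances)[::-1][:top_k]]
    return {"params": search.best_params_,
            "accuracy": scores["test_accuracy"].mean(),
            "f1_weighted": scores["test_f1_weighted"].mean(),
            "top_wavelengths_nm": top}


# Synthetic stand-in: 60 spectra, 32 bands, band 10 made informative
rng = np.random.default_rng(0)
wavelengths = np.linspace(400.0, 1000.0, 32)
labels = np.repeat([0, 1], 30)
spectra = rng.normal(0.3, 0.05, size=(60, 32))
spectra[labels == 1, 10] += 0.3
result = tune_and_rank_bands(spectra, labels, wavelengths, top_k=5)
print(result["params"], round(result["accuracy"], 2), result["top_wavelengths_nm"])
```

Because tuning and scoring reuse the same folds here, the reported accuracy can be somewhat optimistic; nested cross-validation is the usual way to separate the two.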
For leaf-level VI analysis, 14 vegetation indices (VIs; Table S4) that may be related to tomato disease stress were extracted from the preprocessed HSI data 24,30,31,36,56. The classifiers listed above were used to train classification models, and their performance was evaluated as described above. Pixel-level full-spectrum analysis followed a similar process, except that 10-fold cross-validation was used instead of 3-fold because the pixel dataset was larger. The accuracy of the classification models was compared using ANOVA with statsmodels 51, followed by post hoc analysis with scikit-posthocs 52. The testing accuracy scores of classification models trained with leaf-level full-spectrum data were correlated with the bacterial population size in tomato leaf tissue, averaged across the two experiments, over the course of BLS development. Pearson correlation coefficients and p-values were calculated with scipy.stats 63.
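Two steps from this paragraph are sketched below: computing a band-ratio index from the nearest available bands, and correlating testing accuracy with bacterial population using `scipy.stats.pearsonr`. NDVI with 800 nm and 670 nm bands is only an example of a commonly used index; whether it appears in Table S4, and the exact bands used there, are assumptions. The helper names and the accuracy and population values are hypothetical.

```python
import numpy as np
from scipy.stats import pearsonr


def nearest_band(spectra: np.ndarray, wavelengths: np.ndarray, target_nm: float) -> np.ndarray:
    """Reflectance at the band whose center is closest to target_nm."""
    return spectra[:, int(np.argmin(np.abs(wavelengths - target_nm)))]


def ndvi(spectra, wavelengths, nir_nm=800.0, red_nm=670.0):
    """Normalized difference vegetation index from the nearest NIR and red bands."""
    nir = nearest_band(spectra, wavelengths, nir_nm)
    red = nearest_band(spectra, wavelengths, red_nm)
    return (nir - red) / (nir + red)


def correlate(accuracy, population):
    """Pearson r and two-sided p-value between accuracy and bacterial population."""
    r, p = pearsonr(accuracy, population)
    return float(r), float(p)


wavelengths = np.array([670.0, 800.0])
spectra = np.array([[0.05, 0.50], [0.10, 0.40]])  # two hypothetical leaf spectra
index_values = ndvi(spectra, wavelengths)
# Hypothetical per-time-point testing accuracy and mean log10 CFU/cm^2
r, p = correlate([0.55, 0.62, 0.71, 0.86], [3.1, 4.0, 5.2, 6.8])
print(index_values, r, p)
```

With only a handful of time points, the p-value is sensitive to each point, so the correlation is best read alongside the per-time-point accuracies themselves.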
The best classification models were applied to images of healthy and diseased whole leaves and living plants collected at 7 dai. The RF model trained with 7 dai VISNIR data and the SVM model trained with SWIR data were applied to the whole-leaf and living-plant images. The raw hyperspectral images were processed, and reflectance data were extracted using Spectral Python (SPy) 66. The predictions were compared with visual observations and bacterial population sizes (the ground truth data).
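Applying a trained pixel classifier to a whole image amounts to flattening the cube to one spectrum per row, predicting, and reshaping the labels back into an image. The sketch assumes the cube is already a (rows, cols, bands) reflectance array, for instance as loaded with SPy via `spectral.open_image("leaf.hdr").load()` (file name hypothetical), and that it receives the same band trimming and scaling used in training. The toy `NearestCentroid` model stands in for the RF or SVM models and is purely illustrative.

```python
import numpy as np
from sklearn.neighbors import NearestCentroid


def classify_image(cube: np.ndarray, model, band_slice: slice = slice(None)) -> np.ndarray:
    """Predict a class for every pixel of a (rows, cols, bands) cube.

    band_slice trims the cube to the bands the model was trained on
    (e.g. slice(1, -1) to drop the first and last band).
    """
    cube = cube[:, :, band_slice]
    rows, cols, bands = cube.shape
    predictions = model.predict(cube.reshape(-1, bands))  # one row per pixel
    return predictions.reshape(rows, cols)


# Toy model with two reference spectra: 0 = healthy-like, 1 = lesion-like (hypothetical)
model = NearestCentroid().fit([[0.1, 0.1, 0.1], [0.6, 0.6, 0.6]], [0, 1])
cube = np.zeros((2, 3, 3))
cube[:, :2] = 0.1
cube[:, 2] = 0.6
label_map = classify_image(cube, model)
print(label_map)
```

The resulting label map can be color-coded and overlaid on an RGB rendering of the leaf for comparison with visual observations.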