In Situ Monitoring of Nitrate Content in Leafy Vegetables Using Mid-Infrared Attenuated Total Reflectance Spectroscopy coupled with Intelligent Algorithm


Background: Vegetables are one of the most important sources of nitrate in the daily human diet. Establishing fast and accurate in situ nitrate monitoring approaches that can be used during plant growth and in vegetable markets is essential.

Results: Exploiting the characteristic N-O asymmetric stretch absorption in the mid-infrared region (1500-1200 cm⁻¹), portable Fourier-transform infrared attenuated total reflectance (FTIR-ATR) spectroscopic instruments, together with an extreme learning machine model modified by the Euclidean distance method (ED-ELM), were employed to evaluate the nitrate contents of leafy vegetables. A total of 1224 samples of four popular vegetables (Chinese cabbage, swamp cabbage, celery, and lettuce) were analyzed. The results indicated that the nitrate contents (mean values: Chinese cabbage: 7550 mg/kg; swamp cabbage: 4219 mg/kg; celery: 4164 mg/kg; lettuce: 4322 mg/kg) far exceeded the maximum tolerance limits specified by the World Health Organization (WHO). The ED-ELM model showed better performance, with a root-mean-square error of 799.7 mg/kg, a determination coefficient of 0.93, and a ratio of performance to deviation of 2.22, at an optimized calibration dataset size of 100 and 30 hidden neurons.

Conclusion: The results confirmed that FTIR-ATR, combined with suitable model algorithms, could serve as a rapid and accurate method for monitoring nitrate contents in agriculture and food safety.

temperature, and humidity), harvest time, and storage time [13][14][15]. For instance, a significant decrease in nitrate level is observed at ambient temperatures, but nitrate level remains constant over time during storage under refrigerated conditions [1,16]. On the other hand, risk assessment of the safety of dietary nitrate intake and exposure from vegetables has been a major health concern in many countries in recent

from four markets on a single day, and they were analyzed by both spectral and laboratory methods on the same day to ensure that the vegetables were fresh and nitrate contents were relatively stable.

water (10 g), ammonia buffer (5 mL) (pH = 9.6-9.7), and activated carbon powder were added to a conical flask, and the mixture was stirred (200 r/min) at 25 °C for 30 min. The mixture was then transferred to a volumetric flask (250 mL) and mixed with 150 g/L potassium ferrocyanide solution (2 mL) and 300 g/L zinc sulfate solution (2 mL); deionized water was added to bring the volume of the resulting solution to 250 mL. This mixture was kept standing for 5 min and then filtered. Then, the filtered

The pre-processed spectra were divided into calibration and validation datasets. Then, the extreme learning machine (ELM) model, an intelligent algorithm, was employed to predict the nitrate contents. To improve the prediction accuracy, the calibration dataset was modified before being calibrated by the Euclidean distance (ED) method.

Meanwhile, the partial least squares (PLS) model was used for comparison.

Subsequently, the performance of the models and prediction results were evaluated.

based on our previous result, which reported that it was suitable for spectral identification [33]. ED between the calibration and target samples was computed using pairs of curves and their derivatives as a measure of similarity for clustering.
where $ED_{ik}$ is the Euclidean distance between the $i$th target sample $x_i$ and each $k$th calibration sample $x_k$, $k \neq i$; and $j$ is the variable index, $j = 1, 2, \ldots, p$. The calibration dataset sequence was re-ordered in ascending order based on the ED results, which meant that similar spectra were near neighbors.
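The re-ordering step described above can be sketched as follows. This is a minimal illustration, assuming the standard Euclidean distance over the p spectral variables of the raw spectra only (the text also mentions derivatives, which are not modeled here). The function name `rank_calibration_by_ed`, the parameter `n_keep`, and the toy spectra are hypothetical and not taken from the paper.

```python
import math

def rank_calibration_by_ed(target, calibration, n_keep=None):
    """Return (index, distance) pairs for the calibration spectra, nearest first.

    target      : sequence of p spectral values for one target sample
    calibration : list of sequences, each with the same p variables
    n_keep      : optional size of the similar-sample subset to keep (hypothetical)
    """
    distances = [
        (k, math.dist(target, spectrum))  # Euclidean distance over j = 1..p
        for k, spectrum in enumerate(calibration)
    ]
    distances.sort(key=lambda pair: pair[1])  # ascending: most similar first
    return distances if n_keep is None else distances[:n_keep]

# Toy data with p = 3 variables (illustrative values, not measured spectra).
calibration = [
    [0.10, 0.40, 0.90],
    [0.50, 0.10, 0.20],
    [0.10, 0.45, 0.80],
]
target = [0.10, 0.40, 0.80]

ranked = rank_calibration_by_ed(target, calibration, n_keep=2)
nearest_indices = [k for k, _ in ranked]
print(nearest_indices)
```

Keeping only the first `n_keep` entries yields a calibration subset tailored to each target sample, which is one plausible reading of the similar-sample dataset described in the text.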

Extreme learning machine model

where $j = 1, 2, \ldots, n$; $w_i = (w_{i1}, w_{i2}, \ldots, w_{in})^T$ is the weight vector connecting the $i$th hidden node to the input nodes; $\beta_i = (\beta_{i1}, \beta_{i2}, \ldots, \beta_{iK})^T$ is the weight vector connecting the $i$th hidden node to the output nodes; and $b_i$ is the threshold of the $i$th hidden node. Then,

$H\beta = T$ (4)

The difference between conventional gradient-based solution methods and the ELM method was that the ELM method determined the function by using the formula $\hat{\beta} = H^{+}T$, where $H^{+}$ is the Moore-Penrose generalized inverse of matrix $H$.

In addition, the ELM input contained the training dataset and number of hidden

where $y'_i$ and $y_i$ are the predicted data and the data measured by the chemical analysis method, respectively; $n$ is the number of data sets; and SD is the standard deviation.
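The ELM solution described in the model section above (fixed random hidden-layer weights, output weights obtained from the Moore-Penrose generalized inverse) can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: the sigmoid activation, the uniform weight initialization, the function names, and the synthetic data are all assumptions. Only the 30 hidden neurons and the calibration size of 100 echo values reported in the abstract.

```python
import numpy as np

def hidden_output(X, W, b):
    """Hidden-layer output matrix H (sigmoid activation is an assumption)."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def elm_fit(X, T, n_hidden=30, seed=0):
    """Draw random input weights and thresholds, then solve for beta."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # input weights w_i
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # thresholds b_i
    H = hidden_output(X, W, b)
    beta = np.linalg.pinv(H) @ T  # Moore-Penrose generalized inverse of H
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Predict targets for new inputs using the fitted output weights."""
    return hidden_output(X, W, b) @ beta

# Synthetic stand-in for 100 calibration spectra with 5 variables each.
rng = np.random.default_rng(1)
X_cal = rng.normal(size=(100, 5))
T_cal = 3.0 * X_cal[:, 0] - 2.0 * X_cal[:, 1] + 0.1 * rng.normal(size=100)

W, b, beta = elm_fit(X_cal, T_cal)
pred_cal = elm_predict(X_cal, W, b, beta)
rmse_cal = float(np.sqrt(np.mean((pred_cal - T_cal) ** 2)))
print(f"calibration RMSE: {rmse_cal:.3f}")
```

Because the hidden-layer weights are never updated, fitting reduces to a single linear least-squares solve, which is the contrast with gradient-based training drawn in the text.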

RMSEC and RMSEP represented the root-mean-square error in the calibration and validation dataset models, respectively. The RPD, which is used for normally distributed data, represented prediction accuracy, and should be higher than 1
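The evaluation metrics named above can be sketched as follows. The defining equations are missing from this extract, so the conventional definitions are assumed: RMSE as the root of the mean squared difference between predicted and measured values, R² as one minus the ratio of residual to total sum of squares, and RPD as SD divided by RMSE. Using the sample standard deviation (n - 1) is also an assumption, and the numbers below are illustrative, not data from the study.

```python
import math
import statistics

def rmse(measured, predicted):
    """Root-mean-square error between measured and predicted values."""
    n = len(measured)
    return math.sqrt(sum((p - m) ** 2 for m, p in zip(measured, predicted)) / n)

def r_squared(measured, predicted):
    """Determination coefficient: 1 - SS_res / SS_tot (assumed definition)."""
    mean_m = statistics.fmean(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

def rpd(measured, predicted):
    """Ratio of performance to deviation: SD of measured values / RMSE."""
    return statistics.stdev(measured) / rmse(measured, predicted)

# Illustrative nitrate values in mg/kg (made up for the example).
measured = [4000.0, 5000.0, 6000.0, 7000.0]
predicted = [4200.0, 4800.0, 6200.0, 6800.0]

rmse_value = rmse(measured, predicted)
r2_value = r_squared(measured, predicted)
rpd_value = rpd(measured, predicted)
print(rmse_value, r2_value, rpd_value)
```

Applying `rmse` to the calibration set gives RMSEC and applying it to the validation set gives RMSEP.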

Moreover, the second-order derivative spectra were then calculated and plotted, as shown in Fig. 2(b). The peak at approximately 1460 cm⁻¹ was associated with N=O vibration, the peaks at 1375 cm⁻¹ and 1363 cm⁻¹ were attributed to N=O and N-O, respectively, and the peak at 1300 cm⁻¹ was associated with N-O vibration [30].

Euclidean distance method based on the spectral features of the target sample, which belongs to self-adaptive models, to obtain a similar-sample dataset [33,40]. Therefore,