2.1 Mosquito collection and rearing
2.1.1 Laboratory-reared mosquitoes
Aedes albopictus eggs were collected from Hammond Island, Torres Strait, Australia, in June 2016 and used to derive a stable, laboratory-maintained colony at the quarantine facility at QIMR Berghofer. Larvae hatched from that colony were reared in trays (35 × 15 cm) of de-chlorinated water kept at 27°C and 70% humidity. Larvae were fed ground fish food ad libitum (Tetramin fish food flakes; Blacksburg, VA) and pupae were transferred to round containers (9 cm diameter, 130 mL water). Emerging females were transferred daily to cages and provided with 10% sucrose ad libitum.
Those adults were killed at 2, 5, 8, 12 and 15 days post-emergence. Individuals of the same age from different generations were pooled to capture possible variation in laboratory rearing conditions. Mosquitoes were anaesthetised with CO2 and placed in individual 1 mL tubes containing RNAlater® (Ambion, TX, USA), following a standard protocol for NIRS characterization [22]. Tween–20 (0.1% v/v) was added to the RNAlater® to reduce surface tension and allow the RNAlater® to fully penetrate the mosquito. Sample tubes were maintained at room temperature for 24 h. The mosquitoes were then preserved at –20°C until spectral collection (< 14 days later). The total number of mosquitoes collected is listed in Table 1.
2.1.2 “Semi-field” mosquitoes
The term “semi-field” reflects the fact that these mosquitoes have an origin that is more representative of the field than of the laboratory. They were collected as pupae from a natural habitat (a productive rainwater tank) on Hammond Island, Torres Strait, during March 2018. This site is also the origin of the material used to derive the 2016 laboratory colony (see above). Pupae were allowed to emerge in standard rearing cages (60 × 60 × 60 cm, Bugdorm, Megaview, Taiwan) maintained outdoors under ambient conditions. Adults were aspirated from the cages when they were 1, 7 or 14 days old, immobilized by cold (4°C), placed in RNAlater® with 0.1% (v/v) Tween–20 and stored at –20°C until shipped to QIMR Berghofer.
2.2 Mosquito scanning using near-infrared spectroscopy
Preserved, frozen mosquitoes were defrosted at room temperature and excess RNAlater® was removed by placing specimens on paper towelling. A Spectralon plate was used for spectral background collection. Individual mosquitoes were placed laterally on the Spectralon plate, and the head and thorax were scanned using a LabSpec 5000 NIR spectrometer (Malvern Panalytical, Longmont, CO, USA). NIR spectra were obtained with an attached bifurcated fiber-optic probe positioned approximately 2.4 mm above the Spectralon plate, scanning an area of approximately 2 mm. Spectral data were recorded in the 350–2500 nm region. Each spectrum was built from an average of 30 scans at a sampling resolution of 3 nm. Spectral data were collected using RS3 v6.4.3 (Malvern Panalytical, Longmont, CO, USA). Reflectance (R) was converted to absorbance (log 1/R) in RS3 prior to analysis.
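The reflectance-to-absorbance conversion (performed in RS3 in this study) can be illustrated with a minimal sketch. The function name is hypothetical, and the use of a base-10 logarithm is an assumption, since the text specifies only log 1/R.

```python
import math


def reflectance_to_absorbance(reflectance):
    """Convert reflectance values R to absorbance, log10(1/R).

    Assumption: base-10 logarithm; the text states only "log 1/R".
    """
    absorbance = []
    for r in reflectance:
        if r <= 0:
            raise ValueError("reflectance must be positive")
        absorbance.append(math.log10(1.0 / r))
    return absorbance


# Illustrative reflectance values (not measured data)
example = reflectance_to_absorbance([1.0, 0.5, 0.1])
```

A reflectance of 1 (all light returned, as for the Spectralon background) maps to an absorbance of 0, and lower reflectance gives higher absorbance.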
2.3 Data analysis
2.3.1 Estimating mosquito age in days
Analyses were restricted to wavelengths of 700 to 2350 nm to exclude background noise at the start and end of the spectra, and any colour differences between mosquitoes (detected at < 700 nm). PLS linear regression was used to convert spectral data into predictive models of mosquito age (in days). Previous mosquito NIRS studies have used GRAMS IQ software (Thermo Scientific, MA, USA) to conduct the PLS analysis. GRAMS IQ uses a “leave-one-out” method for internal cross-validation, in which one sample is removed from the calibration set and the remaining samples are used to develop an equation that predicts the removed sample. This process is repeated for all samples to create a predictive regression model (calibration model). The whole procedure is then repeated with different numbers of PLS components (factors), and the best model has historically been selected by eye [1, 11, 23], choosing the number of components that maximises accuracy whilst trying to minimise over-fitting (including too many components yields models that fit the sampled data perfectly but fail to predict new data). This selection is subjective, which makes the science hard to reproduce. Here we repeat the methods of the past (leave-one-out internal cross-validation and selection of the number of components by eye) and refer to this method as “Standard PLS”.
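The leave-one-out procedure can be illustrated with a small NumPy sketch. This is not the GRAMS IQ implementation: the single-response PLS (NIPALS) routine, the function names and the synthetic spectra are all assumptions made for illustration. The sketch trims spectra to 700–2350 nm and computes the leave-one-out prediction error for each number of components.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model (NIPALS); returns (x_mean, y_mean, coef)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        norm = np.linalg.norm(w)
        if norm < 1e-12:          # no covariance left to model
            break
        w = w / norm
        t = Xk @ w                # scores
        tt = t @ t
        p = Xk.T @ t / tt         # X loadings
        c = (yk @ t) / tt         # y loading
        Xk = Xk - np.outer(t, p)  # deflate X
        yk = yk - c * t           # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    if not W:
        raise ValueError("no PLS component could be extracted")
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, coef


def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean


def loo_rmsd(X, y, n_components):
    """Leave-one-out error: each sample is predicted by a model fit without it."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        model = pls1_fit(X[keep], y[keep], n_components)
        preds[i] = pls1_predict(model, X[i:i + 1])[0]
    return float(np.sqrt(np.mean((preds - y) ** 2)))


# Synthetic stand-in for absorbance spectra (not real mosquito data)
rng = np.random.default_rng(0)
ages = rng.choice(np.array([2.0, 5.0, 8.0, 12.0, 15.0]), size=40)
wavelengths = np.arange(350, 2501, 50.0)
full = 0.01 * np.outer(ages, np.sin(wavelengths / 300.0))
full += rng.normal(0.0, 0.01, full.shape)

# Restrict to 700-2350 nm, as in the analysis described above
in_range = (wavelengths >= 700) & (wavelengths <= 2350)
spectra = full[:, in_range]

# Error for 1..5 components; in Standard PLS this curve is inspected by eye
errors = {k: loo_rmsd(spectra, ages, k) for k in range(1, 6)}
```

The `errors` dictionary holds one cross-validated error per number of components. Choosing among them by inspection is the subjective step of Standard PLS.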
An alternative approach to developing predictive models is to split the dataset into three parts for training, validation and testing [24, 25]. Here we use 50% of the samples for training (fitting the model to samples of known age using different numbers of PLS components), 25% for validation (selecting the optimum number of components for predicting another subset of samples of known age) and 25% for testing (evaluating the final model against a blinded subset of the data). This process is repeated 100 times, each time randomly resampling the original dataset to generate different training, validation and testing datasets. The mean model is then selected from the 100 randomisations in order to average out sampling error. Here the number of components selected during validation is the lowest that gives an average error within 0.5 days of that of the best-fitting model. This value was arbitrarily selected as a compromise between accuracy and generalizability, but it has the advantage over previous methods that the criterion is explicitly defined and the research is therefore reproducible. This resampling procedure and selection of the number of components is referred to as “Resampling PLS” and has been used to optimise models for predicting the presence of malaria parasites in mosquitoes [25]. Results are reported as the root-mean-square deviation (RMSD) between the predicted and true ages of the mosquitoes. To allow a direct comparison with Standard PLS, RMSD estimates for Resampling PLS were calculated from estimates of individual mosquito age obtained as the mean of the 100 randomisations using the entire dataset.
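The data split, the component-selection rule and the RMSD can be sketched as follows. The function names are hypothetical and the validation errors in the example are made-up numbers. The selection rule encodes one reading of the criterion above, namely the smallest number of components whose mean validation error lies within 0.5 days of the minimum; that reading is an assumption.

```python
import numpy as np


def split_indices(n_samples, rng):
    """Randomly split sample indices 50% / 25% / 25% (train / validation / test)."""
    order = rng.permutation(n_samples)
    n_train = n_samples // 2
    n_val = n_samples // 4
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]
    return train, val, test


def select_n_components(validation_error, tolerance=0.5):
    """Smallest component count with error within `tolerance` days of the best.

    validation_error[k - 1] is the mean validation error (days) for k components.
    """
    validation_error = np.asarray(validation_error, dtype=float)
    threshold = validation_error.min() + tolerance
    within = np.flatnonzero(validation_error <= threshold)
    return int(within[0]) + 1


def rmsd(predicted, true):
    """Root-mean-square deviation between predicted and true ages."""
    predicted = np.asarray(predicted, dtype=float)
    true = np.asarray(true, dtype=float)
    return float(np.sqrt(np.mean((predicted - true) ** 2)))


# One of the 100 random resamplings (seed chosen arbitrarily)
train, val, test = split_indices(100, np.random.default_rng(1))

# Made-up mean validation errors for 1..6 components
chosen = select_n_components([3.0, 2.2, 1.6, 1.4, 1.2, 1.3])
```

In this made-up example the minimum error is 1.2 days at five components, but three components already fall within 0.5 days of it, so three would be selected under this reading of the rule.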
Mathematical pre-treatment of spectra may reduce noise and increase differentiation between sample properties. To investigate whether the accuracy of the Standard PLS models could be improved by pre-processing, we examined standard normal variate (SNV), mean normalizing and detrend-SNV methods to minimize spectral distortion due to scattering. We used second-derivative Savitzky-Golay (SG) filtering to remove baseline noise [26, 27].
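The scatter-correction methods can be sketched in NumPy as below. The function names are hypothetical, and several details are assumptions because the text does not give them: SNV uses the sample standard deviation, mean normalizing divides each spectrum by its mean, and detrending subtracts a second-order polynomial fitted across wavelength after SNV.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) separately."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def mean_normalise(spectra):
    """Divide each spectrum by its own mean (assumed definition)."""
    spectra = np.asarray(spectra, dtype=float)
    return spectra / spectra.mean(axis=1, keepdims=True)


def detrend(spectra, wavelengths, degree=2):
    """Subtract a polynomial baseline fitted to each spectrum across wavelength."""
    spectra = np.asarray(spectra, dtype=float)
    x = np.asarray(wavelengths, dtype=float)
    x = (x - x.mean()) / x.std()                 # rescale for numerical stability
    coeffs = np.polyfit(x, spectra.T, degree)    # one column of coefficients per spectrum
    trend = np.vander(x, degree + 1) @ coeffs    # fitted baseline, wavelengths x spectra
    return spectra - trend.T


def snv_detrend(spectra, wavelengths):
    """SNV followed by second-order detrending (assumed order of operations)."""
    return detrend(snv(spectra), wavelengths, degree=2)
```

Second-derivative SG filtering is not included here. It could be applied with a routine such as `scipy.signal.savgol_filter` with `deriv=2`, but the window length and polynomial order are not stated in the text.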