Feed forensic strategy: Incorporating multivariate data analysis with high-performance liquid chromatography refractive index detector and differential scanning calorimeter for authentication of fish feed sources


 This study authenticated fish feed sources and determined lard adulteration using dataset pre-processing, principal component analysis (PCA), discriminant analysis (DA) and partial least square regression (PLSR) on 19 triacylglycerols (TAGs) and 16 thermal properties (TPs). At cumulative variability (90.625%) and Keiser-Meyer Olkin (KMO) value (0.811), the PCA identified strong factor loading variables, i.e., OLL, PLL, OOL, POL, PPL, POO, PPO, PSO, ICT and FHT in PC1 and LLLn, OOO and CT2 in PC2. These variables were significantly (p < 0.05) contributing to lard-palm-oil (L-PO) clusters: (1) POO, PPO and PPL (high loading) and OLL, PLL, OOL, ICT, POL, PSO and FHT (low loading) in 0:100 and 25:75 L-PO clusters; (2) CT2, OOO and LLLn (high loading) in 50:50 L-PO cluster; and (3) OLL, PLL, OOL, ICT, POL, PSO and FHT (high loading) and POO, PPO and PPL (low loading) in 72:25 and 100:0 L-PO clusters. Training, validation and testing datasets had 100%, 84.44% and 100% correct-classification, respectively at p < 0.0001 of Wilks' lambda and p < 0.0001 Fisher distance. The DA selected PLL, OOL, POL, PPL, PSO, ICT and FHT as the significantly authenticating biomarkers (p < 0.05). With determination coefficient (R²) (0.9693), mean square error (MSE) (38.382) and root mean square error (RMSE) (6.195), the PLSR's variable importance in the projection (VIP) identified the most influential biomarkers, i.e., PPL, POL, PPO, OOL, ICT, PLL, FHT, POO and OLL. The Z-test result (p > 0.05) indicated that the PLSR could determine the lard adulteration percentage in fish feed.


Introduction
The sh feed is formulated to effectively supply nutritional requirements and subsequently maintain physiological functions such as effective growth, reproduction and immune systems. For this purpose, a wide range of additives in the formulation of sh feed has been introduced. Besides antioxidants, enzymes and feed preservatives, the application of phytogenic feed additives (PFA) from roots, leaves and fruits in solid and liquid forms has received interest within the aquaculture industry (Encarnação, 2015). Among the PFAs, essential oils such as palm and seed oils are often used in the sh formulation due to nutritional quality, cheap source, generally recognized as safe and able to improve the feed digestibility and intake (Tyapkova et al., 2016). On account of these properties, there is an increasing interest to substitute the essential oil from PFA with lard. Although the addition of lard into sh feed has rendered a greater sh growth than other essential oils (Glencross, 2015), this substitution has raised concern among consumers, especially the vegetarians (Mutalib et al., 2015), Jews (Mukherjee, 2014) and Various testing methods have been utilized to address the issue of lard adulteration in products. Most testing use polymerase-chain-reaction (PCR) method, competitive indirect enzyme-linked immunosorbent assay (ELISA) and liquid chromatography methods, e.g. liquid chromatography time of ight mass spectrometer (LC-QTOF/MS) and liquid chromatography-mass spectrometer (LC/MS) to identify the presence of pork-originated adulterants. However, these methods are protein-targeted, complex, and prone to contamination (Yap & Gam, 2019). Additionally, these methods are costly for maintenance in testing laboratories (Abbas et al., 2018). Due to these disadvantages, affordable analytical methods using highperformance liquid chromatography refractive index detector (HPLC-RI) and differential scanning calorimeter (DSC) (Naquiah et al., 2017) have successfully identi ed lard adulteration by analysing the triacylglycerols (TAGs) and thermal properties (TPs) of the samples (Azir et al., 2017;Noorzyanna et al., 2017;Yanty et al., 2017). Nonetheless, these identi cation methods were inadequate for authentication purposes since the lard adulteration was identi ed using sample pro ling only. Moreover, these methods were applied to food products and have never been tested in sh feed. Hence, our study developed the authentication method for that purpose.
The comparison of sample pro les was insu cient for the authentication of sh feed sources. Although the HPLC-RI may render lower TAG detection sensitivity, it could not differentiate the sh feed source via comparison of the individual TAG in the sample with the TAG standard since the TAG of plant and animal origins possess almost similar characteristics distribution of TAG. Likewise, the TP of plant and animal oils have a similar pro le pattern. These claims were evident from comparing TAG and TP pro les between palm oil and lard (Noorzyanna et al., 2017), although a thorough comparison was made via ANOVA. Naquiah et al. (2017) extended the ANOVA of the TP to multivariate data analysis (MDA) using principal component analysis (PCA); however, the research did not identify the potential biomarkers to discriminate the lard, authenticate the sample source and determine the composition of the lard adulteration in the sample.
The MDA, including PCA, discriminant analysis (DA) and partial least square regression (PLSR), requires pre-requisite analyses. These analyses entail outlier treatment (Bower, 2013a), dataset transformation for normal distribution (Granato, de Araújo Calado, et al., 2014) and test of dataset adequacy (Williams & Brown, 2012), which were absent in previous PCA of feed. Without the ful lment of these pre-requisite analyses, the MDA may lead to erroneous results and interpretation. Previous researches have neglected the qualitative authentication of sh feed analysis via DA and determination of lard adulteration level via PLSR. The MDA should test the training, validation and testing datasets of the sh feed before con rming the authentication ability of the analytical method and MDA models. Hence, this study outlined a guideline to explore the sh feed dataset, identify the signi cant biomarkers, and determine the level of lard adulteration in the sh feed. Subsequently, this study anticipated an adaptation of this guideline by the certi cation or regulatory bodies in developing sh feed guidelines or standards for research and testing laboratories. The ingredient percentages in the sh feed were calculated using WinFeed 2.8 software (Cambridge, UK).

Preparation of sh feed
The ingredients of each diet formulation were weighed as determined by Winfeed 2.8 Software and mixed with 60 mL of pre-heated distilled water at 70ºC using a food mixer (Giselle, Malaysia) for 3 min. The mixture was transferred into a dough-making machine (Hengfeng, China) and mixed at 75 rpm for 5 min.
The dough was kept at 25°C for 2 h to initiate the fermentation process, extruded at 7 mm diameter using a manual meat mincer and cut uniformly to produce consistent pellet length. The pellets were steamed (Little Homes, Malaysia) for 5 min and dried at 60°C for 3 h in a pre-heated electric oven (Memmert GmbH, Germany). The dried pellets were then kept in a desiccator for 15 min and stored in a dry and closed container.

Extraction of sh feed
Before extraction, the dried pellets were ground for 5 min in a 240 W electrical blender (Panasonic MX-337, Malaysia). The oil from 2 g dried sh feed was extracted with 150 mL petroleum ether at 60 • C for 8 h in the Soxhlet apparatus (Khallouki et al., 2018). The extract was transferred into a pre-weighed at bottom ask, concentrated using a rotary vacuum evaporator (Eyela N-1001, Japan) at 40 • C and frozen at -20 • C in a glass container.

Thermal analysis of sh feed samples
Thermal analysis was carried out using a differential scanning calorimeter (DSC) of 823 model equipped with a station of thermal data analysis (Mettler Toledo, Switzerland). An approximate of 4-8 mg the extract was placed in an aluminium pan, hermetically sealed and analysed according to continual temperature setting: heated at 70°C for 1 min, cooled at 5°C/min to -70°C, held at -70°C for 1 min and heated at 5°C/min to 70°C. The cooling procedure was executed with nitrogen (99.999% purity) at ~20 mL/min. An empty, hermetically sealed aluminium pan was used as the reference. Thermal properties (TPs) of cooling and heating activities, including cooling and heating temperatures of the sh feed, were recorded. Thermograms of the sh feeds were compared to identify the numbers of cooling and heating temperatures.

Dataset pre-processing
The percentage of peak area of TAG and TP was imported to the dataset table in XLSTAT 2017 software (Addinsoft, Paris, France). An about 45 sh feeds entailing 19 TAGs, and 16 TPs was pre-processed to reduce the variation of the TAGs and TPs in the dataset. The dataset was analysed via box-and-whisker plot (BWP) to treat outlier, ANOVA test, dataset transformation, KMO test and PCA.

Outlier treatment
Prior to the ANOVA test and PCA, the individual TAG and TP were subjected to outlier treatment using the BWP method from a standardized dataset. The con dence interval of the BWP was set at 95%. The skewness of the BWP was examined to con rm the need for dataset transformation. The dataset, which showed different patterns within the individual TAG and TP, was discriminated and shown in the BWP as an outlier. Outlier value exceeded three times the box's height was signed with a dot, star or asterisk (Saiful et al., 2019) and replaced with the variable's mean value.

Analysis of variance
Results were expressed as mean and standard deviation of triplicate analyses for the distribution of TAGs and TPs of the sh feed. The ANOVA test of Tukey's test was applied to determine the signi cant difference between means of the TAGs and TPs at a 95% con dence level (p < 0.05).

Dataset transformation
To ensure dataset following normal distribution prior to the PCA, the dataset normality was tested using Shapiro-Wilk (SWT), Anderson-Darling (ADT), and Lilliefors (LT) tests at α of 0.05. The dataset was transformed using standardize n-1, standardize (n), centre, standard deviation -1 (n-1), standard deviation -1 (n), rescale from 0 to 1, rescale from 0 to 100, and Pareto transformation methods. The undetected thermal properties were acknowledged as missing data and subjected to removal before the dataset transformation. The transformation of each TAG and TP was employed to ensure the transformed dataset remained closer to the original dataset . The normal distribution of the transformed dataset was tested, and the best transformation method and normality test were selected from the result.

Dataset exploratory by principal component analysis
The PCA of Pearson correlation was employed to examine the dataset pattern, explore the TAGs and TPs contributions to the sh feeds, nd and explain the TAGs and TPs correlation and reduce the dataset signi cantly (p < 0.05) into smaller sets of new independent variables, which were called as principal components (PCs) via the following formulae: Where S is the component score, f is the FL, v is the concentration of TAG and TP, p is the PC number, q is the sample number, and n is the total number of TAG and TP.
In this study, two PCAs were executed. The rst PCA was executed, and cumulative variability (CV), eigenvalue (EV) and KMO and FL values were evaluated at a signi cant level (α) of 0.05.
Using TAGs and TPs with strong FL, the second PCA was executed to produce a new TAG and TP plot and a biplot of TAG, TP, and sh feed. The cumulative variability, eigenvalue and KMO and FL value were evaluated and compared to the rst PCA. The FL and correlation of TAGs and TPs were assessed, and the apportionment of TAGs and TPs to the sh feed clusters was examined. From the second PCA, the signi cant TAGs and TPs contributed to the sh feed clusters were proposed as the biomarkers (Idris et al., 2021).

Authentication of sh feed source by discriminant analysis
In this study, DA was employed to authenticate sh feed. The same sh feed dataset was acknowledged as training and validation datasets while testing dataset consisting of 25 known sh feeds were prepared for authentication purposes.
A new column labelled as 'cluster' was added to the dataset, and the sh feeds were assigned as 'palm oil' for 0:100 L-PO, 'lard + palm oil' for 25:75 L-PO, 50-50 LPO and 75:25 L-PO and 'lard' for 100:0 LPO clusters. The DA model was executed at set α of 0.05. The function of the DA model is as follows: Where a was the number of sh feed cluster (C), k was the constant for each cluster, n is the number of TAG, and TP (T) used to classify the training dataset into the cluster, and w is the weight coe cient which was assigned by the DA to the selected T. The DA model was developed for sh feed, the dissimilarity of palm oil, L-PO and lard clusters was explored, and 25 known sh feeds were authenticated. The signi cant TAGs and TPs that caused the dissimilarity between the clusters were identi ed (Sharin et al., 2021).

Determination of lard adulteration percentage by partial least square regression
The PLSR is employed to develop the PLSR model and determine the L-PO composition in sh feed. The training, validation and training datasets were utilized in this study. A new column labelled as 'lard adulteration percentage' was to the dataset, and the sh feeds were assigned as '0%', '25%, '50%', '75 %' and '100%' lards according to the percentage of adulterated lard in the sh feeds. The PLSR model was executed at set α of 0.05. The function of the DA model is as follows: where a mathematical relation was created between the matrix of the sh feed training dataset (X) and the regression coe cient vector (V). A principal component of PLSR was also developed in this stage. Then, the validation dataset was employed to optimize this mathematical relation to set up calibration.  Table 1 shows the detected outliers from the BWP of the sh feed. The sh feed with 0:100, 25:75, 50:50, 75:25, and 100:0 L-PO mixtures presents 1, 3, 10, 2, and 5 outliers, respectively whereby, the sh feed with 50:50 LPO mixture had the highest number of outliers. The outlier treatment by replacing the outlier with the mean of each parameter (Berg et al., 2006) decreased the PCA's dataset distortion. However, previous research had never performed the outlier treatment before the PCA, which analysed TAGs and thermal properties (Golijan et al., 2019). Post outlier treatment of the BWP, the new dataset was also subjected to dataset normalization prior to the PCA.

Triacylglycerol of sh feed
The TAG percentage in the sh feeds is presented in Table  For the sh feed with 100% palm oil, the ranking of TAG percentage as follows: PPP < SOS < SSS < MMM < LLL < MPL < PSO < SOO < LLLn < OLL < SPO < PPS < OOL < PLL < OOO < PPL < POL < PPO < POO. In contrast, the sh feed of 100% lard exhibited different ranking of TAG percentage as follows: SPO < SSS < MMM < PPP < MPL < SOS < LLL < SOO < PPL < LLLn < PPS < OOO < OLL < PSO < OOL < PLL < PPO < POO < POL. It was evident that the ranking of TAGs in 25:75 L-PO, 50:50 L-PO, 75:25 L-PO mixtures were different from each other. The different rankings indicated the addition of lard into sh feed containing palm oil affected the distribution of the TAGs.

Thermal properties of sh feed
The thermal properties of the sh feeds are presented in Table 1. The thermal analysis of the sh feeds yielded one starting cooling temperature (4.68 ± 0.47 • C to 13.53 ± 0.65 • C) and one ending cooling temperature (-29.92 ± 5.51 • C to -49.13 ± 7.55 • C). Yanty, Marikkar, & Abdulkarim (2014) analysed palm oil and recorded cooling temperatures at 20.1 • C and 3.05 • C as compared to 0.45 • C, -3.43 • C, and -19.28 • C in our study due to the former analysed palm oil only while our study analysed the sh feed extract containing the palm oil. The sh feed containing 100:0 L-PO showed the cooling temperature at 10 Table 1, all sh feeds recorded six cooling temperatures while sh feed with 0:100 L-PO recorded three cooling temperatures. Although the absence of three cooling temperatures may distinguish the sh feed containing 0:100 L-PO from other sh feed, no signi cant difference of temperature (p < 0.05) was observed at cooling starting temperature, cooling temperature, and cooling ending temperature in all sh feeds.
The thermal analysis of the sh feeds (Table 1)  This ANOVA result considered only the signi cant difference of the TAGs and thermal properties and was subjected to further improvement by applying PCA for the multivariate dataset (Jasour et al., 2018). Although negligible research has adopted the application of PCA to explore the dataset entailing thermal properties, Green et al. (2020) recommended employing TAGs and PCA to investigate the source of edible oils. This approach is practical to uphold the sh feed integrity as claimed by manufacturers and bring con dence to consumers.

Dataset transformation
This study investigated issues of (1) which dataset transformation was suitable for TAGs and TPs analysis in sh feed and (2) which normality test was the best to examine the dataset normality.
The most common dataset transformation was the standardize (n-1) followed by other dataset transformations deemed suitable according to sample type. Supplementary data 1, 2 and 3 show the pvalue of normality test for SWT, ADT and LT, respectively. After dataset transformations, not all TAGs and TPs of the transformed dataset demonstrated normal distribution.
Before dataset transformation, the normality test of SWT exhibited 13 TAGs and 15 TPs that followed a normal distribution (Supplementary 1). Post transformation via standardize (n-1), OOL showed normal distribution with 0.0491 p-value. The SWT identi ed 14 TAGs and 15 TPs that followed normal distribution after the dataset transformations. The ADT exhibited 10 TAGs, and 14 TPs followed a normal distribution before the dataset transformation (Supplementary 2). The PPL and PPO had followed the normal distribution after standardize (n-1) transformation. Hence, the ADT resulted in 16 TAGs and 14 TPs that followed normal distribution after the dataset transformations. The LT showed 10 TAGs, and 14 TPs followed the normal distribution before the dataset transformation (Supplementary 3). After the dataset underwent all transformations, no TAGs and TPs followed a normal distribution. From these ndings, the ADT was recommended as the best normality test to investigate dataset normality of TAGs and TPs in the sh feed, which was in agreement with the study of Razali, Wah, & Sciences (2011), who found that normality test of ADT was effective in low sample size (n < 10000). Besides, Bower (2013b) recommended the acceptance of the non-normal distribution of the dataset because the instrumental or continuous measurement of the sample was principally following the normal distribution.  (2011) did not transform the sh feed dataset, possibly due to the principle that the dataset distribution will never achieve normal distribution (Rodriguez, 2020). However, our study had proven that the standardize (n-1) transformation was suitable for the sh feed matrix. This study proposed testing the sh feed dataset with various transformations to con rm the most suitable transformation. Hence, 10 TAGs and 14 TPs were not transformed, and LLL, PPL and PPO were transformed using standardize (n-1). Although OLL, PLL, OOL, POL, OOO, POO, CT2 and HT4 remained with non-normal distribution after the transformations, these variables were also transformed using standardize (n-1) prior to PCA.

Dataset exploratory by principal component analysis
The exploratory of the dataset via the rst PCA demonstrated PC1 and PC2 with eigenvalue (EV) > 1 (Falcó et al., 2019), which explained 57.766% cumulative variability (CV) of the dataset ( Table 2). The EV and percentage variability (PV) re ect the size and signi cant PC (p < 0.05), whereby PC1 has a larger EV than PC2. The EV information supported or our result that the EVs decreased as the PC number increased, i.e. PC1 (EV = 13.318, PV = 38.050) > PC2 (EV = 6.900, PV = 19.715). Although there is no cut-off value of EV or PV, our study adopts the suggested EV > 1 as a principal guideline for feed composition study (Falcó et al., 2019). The KMO test calculated the value of 0.513 for the rst PCA. Although no sh feed study has determined the KMO value and provided the acceptance limit speci cally for the sh feed matrix, KMO > 0.5 indicated compliance with PCA's dataset adequacy (Kaiser, 1974 By executing the second PCA using TAGs and TPs with strong FL from the rst PCA, the second PCA demonstrated PC1 and PC2 with higher CV (90.625%) and KMO (0.811) than of the rst PCA (Table 2), while the TAGs and TPs with strong FL remained in the same criterion. This result con rmed that 10 TAGs and 3 TPs with strong FL were adequate to explain the 90.625% variation of the sh feed dataset. Our study also found the TAGs with strong FL were the ones with oleic acid and palmitic acid in their chemical structure. The ICT and CT2 were attributed to the amount of saturated and unsaturated TAGs, and FHT was associated with the melting of crystallized and polymorphic transitions of fat (Azir et al., 2017).
The positive and negative signs of the FL in the second PCA (Table 2)   LLLn, OOO and CT2 as the biomarkers to identify the sh feed source. Nonetheless, the PCA is an unsupervised MDA which is more suitable for dataset exploratory; hence this study employed these biomarkers to authenticate sh feed source via DA and determine the percentage of lard adulteration via PLSR.

Authentication of sh feed source by discriminant analysis
The DA was performed on the training and validation datasets to (1) develop a DA model for sh feed, (2) explore the dissimilarity of palm oil, L-PO and lard clusters, (3) authenticate 25 known sh feed in the testing dataset and (4) identify the signi cant TAGs and TPs that caused the dissimilarity between the clusters. Table 3 shows the classi cation matrix of training, validation, and testing datasets by the DA model. The p-value of Wilks' lambda for the DA model (p < 0.0001) indicated that the three clusters were signi cantly different from each other. This result was con rmed by the calculated p-value of Fisher distance (p < 0.0001) between the two clusters. The training and validation datasets had 100% and 84.44% correct classi cation values, respectively. Although the correct classi cation of the latter was lower than the former, 100% correct classi cation was recorded for the testing dataset, thus proved the DA model was reliable to authenticate 25 known sh feeds (Figure 2). The DA model had selected PLL, OOL, POL, PPL, PSO, ICT and FHT as the signi cant biomarkers (p < 0.05) from the 10 TAGs and 3 TPs proposed by the PCA.

Determination of lard adulteration percentage by partial least square regression
The PLSR result in Table 4 Table 4 also exhibited the determination of lard adulteration percentage for ve clusters of sh feed.
Based on the mean and standard deviation, the determined value for each cluster was within the actual value. The determined value also fell between the 95% lower and upper bounds of the determined value.
Nonetheless, the relative error of the determined value decreased as the lard adulteration percentage increased, signifying the PSLR model was more sensitive at a higher percentage of lard adulteration. Since the Z-test value showed p-value > 0.05, the null hypothesis (Ho) was accepted indicated the percentage means of the determined and actual lard adulteration were not signi cantly different. This result proved that the PLSR model could determine the percentage of lard adulteration in the sh feed.

Conclusion
The demand to produce sh feed for aquaculture elds has shown exponential growth, which requires the testing laboratory's authentication as a feed forensic tool. The TAG and TP testing analyses by HPLC-RID and DSC incorporated with PCA, DA and PLSR are imperative to address the false claim and ensure feed integrity from the manufactures. The analysis undergoes procedures of sh feed extraction, TAG and TP analyses, dataset pre-processing, exploration of the dataset via PCA, authentication of the sh feed by DA and determination of lard adulteration percentage in sh feed by PLSR a guideline to avoid false-positive and negative results. However, it is recommended that the TAG and TP analyses undergo method validation and veri cation (MVV) to determine the limit of detection (LOD) and quanti cation (LOQ) of the method before the dataset analysis. Also, this manuscript explained the PCA, DA and PLSR, which were the frequently utilized MDA for authentication, while other MDA such as multiple linear regression (MLR) and principal components regression (PCR) were not discussed. Since limited research focuses on sh feed testing, this manuscript may guide the researchers and testing laboratories to extend their scope of analysis and suggest applying HPLC-RID, DSC and MDA as an option for the sh feed testing. The certi cation or regulatory bodies at the governmental level and testing laboratories could adapt this guideline to develop a standard of TAG and TP analyses for authentication of sh feed sources.  Tables   Table 1: The detected outliers and mean of triacylglycerols and thermal properties of fish   Note: 1 LLLn = dilinoleoyl-3-linolenileoyl glycerol, LLL = trilinoleoyl glycerol, MMM = trimyristoyl glycerol, OLL = dilinoleoyl-1-oleoyl glycerol, PLL = dilinoleoyl-1-palmitoyl glycerol, MPL = myristoyl palmitoyl-linoleoyl glycerol, OOL = dioleoyl-3-linoleoyl glycerol, POL = palmitoyl-oleoyl-linoleoyl glycerol, PPL = dipalmitoyl-1-linoleoyl glycerol, OOO = trioleoyl glycerol, POO = dioleoyl-1-palmitoyl glycerol, PPO = dipalmitoyl-3-oleoyl glycerol, PPP = tripalmitoyl glycerol, SOO = dioleoyl-1-stearoyl glycerol, PSO = palmitoyl-stearoyl-oleoyl glycerol, PPS = dipalmitoyl-3-stearoyl glycerol, SSS = tristearoyl glycerol, SOS = 1,3-distearoyl-2-oleoyl and SPO = 1-stearoyl-2-palmitoyl-3-oleoylracglycerol. 2 ICT = initial cooling temperature, CT = cooling temperature, FCT = final cooling temperature, IHT = initial heating temperature, HT = heating temperature and FHT = final heating temperature. 3a FL ≥ |0.750| = strong factor loading, b |0.500| < FL < |0.749| = moderate factor loading and c FL ≤ |0.499| = weak factor loading in the principal component 4 Bold factor loading indicated strong factor loading in the principal component.