Phenotypic Characterization of Hair and Honamli Goats by Using Classification Tree Algorithms and Multivariate Adaptive Regression Splines (MARS)

To meet the food demand of the growing world population, it is very important to define the animal breeds and species raised in tropical and subtropical regions and to organize breeding programs for them. Discrimination of animal breeds by morphological classification has been a widely used method for a century. Although Honamli and Hair goats are morphologically very similar, experienced breeders can distinguish them subjectively by some distinctive morphological markers. The current study used certain body characteristics of Hair goats, which make up a large portion of the goat population in Turkey, and Honamli goats, which have recently been registered as a new breed. Phenotypic characterization of these breeds was carried out using the following data mining methods: classification and regression tree (CART), chi-square automatic interaction detector (CHAID), Exhaustive CHAID, Quick, Unbiased, Efficient Statistical Tree (QUEST), and multivariate adaptive regression splines (MARS) algorithms. The current study is thus the first in which data mining algorithms are used for phenotypic characterization of the Hair and Honamli goat breeds. Morphological characteristics of the goats, namely live weight (LW), withers height (WH), back height (BH), rump height (RH), chest depth (CD), body length (BL), chest girth (CG), leg girth (LG), head length (HL), forehead (FH), ear length (EL), and tail length (TL), were used to discriminate between the breeds, with breed (Honamli or Hair) as the binary response variable. The morphological characteristics of the goats are therefore the independent variables in the data mining algorithms. CHAID, Exhaustive CHAID, CART, QUEST, and MARS were used to identify accurately the morphological traits that are effective in breed discrimination.
The success rates of the CHAID, Exhaustive CHAID, CART, QUEST, and MARS algorithms in breed discrimination were 87.80%, 85.80%, 87.80%, 77.00%, and 88.51%, respectively, and the areas under the ROC curve were 0.880, 0.853, 0.868, 0.784, and 0.942, respectively. Thus, when data mining methods were applied to some body measurements of Honamli and Hair goats, whose morphological distinction is not entirely clear, the MARS and CHAID algorithms separated the phenotypes more successfully than the other methods. The outputs of this study can be used to select breeding material by enabling pure Honamli goat breeding. Data mining algorithms can also be included in gene resource conservation programs.


Introduction
Identification and classification of breeds within a species according to phenotypic characteristics play a key role in breeding and conservation program strategies. Research on these strategies has therefore been carried out continuously in tropical and subtropical regions of the world (Nsoso et al. are applied, there is no study in which breed discrimination is made by using morphological features in goats.
This study aimed to characterize Hair and Honamli goats phenotypically by using data mining algorithms and some morphological characteristics of the two breeds: LW, WH, BH, RH, CD, BL, CG, LG, HL, FH, EL, and TL.

Animals
The animal material of the study consisted of 65 Hair goats (45 females, 20 males) and 83 Honamli goats (73 females, 10 males) of different ages (1, 2, 3, 4, 5, and 6 years and over), raised extensively on a private farm in the Çalca district of Kütahya province. The pedigree Honamli goats were brought to the farm in 2015 from an enterprise that is a member of the sheep and goat breeding association in Antalya. The Hair goats on the farm had been bred there in previous years. The farm owner checked the pedigree records of the breeds through the TürkVet system. During the harsh winter season or in adverse weather conditions, the goats were fed in the pen. Concentrated feed was given for 1 month before the breeding season for flushing, and straw, alfalfa, and fescue grass were given as roughage sources in addition to pasture.

Measurement of Morphological Characteristics of Breeds
A scale designed for weighing small ruminants was used to determine the live weight of the animals. All body characteristics of the goats were measured according to the recommendations of Ertugrul (1996), and LW was recorded with a precision of 0.1. All body measurements were made on a flat platform after the animals had been given time to adapt to the environment and stress factors had been minimized. WH, BH, RH, BL, and CD were measured with a measuring stick, and the body circumference measurements (CG and LG) were taken with a measuring tape. HL, FH, EL, and TL were measured with calipers.

Statistical analysis
The structure created by using all independent variables to divide the data into subgroups is termed a classification tree. The root node, which has not yet been split and contains only the dependent variable, is at the top of the classification tree. First, this root node is divided into two or more parts. These parts are called parent branches, and the parts obtained by splitting the parent branches are called daughter nodes or subsets (Eyduran et al. 2016). When splitting ends at a daughter node and no further branching occurs, that node becomes a terminal node (Orucoglu, 2011). By testing the independent variables included in the model, the cut values of each explanatory variable are determined so as to yield the specified category in the new node to be formed (Aksahan and Keskin, 2015).
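The search for a cut value described above can be illustrated with a minimal sketch. It assumes Gini impurity as the splitting criterion, which is the criterion usually associated with CART; CHAID selects splits with chi-square tests instead, so this is not the procedure used for every tree in the study. The function names and the rump height values are hypothetical and were invented for illustration only.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_cut(values, labels):
    """Return (cut, weighted_impurity) of the best binary split 'value <= cut'.

    Candidate cuts are midpoints between consecutive distinct sorted values.
    If no split improves on the unsplit node, cut is None.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no cut can separate identical values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((low + high) / 2, score)
    return best


# Hypothetical rump heights and breed labels (NOT data from the study).
rh = [66.0, 68.0, 70.0, 72.0, 78.0, 80.0, 82.0, 85.0]
breed = ["Hair"] * 4 + ["Honamli"] * 4
cut, impurity = best_cut(rh, breed)
```

In this toy example the two breeds are perfectly separable on the single variable, so the chosen cut leaves two pure daughter nodes; real measurements overlap, which is why further splits on other variables are needed.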
The classification performances of the CART (Breiman et al., 1984), CHAID (Kass, 1980), Exhaustive CHAID (Biggs et al., 1991), QUEST (Loh and Shih, 1997), and MARS (Friedman, 1991) data mining algorithms were compared in terms of accuracy, sensitivity, specificity, and area under the ROC curve. The CHAID and Exhaustive CHAID classification tree algorithms split nodes in a multiway fashion (Akin et al., 2018), whereas the CART and QUEST algorithms split according to a binary node structure (Kovalchuk et al., 2017). The MARS algorithm, which is a modified version of the CART algorithm, can make better predictions than binary logistic regression thanks to the hinge functions in its structure. The maximum tree depth was set to 3 for CHAID, 3 for Exhaustive CHAID, 5 for CART, and 3 for QUEST.
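The hinge functions mentioned for MARS can be sketched as follows. A hinge function is zero on one side of a knot and linear on the other, so a weighted sum of hinges gives a piecewise-linear response. The knots, coefficients, and the helper `mars_score` below are hypothetical and chosen only for illustration; they are not the fitted model of this study.

```python
def hinge(x, knot, direction=1):
    """MARS hinge basis function.

    direction=+1 gives max(0, x - knot); direction=-1 gives max(0, knot - x).
    """
    return max(0.0, direction * (x - knot))


def mars_score(bh, lw):
    """Piecewise-linear score built from two hinges (hypothetical model)."""
    # Hypothetical intercept, coefficients, and knots, for illustration only.
    return -0.5 + 0.2 * hinge(bh, 70.0) - 0.1 * hinge(lw, 50.0, direction=-1)


# The first hinge is inactive below its knot and linear above it.
below = hinge(65.0, 70.0)
above = hinge(75.0, 70.0)
mirrored = hinge(65.0, 70.0, direction=-1)
```

Each pair of mirrored hinges lets the model change slope at the knot, which is how MARS captures nonlinear effects of a body measurement without a fixed functional form.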
The whole data set (148 records) was randomly divided into 10 parts; nine parts formed the training set of the models, while the model was estimated 5 times in the remaining part. In building the classification trees, the minimum numbers of cases in parent and child nodes were set to 10 and 5, respectively. Accuracy is the proportion of Honamli and Hair goats that a classification algorithm separates correctly. Sensitivity is the proportion of Honamli goats that the algorithm classifies correctly, while specificity is the proportion of Hair goats that the algorithm classifies correctly. The expressions T+, T-, F+, and F- used in the accuracy, sensitivity, and specificity equations represent the numbers of true positives, true negatives, false positives, and false negatives, respectively. The formula used to determine the standard error of the AUC (AUCse) was developed by Hanley and McNeil (1982).
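The definitions above can be written out as a short sketch. The accuracy, sensitivity, and specificity functions follow directly from the verbal definitions, with Honamli as the positive class. The standard error uses the commonly cited Hanley and McNeil (1982) approximation; since the equation itself is not reproduced in the text, treating this as the exact form used by the authors is an assumption. The confusion counts in the example are hypothetical.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, and specificity from confusion counts.

    tp, tn, fp, fn correspond to T+, T-, F+, and F- in the text.
    """
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # correctly classified positives
        "specificity": tn / (tn + fp),  # correctly classified negatives
    }


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of the AUC, Hanley and McNeil (1982) approximation."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


# Hypothetical confusion counts for 83 Honamli and 65 Hair goats
# (NOT the results of the study).
metrics = classification_metrics(tp=70, tn=50, fp=15, fn=13)

# AUC reported for MARS in the text, with the group sizes of the study.
se = hanley_mcneil_se(0.942, n_pos=83, n_neg=65)
```

The standard error shrinks as either group grows and as the AUC approaches 1, so it gives a rough sense of how much the reported AUC values could vary with a sample of this size.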

Results
Categorical variables for the Honamli and Hair goats in the study are presented in Table 2. Descriptive statistics of the continuous variables obtained from Honamli and Hair goats are given in Table 3. Honamli females had higher values than males because the females were older (Table 2). In other words, young billy goats are preferred in the herd to shorten the generation interval.
Table 3 Descriptive statistics on live weight and some body measurements in Honamli and Hair goats of different
CHAID was chosen as the best classifier among the classification trees for distinguishing Honamli and Hair goats (Table 4). In the root node of the CHAID diagram, 83 (56.10%) of the 148 goats were classified as Honamli and 65 (43.90%) as Hair (Figure 2). Examination of the CHAID diagram showed that the first-order independent variable affecting breed discrimination was RH (Adj. P-value = 0.000, χ2 = 59.332); the second-order variables were Age (Adj. P-value = 0.014, χ2 = 9.981) and BH (Adj. P-value = 0.036, χ2 = 6.313); and the third-order variables were LG (Adj. P-value = 0.045, χ2 = 13.362) and CD (Adj. P-value = 0.003, χ2 = 12.577). All branches generated by the independent variables in the tree structure are statistically significant (P<0.05).
All goats (Node 0) considered in the study were divided into 3 sub-groups (nodes) in terms of RH variable. In

Discussion
The most successful data mining algorithms for the phenotypic characterization of Honamli and Hair goats were MARS and CHAID. The MARS algorithm used "LW", "BH", "CD", "CG", "Sex", "Age", and "HL" as independent variables in breed discrimination, whereas the CHAID algorithm used "RH", "Age", "BH", "LG", and "CD". Essentially the same key variables can be used to describe closely related animal species (FAO, 2012).  (Herrera, et al. 1996). Martínez et al. (2014) reported that chest depth and rump length are important in discriminating between the Murciano-Granadina and Malagueña dairy goat breeds raised in Spain. Because the goat breeds in these studies are raised under different environmental conditions and have different genetic structures, their results are not consistent with ours. Dossa et al. (2007) reported that the rate of correct assignment of goats in Benin to the appropriate grazing system was 76.60%. In Jordan, two indigenous goat breeds were separated by simple discriminant analysis of morphological characteristics, and Desert and crossbred goats were assigned to the correct class at rates of 65.60% and 79.80%, respectively (Zaitoun, 2005 although it was determined that the bulbus and scapus pili diameters of Honamli goat hairs were larger than those of Hair goat hairs, they could not be used in breed discrimination because of the heterogeneous structure of the hair samples (Orhan et al. 2018). In the present study, even though the data set has a very heterogeneous structure, the data mining algorithms enabled successful separation of the breeds.

Conclusion
Breeding of Hair goats has become essential because the goats are economical to keep and adapt well to the environmental conditions in which they are raised. In this context, Honamli goats, which have high adaptability and yield capability, should be crossbred with Hair goats after pure breeding. As a first step toward pure breeding, the goat breeds must be distinguished correctly.
Data mining algorithms used in breed discrimination are among the most reliable (robust) methods.
Considering the performances of the four classification trees and the MARS analysis in breed discrimination in the current study, more accurate classification could be made with the CHAID and MARS methods. Especially in the breeding season of Hair goats, preferring Honamli billy goats as sires because of their superior characteristics will make crossbreeding successful. These crossbreeding studies could provide employment for breeders, yield more productive animals, and help trigger an increase in national red meat production.

The data mining algorithms evaluated in the present study might shed light on further studies. In addition, these algorithms can distinguish breeds or crossbreds that cannot otherwise be distinguished because of their genetic proximity, and they should not be ignored, especially in statistical models that can be applied in gene resource conservation programs.
Figure 1 All and individual ROC curves of classifying algorithms for diagnosis test of breed discrimination
Figure 2 CHAID classification tree diagram of the diagnosis test of breed discrimination