Application of Artificial Neural Networks to Predict Cotton Production: A Case Study in Diyarbakır Province, Turkey

Growers are always interested in the factors that affect yield in plant production. Identifying these factors can provide information about future yield, and the reliability of that information depends on a good prediction model. In their operating principle, artificial neural networks (ANNs) imitate the neural network in humans: they are designed to reproduce the human ability to make predictions about a current situation by combining information gained from different experiences. Therefore, in complex problems, they can give better results than traditional statistical methods. In this study, we used an artificial neural network method to model cotton production. In a data set collected from 73 farms in Diyarbakır, Turkey, the mean cotton production was 559.19 kg da⁻¹. Four factors were selected as the key inputs to the model. The final ANN model predicts cotton production from farm conditions (cotton area and irrigation frequency), machinery usage and fertilizer consumption. At the end of the study, cotton yield was estimated with 84% accuracy.


INTRODUCTION
Cotton is one of the most prominent products in agriculture, industry and trade because of its many and varied uses. Together with the growing world population, increasing consumer needs raise the importance of this versatile plant day by day. Growing interest in natural fibers and rising living standards worldwide increase the demand for cotton (Anonym 2020a). According to data from the International Cotton Advisory Committee (ICAC), cotton was grown on 32.825 million hectares worldwide in the 2018/19 production period. In that season, 37% of this area was cultivated in India, followed by the USA, China, Pakistan and Brazil in terms of cultivation area. Because cotton acreage in African countries has expanded in recent years, Turkey ranked 11th in world cotton cultivation area despite the growth of its own acreage (Anonym 2020b). Cotton is an important industrial plant: its fiber is a raw material for the textile industry, the oil obtained from its seed for the vegetable oil industry, its hulls and pulp for the animal feed industry, and its linters for the paper, furniture and cellulose industries. With this wide range of uses, added value and employment opportunities, cotton is an important and strategic product that provides great benefits to the national economy. Because of these features, it has contributed to the development of both agriculture and industry in the regions and countries where it is grown (Anonym 2020c). Cotton production in Turkey is concentrated in the Aegean, Çukurova and Southeastern Anatolia regions and in Antalya. In the 2017/18 cotton season, 882 thousand tons of cotton fiber were produced on 502 thousand hectares, and approximately 1 million 571 thousand tons of cotton were consumed in Turkey.
In the same season, the 882 thousand tons of fiber cotton corresponded to 2.5 million tons of seed cotton, and the fiber cotton yield was 1820 kg/ha. Six provinces, Şanlıurfa, Aydin, Bursa, Diyarbakır, Adana and Izmir, in that order, account for 88% of production in Turkey. Şanlıurfa province alone accounts for 42% of all production. The share of each of the other 23 cotton-producing provinces is between 0.1% and 1.3% (Anonym 2020d).
Since cotton is selective in its climate requirements, it can be grown only in limited areas of Turkey (Karademir et al. 2015). Cotton is of great economic importance to humanity because of its widespread and indispensable use, and to producer countries because of the added value and employment opportunities it creates. Increasing populations, growing interest in natural fibers and rising living standards raise the demand for cotton. Artificial neural networks (ANNs) have been used alongside traditional statistical methods in recent years thanks to their highly accurate prediction and classification capability, and they are now even more popular than statistical methods. However, ANN modeling also has its own challenges. There are many possible ways to construct and train networks. In other words, with a sufficiently large number of free parameters, a neural network can be trained to fit data closely even when the data contain noise. Hence, it is very important to understand how to optimize network structure and learning in order to develop reliable decision-making models that generalize well to data that do not follow any particular distribution (Samarasinghe, 2006, p. 5). In recent years, ANN models have been applied to predict the production of different plants. For example, high-accuracy production estimation models have been developed for basil (Rostami et al., 2017) and wheat (Safa et al., 2015). In addition, ANN models have been created to measure the effect of potato production on the environmental quality index, to predict the maturity of cotton fibers (Farooq et al., 2018), and to estimate the amount of energy consumed in production (Safa & Samarasinghe, 2011; Taki et al., 2012; Nabavi-Pelesaraei et al., 2016; Khoshroo et al., 2018; Taki et al., 2018). ANNs are being used with the same success in other fields of agriculture.
For example, high-accuracy models have been created for estimating the temperature inside greenhouses (Saltuk & Mikail, 2019), for beef cattle production (Bozkurt et al., 2015), for estimating milk yield (Mikail et al., 2013; 2014) and for diagnosing diseases such as mastitis (Mammadova & Keskin, 2015). In this study, we identified the factors affecting cotton production in Diyarbakır and used them to build models with multiple regression analysis and artificial neural networks, with the further aim of estimating yield.

MATERIALS AND METHODS
Diyarbakır Province lies in the central part of the Southeastern Anatolia region, at the northern end of Mesopotamia. It is bordered by Siirt and Muş to the east; Mardin to the south; Şanlıurfa, Adıyaman and Malatya to the west; and Elazig and Bingöl to the north. Its area is approximately 15,162 square kilometers (1,516,200 hectares), between 37.905199 and 40.231934 north latitudes and 40.37 and 41.20 east longitudes. It is surrounded by moderately high mountains, with a depression in the middle. Mountains cover 37% of the province and lowlands 31%. The lowlands are fertile and suitable for agriculture. These fertile lands are irrigated by the Tigris River and its tributaries. The city was founded on a horizontal surface on the eastern edge of the broad basalt plateau stretching between Karacadağ and the Tigris, above the Tigris valley and at the top of the river bend. Its altitude is about 650 meters above sea level, varying between 640 m and 660 m in places (Anonym, 2020e).

Material
The core material of this study is the data obtained from cotton-producing agricultural enterprises in the Bismil district of Diyarbakır. The primary data were collected through 73 face-to-face surveys in 2019. In addition, secondary data were drawn from the publications of various national and international institutions and organizations related to the subject.

Sampling Method
The Simple Random Sampling Method was used to determine the sampling frame and sample size in the study (Yamane, 1967):

$$ n = \frac{N s^{2} t^{2}}{(N-1)\,d^{2} + s^{2} t^{2}} \tag{1} $$

where n is the sample size, s is the standard deviation, t is the t value for the selected confidence limit, N is the total number of units in the sampling frame, and d is the acceptable margin of error (%).

In the study, it was deemed appropriate to conduct a questionnaire on cotton production in 134 enterprises, with a 95% confidence interval and a 5% deviation from the mean. The records of the Farmer Registration System of the Diyarbakır Provincial Directorate of the Ministry of Agriculture and Forestry were used to select the sample villages and enterprises. The primary data used in the study were obtained from the cotton-producing enterprises in the sampling area. The settlements included in the survey were chosen for their ability to represent the research area, taking into account characteristics such as cotton cultivation area, number of enterprises, agricultural production potential and the presence of tools and equipment. The questionnaire forms were arranged in accordance with the characteristics of the research field. Before the survey, trial questionnaires were conducted in the selected settlements, and the questionnaire forms were finalized in light of the deficiencies observed. These details and cotton production were surveyed in order to design a model capable of predicting cotton production. To implement this in the ANN model, it is essential to pick a limited number of relevant and significant variables and to avoid any bias. For that reason, all of the information was examined carefully. Seventeen original variables were considered as candidate inputs for the final model. Questionnaires containing unreliable information were excluded from the process; for example, records with an irrigation frequency entered as 0 were excluded.
This information was used to draw the graphs and carry out the statistical analysis with the MINITAB statistical program and MATLAB software (Mathworks, 2009). In this context, 73 surveys conducted in the central district and villages of Bismil, Diyarbakır province, were evaluated, and an estimation application was made.
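The sample-size calculation with the symbols defined above (n, s, t, N, d) can be sketched in a few lines of Python. This is only an illustration: the function name and all numerical inputs below are hypothetical and are not the survey values, and d is assumed here to be expressed in the same units as the measured variable (for example, 5% of the mean).

```python
import math

def sample_size(N, s, t, d):
    """Simple random sampling size (form commonly attributed to Yamane, 1967):
    n = N * s^2 * t^2 / ((N - 1) * d^2 + s^2 * t^2), rounded up to a whole unit.
    """
    numerator = N * s**2 * t**2
    denominator = (N - 1) * d**2 + s**2 * t**2
    return math.ceil(numerator / denominator)

# Hypothetical inputs (NOT taken from the study):
N = 1000          # units in the sampling frame
s = 50.0          # standard deviation of the variable of interest
t = 1.96          # t value for a 95% confidence limit
mean = 150.0      # assumed population mean
d = 0.05 * mean   # 5% deviation from the mean, in the variable's units

print(sample_size(N, s, t, d))
```

The required sample can never exceed the size of the sampling frame, and it shrinks as the tolerated deviation d grows.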

Multiple Linear Regression
The general multiple linear regression model with response Y and terms X1, …, Xp has the form

$$ E(Y \mid X) = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p \tag{2} $$

In this equation, the X in E(Y|X) means that we are conditioning on all the terms on the right side, that is, on given values of the predictors X1, …, Xp, collectively called X. The β's are unknown parameters that need to be estimated. Equation (2) is a linear function of the parameters, which is why this is called linear regression. When p = 1, X has only one element and we obtain the simple regression problem. If p = 2, the mean function (2) corresponds to a plane in three dimensions, as shown in Fig. 1. If p > 2, the fitted mean function is a hyperplane, the generalization of a p-dimensional plane in a (p + 1)-dimensional space (Weisberg, 2005).
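As an illustration of how the β parameters of the mean function can be estimated, the following minimal sketch fits a multiple linear regression by ordinary least squares, solving the normal equations with Gauss-Jordan elimination. It is not the MINITAB procedure used in the study; the function name `fit_mlr` and the small data set are hypothetical.

```python
def fit_mlr(X, y):
    """Ordinary least squares for E(Y|X) = b0 + b1*X1 + ... + bp*Xp.

    X is a list of rows (one row of p predictor values per observation),
    y is the list of responses. Returns [b0, b1, ..., bp].
    """
    A = [[1.0] + [float(v) for v in row] for row in X]  # add intercept column
    k = len(A[0])
    # Augmented normal equations [A^T A | A^T y]
    M = [[sum(a[i] * a[j] for a in A) for j in range(k)]
         + [sum(a[i] * yi for a, yi in zip(A, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting: bring the largest remaining entry to the diagonal
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        # Eliminate this column from every other row
        for r in range(k):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][k] for i in range(k)]

# Hypothetical data lying exactly on the plane y = 2 + 3*x1 - x2 (p = 2)
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [2, 5, 1, 4, 7]
print(fit_mlr(X, y))
```

With p = 2 predictors, as in this toy example, the fitted mean function is the plane described in the text.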

Artificial Neural Network
Fig. 2 shows an ANN model with one input, one hidden neuron and one output, located in the input layer, hidden layer and output layer, respectively. This ANN model therefore has one input-hidden layer weight, a1, and one hidden-output layer weight, b1. Here, x is the input and z is the network output. The hidden neuron and the output neuron each receive a bias input of +1, with associated weights a0 and b0, respectively. Smith also used this notation for the weights. The input is fed into the network, and the hidden neuron computes the weighted sum of its inputs (including the bias) and passes this sum through the logistic function to produce the hidden-neuron output, y. The output neuron, in turn, receives the hidden-neuron output y as input through the associated connection link, by which it is weighted. The weighted input is then transformed by the activation function of the output neuron, and the result of this transformation is the network output. The hidden neuron performs a very significant part of the processing. Its weighted input is

$$ u = a_0 + a_1 x $$

Next, the hidden neuron passes the weighted sum u through the logistic function. Here u is the argument of the logistic function, a standard function with y = 0.5 at u = 0. It is more useful to express the output y in terms of the input x, so that y is related to x through u. Substituting u into the logistic function gives the hidden-neuron output:

$$ y = \frac{1}{1 + e^{-u}} = \frac{1}{1 + e^{-(a_0 + a_1 x)}} $$
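The single-input, single-hidden-neuron network described above can be sketched as a short forward pass. The weight names a0, a1, b0 and b1 follow the text; the function names are illustrative, and because the text does not specify the activation function of the output neuron, it is passed in as a parameter and defaults to the logistic function purely as an assumption.

```python
import math

def logistic(u):
    """Standard logistic function; equals 0.5 at u = 0."""
    return 1.0 / (1.0 + math.exp(-u))

def hidden_output(x, a0, a1):
    """Hidden neuron: weighted sum u = a0 + a1*x passed through the logistic."""
    u = a0 + a1 * x
    return logistic(u)

def network_output(x, a0, a1, b0, b1, output_activation=logistic):
    """Network output z for input x.

    The output neuron weights the hidden output y (plus its bias) and applies
    its own activation; the choice of that activation is an assumption here.
    """
    y = hidden_output(x, a0, a1)
    v = b0 + b1 * y
    return output_activation(v)

# Hypothetical weights, chosen only to illustrate the computation
print(hidden_output(0.0, a0=0.0, a1=1.0))
print(network_output(1.5, a0=-0.5, a1=0.8, b0=0.1, b1=1.2))
```

Passing `output_activation=lambda v: v` gives a linear output neuron, a common choice for regression tasks such as yield prediction.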

Model Assessment
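The model was evaluated with the following criteria: r, the Pearson correlation coefficient; R², the coefficient of determination; and RMSE, the root mean squared error (Spiegel et al., 2009).
a) The coefficient of correlation

$$ r = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^{2}}\,\sqrt{\sum_{i=1}^{n} (\hat{y}_i - \bar{\hat{y}})^{2}}} $$

b) The coefficient of determination

$$ R^{2} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^{2}}{\sum_{i=1}^{n} (y_i - \bar{y})^{2}} $$

c) The root mean squared error

$$ RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^{2}} $$

where $y_i$ is the observed value, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the arithmetic mean of the observed values, $\bar{\hat{y}}$ is the arithmetic mean of the predicted values, and n is the total number of observations.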
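The three assessment criteria named in this section can be computed as in the following minimal sketch, which uses the standard textbook definitions of r, R² and RMSE. The function names and the observed and predicted values are hypothetical and are not the study data.

```python
import math

def pearson_r(obs, pred):
    """Pearson correlation between observed and predicted values."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_pred = sum(pred) / n
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    ss_obs = sum((o - mean_obs) ** 2 for o in obs)
    ss_pred = sum((p - mean_pred) ** 2 for p in pred)
    return cov / math.sqrt(ss_obs * ss_pred)

def r_squared(obs, pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def rmse(obs, pred):
    """Root mean squared error, in the units of the observed variable."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

# Hypothetical yields in kg per da (NOT the survey data)
observed = [520.0, 580.0, 610.0, 640.0, 560.0]
predicted = [530.0, 570.0, 600.0, 650.0, 575.0]
print(pearson_r(observed, predicted), r_squared(observed, predicted), rmse(observed, predicted))
```

A high r does not by itself imply a small error: predictions that are shifted by a constant still correlate perfectly with the observations, which is why RMSE is reported alongside the correlation.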

RESULTS AND DISCUSSION
In the present study, yields ranged from 410 to 680 kg da⁻¹, and the average yield was 599.19 kg da⁻¹ (Table 1). The average cultivation area of the enterprises was 87.64 da. The amount of fertilizer used was 87.84 kg da⁻¹, the tractor usage time was 1.8 hours and the irrigation frequency was approximately 6 times (Table 1). No linear relationship was found between yield and either cultivation area or tractor operating time. The amount of fertilizer and the frequency of irrigation were found to affect yield positively, although weakly: the Pearson correlation coefficient between fertilizer amount and yield was r = 0.29 (p < 0.05), and that between irrigation frequency and yield was r = 0.38 (p < 0.01) (Figure 3). Similar results were reported by Safa et al. (2015). In that study, conducted in New Zealand, the Pearson correlation coefficient between N fertilizer consumption and wheat yield was r = 0.43.

Multiple Linear Regression Model
To estimate cotton production, the following estimation model was obtained by multiple linear regression analysis. The coefficient of determination was used as the model evaluation criterion and was found to be R² = 18.3%.

$$ \hat{Y} = 346 + 0.039\,X_1 + 0.725\,X_2 + 33.09\,X_3 + 14.11\,X_4 $$

where $\hat{Y}$ is the estimated cotton yield and $X_1, \dots, X_4$ are the four predictor variables of the model.

Artificial Neural Network Model
In this study, the ANN model consists of an input layer with 4 neurons, a hidden layer with 20 neurons and an output layer with 1 neuron (Figure 4).

Figure 4. Generated ANN model
The Levenberg-Marquardt backpropagation learning method was used in the model. The data set was randomly divided into 3 parts: 70% training, 15% validation and 15% testing. The agreement between the cotton production values estimated for each data set and the actual observed values was measured by the Pearson correlation coefficient (Figure 5). The model gave the best result after the 18th iteration. As seen from Figure 3, the model gave the best result with the training data: the correlation between the yield values estimated from the training data and the actual yield values was 0.98. This was followed by validation (0.77) and testing (0.76). When all data were evaluated together, this coefficient was 0.91. The RMSE of the ANN model was 20.37, whereas it was 44.81 for the estimates obtained by MLR (Table 2). Comparing the ANN model with the multiple linear regression model, the correlation coefficient between the observed and predicted cotton yield was much higher for the ANN model, and its RMSE was much lower than that of the MLR estimation (Table 2). As shown in Figure 6, the ANN model can predict cotton yield with 84% accuracy. The corresponding figure was 91% in a similar study on wheat production. The weak relationships in the data lead to very low accuracy in the model built with MLR. With ANN modeling, by contrast, high-accuracy predictions can be made even with such data.
To increase the accuracy of the model in the future, we plan to investigate other factors that affect yield and to create new models. In addition, we believe that more accurate results can be obtained by surveying not only cotton producers in Diyarbakır but also producers in the surrounding areas.
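The random 70/15/15 partition and the size of the 4-20-1 network described in this section can be illustrated with a small sketch. It is not the MATLAB routine used in the study: the function names, the rounding rule and the random seed are assumptions, and MATLAB may assign individual farms to the subsets differently.

```python
import random

def split_indices(n, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle indices 0..n-1 and cut them into train/validation/test lists.

    Subset sizes are rounded to the nearest integer (an assumption);
    the test set receives whatever remains.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(train_frac * n)
    n_val = round(val_frac * n)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test

def count_parameters(n_in, n_hidden, n_out):
    """Weights plus biases of a fully connected network with one hidden layer."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out

train, val, test = split_indices(73)
print(len(train), len(val), len(test))
print(count_parameters(4, 20, 1))
```

Under these assumptions, 73 farms are split into 51, 11 and 11 farms, and a 4-20-1 network has 121 adjustable weights and biases. Since this exceeds the number of training farms, the gap between the training correlation and the validation and testing correlations is worth monitoring, in line with the overfitting caution raised in the Introduction.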

CONCLUSION
In this study, we used an ANN model to predict cotton production from cotton area (da), fertilizer (kg da⁻¹), tractor usage (h da⁻¹) and irrigation frequency. The neural network model built on these factors can predict cotton production with 84% accuracy.
The results of this study demonstrate that ANN models, applied to different factors, can forecast cotton production better than a multiple regression model. In the future, it will be possible to develop high-accuracy ANN models by adding environmental and breeding factors that affect production and also by using more enterprise data. The results of this study are a first step toward developing suitable methods for estimating cotton production in the Diyarbakır region. Estimation models for other agricultural products can also be created with the ANN method.

Ethics approval and consent to participate
The core material of this study is the data obtained from cotton-producing agricultural enterprises in the Bismil district of Diyarbakır. The data were collected through 73 face-to-face surveys in 2019. The farmers' consent was obtained for the surveys conducted.

Consent for publication
I am very pleased to submit our manuscript entitled "Application of Artificial Neural Networks to Predict Cotton Production: A Case Study in Diyarbakır Province, Turkey" by Nazire Mikail and Mehmet Fırat Baran for consideration for publication in Journal of Cotton Research.
In consideration of the publication, I hereby warrant and undertake: 1. This article is an original work and no portion of the study has been published. 2. None of the authors has any potential conflict of interest related to this manuscript. 3. Each author has contributed to the work and has agreed to submit the manuscript.
Your consideration of the manuscript would be greatly appreciated.

Availability of data and material
The core material of this study is the data obtained from cotton-producing agricultural enterprises in the Bismil district of Diyarbakır. The primary data were collected through 73 face-to-face surveys in 2019. In addition, secondary data were drawn from the publications of various national and international institutions and organizations related to the subject.

Conflict of interest
Nazire Mikail and Mehmet Fırat Baran declare that they have no competing interests.

Funding
No financial support was received for this study.

Authors' contributions
Each author contributed 50% to this study.