This section presents the outcome of the selection process, followed by the literature review matrix and the results for each RQ.
1. Study selection
The search of the aforesaid scientific databases identified 89 results. We followed a filtering process to eliminate the articles that did not match our selection criteria. We first excluded 18 records because they were not indexed in well-known academic indexing services. We then screened the remaining articles by reading the title, abstract and keywords, which left 50 articles for further investigation, plus 2 more articles found in their references. After reading the 52 articles in full, we excluded another 12 because they were either not clearly relevant or out of our scope. Thus, we ended up with 40 papers for synthesis and analysis. Fig 2 illustrates a flow chart of the paper filtering process.
2. Literature Review Matrix (LRM)
One of the major problems that farmers face at the beginning of every agricultural season is selecting a suitable crop that will produce a better yield. This selection is usually based on the farmer’s experience or made with the help of an agronomist. Assisting this process has been the objective of many papers over the past few years. Table 4.2 presents the LRM, which details the studied articles that addressed this problem using different approaches and techniques.
Reference
|
Year
|
Contribution
|
Dataset
|
Models & techniques
|
Strengths
|
Weaknesses
|
Performance
|
[32]
|
2014
|
In this paper, a Multi-Level Linguistic Fuzzy Decision Network (LFDN) method is applied to a real case dataset to decide which of four crops to cultivate.
|
Crops: Wheat, Corn, Rice, and Faba bean.
Features: Temperature, Water, Marketing and Soil.
|
Multi-Level LFDN.
|
The method can rank the actions/alternatives,
to select the appropriate alternative.
|
The dataset used is not described.
|
No performance metric was used.
|
[33]
|
2014
|
This work integrated artificial neural networks (ANN) with a geographical information system (GIS) to assess the suitability of land to cultivate a selected crop.
|
Crops: Rice.
Features: Rainfall, Temperature, Elevation and Slope.
|
ANN and backpropagation.
|
High consistency in predicting the crop suitability map.
|
Only 4 parameters were used to assess the suitability of the crop.
|
Mean squared error (MSE): 0.113
Accuracy: 83.43%
|
[34]
|
2015
|
This paper presents a technique named the CS method (CSM) that selects a sequence of crops based on crop properties to improve the net yield rate of the crops planted over a season.
|
Crops: Seasonal crops, Whole year crops, Short time plantation crops and longtime plantation crops.
Features: Geography of a region, Weather conditions, Soil type and Soil texture.
|
CSM.
|
CSM method retrieves all possible crops that are to be sown at a given time stamp.
|
No evaluation metrics or experiments were applied to quantify the efficiency of the proposed system.
|
No performance metric was used.
|
[35]
|
2015
|
In this paper, a rule system is developed to help farmers make decisions about the choice of rice varieties using the crop and the properties.
|
Crops: 118 rice varieties.
Features: 7 Features.
|
Rule system.
|
The set of production rules is computed with KB and farmer’s land profile to infer suitable rice varieties.
|
The evaluation is very restricted (only 50 queries)
|
Accuracy: 83.4%
|
[36]
|
2016
|
In this article, a hybrid soft decision model is developed to decide which agricultural crop can be cultivated on each experimental land.
|
Crops: Paddy, groundnut, sugarcane, cumbu and ragi.
Features: Twenty-seven input criteria, namely soil, water, season, input (6 sub criteria), support, facilities, and risk.
|
Shannon’s entropy method and VIKOR method.
|
The model handles incomplete/missing data and inconsistency problems.
|
The model is trained (150) and tested (25) on a small dataset.
|
Accuracy: 95.2%
Precision: 88.66%
|
[37]
|
2016
|
The proposed system is designed to predict the most suitable crops for a given farm, and to suggest farming strategies such as mixed cropping, spacing, irrigation, and seed treatment, along with fertilizers and pesticides.
|
Crops: 44 crops that have been considered.
Features: Crop name, suitable rainfall, temperature, cost, soil, and pH.
|
ANN and fuzzy logic (FL).
|
The extraction of crop growth data using FL.
|
More agricultural parameters can be identified to be included in the system.
|
Precision: 34% of crops had a value from 0 to 0.2 and 30% from 0.8 to 1.
Recall: 39% of crops had a value from 0 to 0.2 and 40% from 0.8 to 1.
|
[38]
|
2016
|
The authors applied the majority voting technique using Random tree, CHAID, K-nearest neighbors (KNN) and Naïve Bayes as base learners for CR.
|
Crops: Millet, groundnut, pulses, cotton, vegetables, banana, paddy, sorghum, sugarcane, coriander.
Features: Depth, Texture, pH, Soil Color, Permeability, Drainage, Water holding and Erosion.
|
Ensemble, Naive Bayes, Random tree, CHAID and KNN.
|
A large number of soil attributes are used for the prediction.
|
Fertilization data like NPK values present in soil are not used.
|
Accuracy: 88%
|
[39]
|
2017
|
Proposed a system that can detect the user’s location then recommend top-k crops based on the seasonal information and crop production rate (CPR) of each crop of similar farms.
|
Crops: Not mentioned.
Features: Crop growing period database, Thermal zone database, Physiographic database, Seasonal crop database and CPR database.
|
Pearson correlation similarity (PCS).
|
The developed system can recommend appropriate crops in a satisfactory way.
|
The model does not take into account the existing nutrients in the farm’s soil.
|
Precision: 72%
Recall: 65%
|
[40]
|
2017
|
In this article, an attempt is made to predict the crop yield and price that a farmer can obtain from his land, by analyzing patterns in past data.
|
Crops: Not mentioned.
Features: Crop areas, types of crop cultivated, nature of the soil, yields and the overall crops consumed.
|
Non-linear regression.
|
The developed system uses the demand as input.
|
The recommendation model is not tested or evaluated on a dataset.
|
No performance metric was used.
|
[41]
|
2017
|
In this study, an FL-based expert system is proposed to automate CS for farmers based on parameters such as the climatic and soil conditions.
|
Crops: 20 crops.
Features: 23 features.
|
Fuzzy based expert system.
|
The study uses an important number of features to select the suitable crop.
|
The proposed system is extremely customizable instead of a more ad hoc system.
|
No performance metric was used.
|
[42]
|
2017
|
In this paper, a decision-making tool is developed for selecting the suitable crop that can be cultivated in each agricultural land.
|
Crops: Paddy, groundnut, and sugarcane.
Features: 26 input variables classified into six main variables, namely soil, water, season, input, support, and infrastructure.
|
Decision matrix, Dominance-based rough set approach and Johnson’s classifier.
|
The validation results showed that the developed tool has a sufficient predictive power to help the farmers to select suitable crop.
|
The study is only based on one metric to evaluate the model.
|
Accuracy: 92%
|
[43]
|
2017
|
This paper develops a fuzzy based agricultural decision support system which helps farmers to make wise decisions regarding CS.
|
Crops: Not mentioned.
Features: 15 parameters.
|
Mamdani Fuzzy Inference System.
|
The system is deployed at many places and results are found to be accurate.
|
No empirical study was conducted to assess the quality of the model.
|
No performance metric was used.
|
[44]
|
2017
|
This paper proposed two mathematical formulations, the first one for the determination of crop-mix that maximizes the farmer’s expected profit, and the second model that maximizes the average expected profit under a predefined quantile of worst realization.
|
Crops: Corn, Wheat, Soy, Barley.
Features: The land available to grow crops, the sequence of operations required for each crop, the corresponding time windows, the availability of tools and tractors, their operating costs and the working speeds.
|
Natural Integer Programming and Maximization of the Conditional Value-at-Risk (CVaR).
|
The proposed model significantly improves the worst outcomes with respect to the farmer’s solution.
|
Only one farm was used for testing, and the model could be enriched by incorporating explicit decisions about other resources.
|
The model’s expected profit is higher than the farmer’s.
|
[45]
|
2018
|
This paper suggested using a deep neural network (DNN) for agricultural CS and yield prediction.
|
Crops: Aus rice, Aman rice, Boro rice, Jute, Wheat, and Potato.
Features: 46 parameters.
|
DNN, Logistic Regression, Support Vector Machine (SVM) and Random Forest (RF).
|
The proposed model has a relatively high accuracy.
|
Lack of details about the parameters of the model and the complete list of input parameters.
|
Accuracy ≥ 90%
|
[46]
|
2018
|
The article proposes a new system for CR based on an ensembling technique.
|
Crops: Cotton, Sugarcane, Rice, Wheat.
Features: Soil type, pH value of the soil, NPK content of the soil, Porosity of the soil, Average rainfall, Surface temperature, Sowing season.
|
Ensembling model (RF, Naive Bayes and Linear SVM, combined with the majority voting technique).
|
Using three different and independent classifiers enables the system to provide more accurate predictions.
|
Only four crops were used for training and testing.
|
Accuracy: 99.91%
|
[47]
|
2018
|
This paper presents an intelligent system, called Agro-Consultant, which assists farmers in making decisions about which crop to grow.
|
Crops: 20 crops.
Features: Soil Type, Soil pH, Precipitation, Temperature, Location parameters.
|
Decision Tree (DT), KNN, RF and ANN.
|
A Map View feature, where farmers can view the sowing decisions made by neighboring farmers using pop-up markers on the map.
|
Not considering other economic indicators like farm harvest prices and retail prices.
|
Accuracy: 91%
|
[48]
|
2018
|
This paper investigated the predictive performance of different data mining classification algorithms to recommend the best crop for better yield, based on a classification of soil under different ecological zones.
|
Crops: Not mentioned.
Features: pH, Organic Matter, K, EC, Zn, Fe, Mn, Cu and texture.
|
J48, BF tree, OneR and Naive Bayes.
|
Comparison of the performance of four classification algorithms.
|
Recommending a class of crops instead of recommending a single crop.
|
Accuracy: 97%
Precision: 97%
Recall: 97%
|
[49]
|
2018
|
The article proposes a system that gives the farmer a prior idea regarding the yield of a particular crop by predicting the production rate according to the location of the farmer and the past data of weather conditions.
|
Crops: Rice.
Features: Temperature, humidity, location, and rainfall.
|
Mamdani Fuzzy model and Cosine similarity (COS).
|
Relying only on location and weather parameters for prediction.
|
The study focused only on rice production and did not consider the other climatic conditions.
|
No performance metric was used.
|
[50]
|
2019
|
This study proposed to design a KB solution for building an inference engine for recommending suitable crops for a farm.
|
Crops: Not mentioned.
Features: Elevation, temperature, fertilizer type, rainfall, field type, seed type and soil.
|
PART Rule Based Classifier and expert’s knowledge.
|
The model developed has the potential to increase the accuracy of KB system (PART Rule algorithm).
|
The evaluation in this study is done by unknown experts.
|
Farmers Accuracy: 82.2%
Domain experts Accuracy: 95.23%
Agricultural extension Accuracy: 88.5%
|
[51]
|
2019
|
A new data mining technique was proposed to cluster crops based on their suitability for the soil nature of different areas.
|
Crops: 10 Crops.
Features: Soil, crop, temperature, and rainfall.
|
Data mining and Hierarchical clustering.
|
Various datasets were merged to extract crops requirements.
|
Only ten crops in eight different locations of Coimbatore were used for prediction and evaluation.
|
No performance metric was used.
|
[52]
|
2019
|
In this paper, a multi-class classification-based decision model has been developed to assist the farmer in selecting suitable crops using rough, fuzzy, and soft set approaches.
|
Crops: Paddy, groundnut, sugarcane, cumbu and ragi.
Features: 27 features.
|
Dominance-based rough, Grey relational analysis, Fuzzy proximity relation, Bijective soft set approach, Naive Bayes, SVM and J48.
|
The validation test outputs were compared to agricultural experts.
|
Only five crops were used. According to the study, the execution time shows that the model is relatively slow.
|
Accuracy: 98.4%
Precision: 92%
|
[53]
|
2019
|
This study proposes an intelligent agriculture platform that manages and analyses sensor data to monitor environmental factors, giving the farmer a better understanding of crop suitability.
|
Crops: Celery, water spinach, green beans, and daikon.
Features: Temperature, humidity, illumination, atmospheric pressure, soil electrical conductivity (EC), soil moisture content, and soil salinity.
|
Moving average, autocorrelation, and 3D cluster correlation.
|
The system provides a better understanding of the behavior of environmental factors and analyses farmer actions such as the application of fertilizer or pesticide; it also takes global warming into consideration.
|
The application of the system’s analysis results is not automated, and no artificial intelligence model was used.
|
No performance metric was used.
|
[54]
|
2019
|
This paper suggests using ANN and SVM for crop prediction, taking the environmental parameters into account.
|
Crops: Not mentioned.
Features: Rainfall, minimum and maximum temperature, soil type, humidity, and soil pH value.
|
ANN and SVM.
|
An interface was designed to enable the access to necessary information for selecting the proper crop.
|
Test evaluation was done by comparing the predicted crops with the actual ones, which is not accurate since the crop actually cultivated is not necessarily the optimal one.
|
Accuracy (ANN): 86.80%
|
[55]
|
2019
|
The paper proposes a mobile application that will allow farmers to predict the region’s production for a specific crop.
|
Crops: Not mentioned.
Features: Soil type, temperature, rainfall.
|
ARIMA method, linear regression (LR), SVR model.
|
An Android application was developed to facilitate farmers’ access to the suggested model.
|
The white noise for the ARIMA model was chosen as a random value in the range of 0% to 10% of the crop yield.
|
No performance metric was used.
|
[56]
|
2019
|
This study applies learning vector quantization (LVQ), which is part of the ANN method, to provide recommendations from three types of plants.
|
Crops: Rice, corn, and soybeans.
Features: Altitude, rainfall, temperature, and soil pH.
|
LVQ.
|
Comparison of the evaluation metric between expert recommendation and real data.
|
Only three crops were used: rice, corn, and soybean.
|
Accuracy: 93.54%
|
[57]
|
2019
|
This paper presents a stand-alone crop recommending device that detects soil quality and recommends a list of crops based on an FL model.
|
Crops: 30 Crops.
Features: pH level, soil moisture, soil temperature and soil fertility.
|
FL.
|
Stand-alone device gives faster and real-time soil property reading and crop suggestion.
|
Few details are given about the model and its performance.
|
No performance metric was used.
|
[58]
|
2019
|
This paper presents a hybrid crop RS based on a combination of a CF technique and a case-based reasoning.
|
Crops: Not mentioned.
Features: Temperature data, rainfall, solar radiation, wind speed, evaporation, relative humidity, and evapotranspiration.
|
ANN and Case Based Reasoning (CBR).
|
The presented model has a remarkable performance and rational accuracy of prediction.
|
Only weather parameters were considered.
|
Precision: 90%
Recall: 93%
|
[59]
|
2019
|
This work proposes a model that can predict soil series with land type, according to which it can suggest suitable crops.
|
Crops: Not mentioned.
Features: Soil dataset and crop dataset.
|
Weighted KNN, Gaussian Kernel based SVM, and Bagged Tree.
|
Suggesting crops based only on Class of soil series is very interesting.
|
More focus on soil classification.
|
Accuracy (SVM): 94.95%
|
[60]
|
2019
|
The article addressed the problem of selection of best suitable crop for a farm, by applying different classification algorithms.
|
Crops: 15 Crops.
Features: Soil color, pH, average rainfall, and temperature.
|
SVM, DT and Logistic regression.
|
The authors compared different models.
|
Only four parameters were considered as input to the model.
|
The best performance is 89.66% and was achieved using the SVM Classifier.
|
[61]
|
2019
|
This work proposes an ontology-based recommendation system for crop suitability recommendation based on region and soil type.
|
Crops: Not mentioned.
Features: Soil characteristics, weather conditions and crop production.
|
RF.
|
The accuracy of the developed system is reasonably high.
|
Only 4 parameters were considered from CR.
|
Precision: 65%
|
[62]
|
2019
|
This paper proposed a hybrid RS based on two classification algorithms by considering various attributes.
|
Crops: 24 Crops.
Features: 15 Features.
|
Naive Bayes, J48.
|
The model was evaluated using Multiple Performance metrics.
|
Farmers cannot locate their exact coordinates.
|
Accuracy(J48): 95%
Recall(J48): 96%
F-Measure(J48): 86%
|
[63]
|
2020
|
The article proposes using a hierarchical fuzzy model to reduce the complexity of the classical system, which is caused by the huge number of generated rules.
|
Crops: Not mentioned.
Features: Sand, silt, clay, nitrogen, phosphorus, potassium, soil color, soil pH, soil electrical conductivity, rainfall, climate zone, and water resources.
|
Hierarchical fuzzy model.
|
The number of generated rules was reduced from 439 to only 152.
|
No evaluation metrics are provided.
|
No performance metric was used.
|
[64]
|
2020
|
This work developed an application that helps select the most convenient type of crops in a certain zone, considering the climate conditions of that zone, the production, and the resources needed for each crop.
|
Crops: Peach, Pear, Apricot, and Almond.
Features: Relative humidity, Radiation, Wind speed, Temperature, Wind direction, Cooling units, Sunlight, Rainfall, Accumulated radiation, and Wind run.
|
Fuzzy system.
|
The farmer’s recommendation request is made using internet of things (IoT) devices.
|
The recommendation module can be scaled to consider other types of additional information like soil parameters.
|
No performance metric was used.
|
[65]
|
2020
|
This paper proposed an implementation of a fuzzy-based rough set approach to help farmers in deciding on CS in their agriculture land.
|
Crops: 23 Crops.
Features: 16 Features.
|
FL.
|
The performance is measured using different evaluation metrics.
|
The suggested method can be tested with a wide set of new crops.
|
Accuracy: 92%
Precision: 93%
Recall: 92%
F-Measure: 91%
|
[66]
|
2020
|
This work proposes a CR system according to multiple properties of the crop and land.
|
Crops: 24 crops.
Features: Soil types, pH, Electric Conductivity, Organic Carbon, Nitrogen, Phosphorus, Sulfur, Zinc, Boron, Iron, Manganese, and Copper.
|
Property matching.
|
Fast and simple algorithm.
|
Only soil properties were considered as input to the model.
|
PCS: 4.80%
COS: 6.45%
|
[67]
|
2020
|
This paper proposes a CS method to maximize crop yield based on weather and soil parameters.
|
Crops: 10 Crops.
Features: Soil type, soil nutrients, soil pH value, Drainage capacity, weather conditions.
|
RF.
|
The soil and predicted weather parameters are used collectively to choose suitable crops for land.
|
Only four soil parameters were considered.
|
No performance metric was used.
|
[68]
|
2020
|
This study proposes a clustering center algorithm optimized by SMOTE, then uses an ensemble of RF and weighted SVM to predict the recommended crop.
|
The data includes 1530 soil samples and 13 types of cultivated land crops.
|
RF and SVM.
|
Classification of crops based on soil analysis.
|
The study reference range is limited.
|
Accuracy: 98.7%
Precision: 97.4%
F1-Score: 97.8%.
|
[69]
|
2020
|
This paper treats the integration of AHP and TOPSIS with GIS to determine the most suitable crops for parcels in land consolidation areas.
|
Crops: Corn, Clover, Sugar beet and Wheat.
Features: 63 Land Map Units and their chemical, physical, topographical, and socio-economic features.
|
Analytic Hierarchy Process (AHP), Technique for Order Preference by Similarity to Ideal Solution (TOPSIS).
|
The integration of AHP, TOPSIS and GIS functions provides an effective platform to determine the suitability.
|
Several criteria can be added, such as meteorological and irrigation criteria.
|
No performance metric was used.
|
[70]
|
2020
|
This article proposes a system for predicting the crop which has maximum yield per unit area in a district.
|
A dataset published by the Government of Maharashtra, India, containing approximately 246 100 data points.
Features: 7 parameters for the years 1997 to 2014.
|
RF.
|
The algorithm works even when the variables are mostly categorical.
|
The performance could be much better when considering more variables.
|
Normalized Root Mean Squared Error (NRMSE): 49%
(median value).
|
[71]
|
2020
|
This article suggests a FL-based CR system to assist farmers in selecting suitable crops.
|
Crops: Paddy, Jute, Potato, Tobacco, Wheat, Sesamum, Mustard and Green gram.
Features: 11 soil parameters, elevation and rainfall.
|
FL.
|
The validation was based on a Cultivation Index (CI).
|
No explanation of how the membership functions of the inputs and outputs were derived from the dataset.
|
Accuracy: 92.14%
|
3. How has research on CR evolved over time?
The rise of new technologies for solving agricultural problems is an evident fact. Figure 3a shows the distribution of the selected studies over time, and we notice a remarkable increase in the number of studies related to CR from 2014 to 2020. Figure 3b illustrates the distribution of papers by source database. Most of the selected papers were published in IEEE or Springer, and fewer papers were found in the Wiley database. Furthermore, Figure 3c presents the proportion of each type of publication: nearly 62% of the papers were published in eminent conferences, about 31% come from journal issues, and only 7.14% are book chapters, which reinforces the quality of the publications included in the SLR.
4. What are the main techniques that were used in the literature for CR?
RSs are generally classified into three types: CBF, CF, or HF. The CF model-based approach was used the most in our literature review. As discussed in section 2, it tries to compress the entire database into a model, then performs its recommendation task by applying a reference mechanism to this model. We identify two common approaches for MBCF: clustering and classification.
Clustering CF assumes that users of the same group have the same interest, so they are partitioned into groups called “clusters”. The authors in [61] proposed the K-means clustering (KMC) algorithm, an unsupervised learning algorithm, to find the fertilizers whose NPK contents are the nearest to the requirements of a specific crop. First, it calculates the required amount of the fertilizer, then the algorithm forms clusters of nearby fertilizers based on the Euclidean distance. The fertilizers in the clusters with minimum distance are then recommended to the farmer.
The recommendation task can also be viewed as a multiclass classification problem, in which a supervised learning classifier maps the input data to a specific output. A variety of these classifiers were tested on agricultural data. In this context, the study in [48] carried out a comparative experiment on data instances from Kasur district, Pakistan, for soil classification using J48, BF Tree and OneR, which are variants of DT-based models, the most used technique in our literature survey. The same study also evaluated the Naive Bayes Classifier (NBC), which produced a significant outcome, mainly because it encodes dependencies among different features by which it connects the causal relationships between items. On the other hand, [54] investigated the use of SVM and ANNs. The results indicate that the ANN model captures non-linearities among the features of the dataset, achieving the best accuracy and prediction rate compared to SVM.
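The clustering step described for [61] can be illustrated with a minimal sketch. It assumes hypothetical fertilizer names and N-P-K values (none of the data below comes from the reviewed study) and runs a plain k-means with Euclidean distance, then returns the fertilizers in the cluster whose centroid is closest to the crop's requirement. Seeding the centroids with the first k points is a simplification chosen here for determinism, not something stated in the source.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centroids (assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assignment

def recommend_fertilizers(fertilizers, requirement, k=2):
    """Return the names in the cluster whose centroid is nearest to the requirement."""
    names = list(fertilizers)
    points = [fertilizers[n] for n in names]
    centroids, assignment = kmeans(points, k)
    best = min(range(k), key=lambda c: euclidean(requirement, centroids[c]))
    return sorted(n for n, a in zip(names, assignment) if a == best)

# Hypothetical N-P-K percentages, for illustration only.
FERTILIZERS = {
    "balanced_1": (10, 10, 10),
    "nitrogen_1": (46, 0, 0),
    "balanced_2": (12, 12, 12),
    "nitrogen_2": (44, 2, 0),
}

print(recommend_fertilizers(FERTILIZERS, requirement=(11, 12, 10)))
```

In practice the number of clusters and the seeding strategy would need tuning on real fertilizer data; the sketch only shows how the distance-based grouping leads to a recommendation.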
Another technique for CS that can improve the model’s accuracy is ensemble learning. The study in [46] exploited this technique to build a model that combines the predictions of multiple ML algorithms and recommends the right crop with a high accuracy. The independent base learners used in the ensemble model are RF, NBC, and Linear SVM. Each classifier provides its own set of class labels with an acceptable accuracy. The class labels of the individual base learners are combined using the majority voting technique. The CR system classifies the input soil dataset into the recommendable crop types, Kharif and Rabi (Autumn and Spring).
One of the most promising models in CF is FL, which extracts IF-THEN rules from the provided data using membership functions and linguistic variables that express human knowledge. The authors in [49] proposed a fuzzy based model that uses 27 rules with 3 modalities: Low, Medium, and High. In this traditional single-layer fuzzy system, the number of rules grows exponentially as the number of system parameters increases, and a larger rule base affects the system’s performance and transparency. Therefore, [63] developed a multi-layer system by using the fuzzy hierarchical approach. The hierarchical fuzzy model was applied in the same Mamdani[1] fuzzy inference system for a suitable CR system. The CR system has 12 input variables, and it was decomposed into six fuzzy subsystems, then arranged by priority. The results show that a hierarchical CR system provides a better performance than a traditional fuzzy CR.
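The majority voting step of the ensemble described above can be sketched as follows. The three base learners here are hypothetical rule-based stand-ins written only for illustration; they are not the RF, NBC, and Linear SVM models trained in [46], and the feature names and thresholds are assumptions.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most frequent label; ties go to the earliest base learner."""
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label

def ensemble_predict(base_learners, sample):
    """Collect one label per base learner and combine them by majority vote."""
    return majority_vote([learner(sample) for learner in base_learners])

# Hypothetical stand-ins for trained classifiers (illustrative thresholds only).
def learner_a(sample):
    return "rice" if sample["rainfall_mm"] > 1000 else "wheat"

def learner_b(sample):
    return "rice" if sample["ph"] < 6.5 else "cotton"

def learner_c(sample):
    return "wheat" if sample["temperature_c"] < 25 else "rice"

LEARNERS = [learner_a, learner_b, learner_c]

wet_field = {"rainfall_mm": 1200, "ph": 7.0, "temperature_c": 30}
print(ensemble_predict(LEARNERS, wet_field))
```

The tie-breaking rule (prefer the first learner's label) is one possible design choice; weighting votes by each learner's validation accuracy would be another.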
CFMB models have a low frequency in the reviewed studies, even though they represent the most common approach in RSs. Here, prediction is based on similarity relationships among items (farms or crops) in terms of collected production yield. For instance, [39] proposed a model that calculates the PCS between farms using the information stored in the crop growing period database, the thermal zone database, and the physiographic database, then selects the top-n similar farms. The seasonal information and the CPR of each crop of the similar farms are used to filter a first list of crops appropriate to the context. Finally, the top-k crops are recommended to each user. Another study [49] used the similarity approach to develop a system that gives the farmer a prior idea regarding the yield of a particular crop by predicting the production rate. The COS measure is used to find similar farmers in terms of location in the database. The resulting farmers that are similar to the querying farmer then form the database for the fuzzy algorithm.
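A minimal sketch of the similarity-based pipeline described for [39] is given below. The farm profiles, the CPR values, and the use of a mean CPR to rank crops are all assumptions made for illustration; the reviewed study draws its features from several databases and also applies seasonal filtering, which is omitted here.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / (std_x * std_y)

def top_n_similar(target, farms, n):
    """Names of the n farms whose profiles correlate best with the target."""
    ranked = sorted(farms, key=lambda name: pearson(target, farms[name]), reverse=True)
    return ranked[:n]

def top_k_crops(similar_farms, cpr, k):
    """Rank crops by their mean crop production rate over the similar farms."""
    rates = {}
    for farm in similar_farms:
        for crop, rate in cpr[farm].items():
            rates.setdefault(crop, []).append(rate)
    ranked = sorted(rates, key=lambda c: sum(rates[c]) / len(rates[c]), reverse=True)
    return ranked[:k]

# Hypothetical farm profiles and production rates, for illustration only.
TARGET = [1, 2, 3, 4]
FARMS = {"farm_1": [2, 4, 6, 8], "farm_2": [4, 3, 2, 1], "farm_3": [1, 2, 3, 5]}
CPR = {
    "farm_1": {"rice": 0.8, "wheat": 0.5},
    "farm_2": {"cotton": 0.9},
    "farm_3": {"rice": 0.6, "maize": 0.65},
}

similar = top_n_similar(TARGET, FARMS, n=2)
print(similar, top_k_crops(similar, CPR, k=2))
```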
CBF is one of the most significant models in RSs and is of high importance for CR as well as for yield estimation. An example of a CBF application is [66], whose model uses soil and crop properties and suggests a list of five high-priority crops based on how well the properties required by each crop match the soil properties of the land. The algorithm takes two inputs: the land soil details and the required property value for each crop. First, the algorithm computes the similarity between the land and the crop based on their properties and compares it against a predefined range. If the comparison falls within the predefined range, a rank is generated for the combination of crop and land. In another study [51], the authors developed a new data mining technique to cluster crops based on the suitability of each crop for the soil nature of different areas. Features are extracted from the datasets using five different feature extraction metrics, namely pH distance calculation, NPK (macro nutrition distance calculation), MICRONUT (micro nutrition distance calculation), water requirements, and temperature requirements. Then, the crops are clustered using hierarchical clustering based on these vectors into three groups: most suitable, less suitable, or least suitable.
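The property matching idea described for [66] can be sketched as follows. The scoring rule used here (the fraction of a crop's required properties whose soil value falls inside the crop's range) is an assumption made for illustration, as are the crop names, property names, and ranges; the reviewed study may compute and rank the match differently.

```python
def match_score(soil, requirements):
    """Fraction of required properties whose soil value lies within [low, high]."""
    hits = sum(
        1
        for prop, (low, high) in requirements.items()
        if prop in soil and low <= soil[prop] <= high
    )
    return hits / len(requirements)

def rank_crops(soil, crops, top=5):
    """Rank crops by match score (ties broken by name) and keep the best `top`."""
    scored = [(match_score(soil, req), name) for name, req in crops.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for score, name in scored[:top] if score > 0]

# Hypothetical soil sample and crop requirement ranges, for illustration only.
SOIL = {"ph": 6.5, "nitrogen": 40, "ec": 0.8}
CROPS = {
    "crop_a": {"ph": (6.0, 7.0), "nitrogen": (30, 50), "ec": (0.5, 1.0)},
    "crop_b": {"ph": (5.0, 6.0), "nitrogen": (30, 50), "ec": (0.5, 1.0)},
    "crop_c": {"ph": (7.5, 8.5), "nitrogen": (60, 80), "ec": (1.5, 2.0)},
}

print(rank_crops(SOIL, CROPS))
```

Crops with no matching property are dropped from the list, which mirrors the idea that only comparisons falling within the predefined range receive a rank.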
HF is another significant category of models used in CR. In the first study [64], the authors presented a new method, integrated within an IoT system, that advises farmers on which crop type will generate more yield. A fuzzy clustering technique is applied to obtain groups that are characterized by their weather conditions. The extracted knowledge forms the model and the rules engine. Finally, the RS generates a list of suitable crops in descending order of suitability. In the second study [37], the authors developed a hybrid CR system, which uses FL to choose from 44 crop rules. The system is based on FL and takes its input from an ANN-based weather prediction module. An agricultural named entity recognition module is developed using a conditional random field to extract crop conditions data. Furthermore, cost prediction is established based on an LR equation to help rank the recommended crops.
Table 3 shows how many studies describe an approach in each of the classes described earlier in section 2, and lists the studies in each approach category. A significant number of CF approaches is observed: over half of the reviewed studies used CF, with a stronger emphasis on the MB method. The availability of historical farmer datasets is perhaps linked to the marked dominance of CF in recent years.
Figure 4 traces the timeline of publications and confirms that CF with an MB method has grown continually. The graph shows a slight increase in this field in the two most recent years, and the number of studies is likely to increase after 2020.
Another important conclusion drawn from Table 3 and Figure 4 is the scarcity of research efforts focused on other filtering methods. Nevertheless, some studies showed that CBF and HF give more accurate recommendations overall than the other types of filtering. However, over the years, the research pace on these types of filtering has been relatively low.
Classification of RS | Number of studies | References
CF / Model-based | 28 | [32, 33, 34, 38, 40, 41, 42, 46, 52, 61, 53, 54, 55, 56, 59, 47, 63, 57, 69, 62, 68, 71, 43, 65, 60, 70, 67, 45]
CBF | 5 | [35, 48, 51, 66, 44]
HF | 5 | [36, 37, 50, 58, 64]
CF/MB | 2 | [39, 49]
Table 3: Articles by type of recommendation technique.
[1] First introduced as a method to create a control system by synthesizing a set of linguistic control rules obtained from experienced human operators. In a Mamdani system, the output of each rule is a fuzzy set.
Table 5 shows the distribution of the ML algorithms applied in the reviewed studies. Some papers applied more than one ML algorithm. Notably, the most applied algorithms are DT-based. However, this SLR does not differentiate between DT-based algorithms (J48, PART, RF, etc.) in the analysis. The other widely used algorithms are FL and SVM. Some ML algorithms rank low in this SLR despite their popularity, as is the case for similarity methods and regression algorithms. These algorithms are therefore under-investigated, which opens opportunities for future studies in the CR field.
ML Algorithm | Number of studies | References
DT | 13 | [38, 52, 48, 47, 46, 61, 59, 50, 68, 60, 70, 67, 45]
FL | 11 | [32, 37, 41, 52, 63, 49, 57, 64, 71, 43, 65]
SVM | 8 | [46, 52, 54, 55, 59, 68, 60, 45]
ANN | 6 | [33, 37, 54, 59, 58, 45]
NBC | 4 | [38, 46, 52, 48]
Regression | 4 | [40, 60, 45, 69]
KNN | 3 | [38, 59, 47]
Ensemble | 2 | [38, 46]
KMC | 1 | [61]
PCS | 1 | [39]
COS | 1 | [49]
LVQ | 1 | [56]
Table 5: Number of articles by type of ML algorithm used.
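The similarity methods that rank low in Table 5 are simple to state. The following is a minimal sketch, assuming that PCS and COS denote Pearson correlation similarity and cosine similarity; the farm vectors are hypothetical yields invented for illustration and do not come from any reviewed study.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here: an all-zero vector is similar to nothing
    return dot / (norm_u * norm_v)

def pearson_similarity(u, v):
    """Pearson correlation, computed as the cosine of the mean-centred vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine_similarity([a - mean_u for a in u], [b - mean_v for b in v])

# Hypothetical yields (t/ha) of the same three crops on three farms.
farm_a = [3.0, 1.0, 4.0]
farm_b = [6.0, 2.0, 8.0]  # same profile as farm_a, scaled by two
farm_c = [1.0, 4.0, 1.0]  # roughly the opposite profile

print(cosine_similarity(farm_a, farm_b))
print(pearson_similarity(farm_a, farm_c))
```

In a memory-based CF setting, a farm would then be recommended the crops that performed well on its most similar neighbours.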
5. What are the main input features?
ML models are data-dependent: without high-quality training data, even the algorithms that perform best in theory will not give the expected results. Indeed, robust ML models can be useless when they are trained on inadequate, inaccurate, or irrelevant data. In this context, a wide variety of inputs were suggested in the reviewed articles. Figure 5 classifies these parameters into six categories, viz.:
Geography: This category of inputs indicates the agroclimatic region, a land unit suitable for a certain range of crops and cultivars. Table 7 shows that 19 papers built their RS using geographic data among other variables, which confirms the importance of this type of input, mainly because it works as an identifier that is unique to every farm.
Weather conditions (WCs): Weather plays a major role in determining the success of agricultural pursuits. For farmers, timing is critical in obtaining resources such as fertilizer and seed, in forecasting the likely weather of the upcoming season, and in estimating how much irrigation is needed and which temperatures may affect crop growth. These factors can be determined by recording, on an hourly, daily, or weekly basis, the temperature, rainfall, solar radiation, wind speed, evaporation, relative humidity, and evapotranspiration. In this SLR, WCs were used in 75% of the reviewed articles, as Table 7 indicates.
Soil property (SP): All soils contain mineral particles, organic matter, water, and air. The combination of these components determines the soil’s quality, which depends both on its physical properties (texture, color, type, porosity, bulk density, etc.) and chemical properties (soil pH, soil salinity, nutrient availability, soil electrical conductivity, etc.). Table 7 confirms that soil characteristics are key inputs on which researchers built crop RSs.
Soil physical properties:
- Soil texture: Refers to the size of the particles that make up the soil and depends on the proportion of sand, silt and clay-sized particles and organic matter in the soil. It can influence whether soils are free draining, whether they hold water and how easy it is for plant roots to grow.
- Soil color: The surface soil varies from almost white through shades of brown and grey to black. A light color indicates low organic matter content, while a dark color indicates high organic matter content.
- Soil type: It describes the way the sand, silt and clay particles are clumped together. Organic matter (decaying plants and animals) and soil organisms like earthworms and bacteria influence soil structure. It is important for plant growth, regulating the movement of air and water, influencing root development and affecting nutrient availability.
- Soil porosity: It refers to the pores within the soil. Porosity influences the movement of air and water. Healthy soils have many pores between and within the aggregates. Poor quality soils have few visible pores, cracks, or holes.
- Bulk density: It is the ratio of the weight of a soil to its volume. Bulk density is an indicator of the amount of pore space available within individual soil horizons, and it reflects the soil’s ability to function for structural support, water and solute movement, and soil aeration.
Soil chemical properties:
- Soil pH: Soil reactivity is expressed in terms of pH and is a measure of the acidity or alkalinity of the soil. More precisely, it is a measure of hydrogen ion concentration in an aqueous solution and ranges in soils from 3.5 (very acidic) to 9.5 (very alkaline). The pH determines whether certain ions are removed from the soil or made available.
- Soil salinity: It is the salt content in the soil; the process of increasing the salt content is known as salinization. Salts occur naturally within soils and water. Salinization can be caused by natural processes such as mineral weathering or by the gradual withdrawal of an ocean.
- Nutrient availability: Sixteen nutrients are essential for plant growth and living organisms in the soil. These fall into two categories, namely macro- and micronutrients. The macronutrients include Carbon (C), Oxygen (O), Hydrogen (H), Nitrogen (N), Phosphorus (P), Potassium (K), Calcium (Ca), Magnesium (Mg) and Sulphur (S); they are the most essential nutrients for plant development and are needed in high quantities. The micronutrients are needed in smaller amounts but are still crucial for plant development and growth; these include Iron (Fe), Zinc (Zn), Manganese (Mn), Boron (B), Copper (Cu), Molybdenum (Mo) and Chlorine (Cl). Nearly all plant nutrients are taken up in ionic forms from the soil solution as cations or as anions.
- Soil Electrical Conductivity (SEC): It is an indirect measurement that correlates very well with several soil physical and chemical properties. Electrical conductivity is the ability of a material to conduct (transmit) an electrical current. As measuring soil electrical conductivity is easier, less expensive, and faster than other soil properties measurements, it can be used as a good tool for obtaining useful information about soil.
Crop property (CP): Some crops are very labor-intensive, and some require more skill than others. Some crops are riskier than others (high profit in a good year but a high chance of crop failure if the weather is bad), and some farmers are better able to cope with those risks. Each crop has its own nutrient requirements, optimal weather conditions and optimal soil properties. Unfortunately, there is no universal structure or data source for this kind of crop information, so researchers in several reviewed papers use data mining techniques to extract knowledge from raw data; FL shows high-quality results here because of its rule-generating model.
CPR: Many crop types are produced on farms, but not all of them are suitable for every area. Considering the CPR of each crop for every farm is therefore very important for recommending crops and predicting crop productivity. Almost 90% of the reviewed papers use supervised learning, where crop yield or crop profitability, in ton/hectare or kg/hectare, is the dependent variable.
Market: Even with a high yield, a crop cannot be recommended without knowing its price at the time of sale as well as its cost. The price of a specific crop is determined by supply and demand in the market; however, it can be predicted using historical data. On the other hand, cost can only be provided by the farmers themselves, so such data remain difficult to gather. Table 7 confirms this claim, with just four papers using market information.
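Two of the soil properties defined above reduce to short formulas. The sketch below assumes the standard definition of pH as the negative base-10 logarithm of the hydrogen ion concentration (in mol/L) and takes bulk density as oven-dry mass divided by sample volume; the function names and sample values are illustrative only.

```python
import math

def ph_from_hydrogen_concentration(h_mol_per_l):
    """pH = -log10([H+]), with [H+] expressed in mol/L."""
    if h_mol_per_l <= 0:
        raise ValueError("hydrogen ion concentration must be positive")
    return -math.log10(h_mol_per_l)

def bulk_density(dry_mass_g, volume_cm3):
    """Bulk density in g/cm^3: dry soil mass divided by total sample volume."""
    if volume_cm3 <= 0:
        raise ValueError("sample volume must be positive")
    return dry_mass_g / volume_cm3

# Hypothetical measurements for one soil sample.
print(ph_from_hydrogen_concentration(1e-7))  # neutral solution
print(bulk_density(130.0, 100.0))
```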
Feature class | Number of studies | References
WCs | 20 | [32, 33, 35, 36, 37, 40, 41, 42, 39, 50, 54, 57, 58, 64, 71, 43, 60, 67, 44, 45]
SP | 19 | [32, 35, 36, 37, 38, 40, 41, 42, 39, 50, 57, 69, 68, 71, 43, 65, 66, 60, 45]
Geography | 17 | [35, 33, 36, 37, 38, 40, 41, 42, 39, 50, 54, 69, 71, 65, 70, 67, 45]
CPR | 9 | [32, 35, 40, 39, 50, 64, 70, 44, 45]
CP | 9 | [35, 39, 50, 54, 57, 69, 64, 67, 44]
Market | 4 | [32, 35, 37, 40]
Table 7: Distribution of papers by feature classes.
Table 9 presents the number of papers for each variable; it indicates that, of all the features cited above, temperature and rainfall are the most widely used parameters. This finding is consistent with the fact that WCs have an important impact on the CPR and determine the soil’s sustainability. Nevertheless, it remains necessary to extract other information to build an efficient CR. This information was grouped previously in the soil property category and comprises pH-value, soil type and nutrient availability, which were cited respectively in [15], [11] and [10]. Less important variables occur in 1 to 6 studies and are a mix of all feature categories, such as elevation for geography, salinity for soil characteristics and humidity for weather conditions. Because the number of research papers in precision agriculture dedicated to CR is limited, the articles included in this SLR could give a relevant ordering of the importance of the variables involved in a CR algorithm.
Feature | Number of studies | References
Temperature | 17 | [46, 52, 61, 49, 54, 55, 51, 56, 47, 57, 58, 64, 43, 60, 70, 67, 45]
Rainfall | 16 | [46, 61, 52, 49, 54, 55, 51, 56, 47, 63, 64, 71, 60, 70, 67, 45]
pH-value | 14 | [46, 48, 51, 56, 59, 47, 63, 57, 69, 43, 65, 66, 60, 67]
Soil type | 10 | [46, 52, 48, 61, 54, 51, 47, 65, 67, 45]
Nutrients | 9 | [46, 61, 51, 59, 68, 43, 65, 66, 45]
Humidity | 6 | [49, 53, 58, 64, 43, 45]
Yield rate | 5 | [61, 49, 55, 64, 70]
EC soil | 5 | [48, 53, 63, 69, 66]
Salinity | 5 | [53, 59, 69, 43, 65]
Crop type | 4 | [54, 34, 35, 50]
Pressure | 2 | [53, 58]
Soil color | 2 | [63, 60]
Elevation | 2 | [71, 45]
Soil porosity | 1 | [46]
Table 9: Distribution of papers by features.
6. Which evaluation parameters and evaluation approaches have been used?
Several evaluation metrics have been used. Table 11 lists the metrics used to evaluate the techniques in the reviewed studies. This SLR is restricted to CS, which makes classification metrics such as accuracy, precision and recall the most popular performance metrics in the reviewed studies.
Accuracy, the proportion of true results among the total number of cases examined, has the highest number of occurrences in our SLR. It is followed by precision, the fraction of retrieved crops that are relevant recommendations, and recall, the fraction of all relevant crops that are retrieved.
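The three definitions above can be sketched for a multi-class crop recommender as follows. The labels are hypothetical and the per-crop (one-vs-rest) treatment of precision and recall is one common convention, not necessarily the one used in the reviewed studies.

```python
def accuracy(y_true, y_pred):
    """Share of cases where the recommended crop equals the reference crop."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def precision_recall(y_true, y_pred, crop):
    """One-vs-rest precision and recall for a single crop label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == crop and p == crop for t, p in pairs)  # recommended and relevant
    fp = sum(t != crop and p == crop for t, p in pairs)  # recommended, not relevant
    fn = sum(t == crop and p != crop for t, p in pairs)  # relevant, not recommended
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical reference crops and recommendations for six farms.
y_true = ["wheat", "rice", "wheat", "corn", "rice", "wheat"]
y_pred = ["wheat", "wheat", "wheat", "corn", "rice", "rice"]

print(accuracy(y_true, y_pred))
print(precision_recall(y_true, y_pred, "wheat"))
```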
Another important result is the existence of studies that evaluate their results using regression error metrics such as RMSE, MSE and the mean absolute error (MAE). The reason is that a study may use the CPR as the output of the developed model and then choose the crop with the highest rate.
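As an illustration of this regression-style evaluation, the sketch below computes MSE, RMSE and MAE on hypothetical observed and predicted rates and then selects the crop with the highest predicted rate. All values and crop names are invented for the example.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the target."""
    return math.sqrt(mse(y_true, y_pred))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical observed vs. predicted production rates (t/ha).
observed = [2.0, 3.5, 5.0]
predicted = [2.5, 3.0, 4.0]
print(mse(observed, predicted), rmse(observed, predicted), mae(observed, predicted))

# Recommendation step: pick the crop with the highest predicted rate.
predicted_rate = {"wheat": 3.2, "rice": 4.1, "corn": 2.7}
best_crop = max(predicted_rate, key=predicted_rate.get)
print(best_crop)
```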
What is striking in Table 11 is the remarkable number of papers that are not evaluated by any performance criterion. The most likely cause is the difficulty of verifying whether the recommended crop is truly the correct one. For this reason, most studies seek the help of experts or farmers to judge the relevance of the suggested crops [50].
Metric | Number of studies | References
Accuracy | 19 | [46, 52, 48, 54, 56, 59, 47, 35, 36, 38, 42, 50, 71, 65, 60, 45]
Precision | 10 | [52, 48, 36, 37, 39, 58, 68, 65]
Recall | 7 | [48, 37, 39, 52, 58, 68, 65]
F-measure | 5 | [37, 52, 68, 65]
Sensitivity | 2 | [52, 36]
Specificity | 2 | [52, 36]
MSE | 2 | [33, 45]
RMSE | 2 | [48, 70]
MAE | 1 | [48]
No metric | 11 | [61, 49, 53, 55, 51, 63, 32, 34, 40, 41, 57, 64, 43, 67]
Table 11: Distribution of articles by performance metrics.
7. What are the current challenges in CR?
This section sets out the challenges encountered in the extant research. These challenges are observed through three layers: the proposed algorithm, the data used and the evaluation preferences. Most studies have focused almost exclusively on exploiting CF algorithms for classification and clustering, more precisely DTs, SVM, ANNs and NBC. Several gaps and shortcomings were identified in these techniques, namely the cold start problem, where the system cannot draw any inferences for users or items about which it has not yet gathered sufficient information. A closer look at the literature reveals that the proposed adaptations are very classical, whereas the field of RS has improved significantly thanks to entertainment companies, which drove the development of new algorithms. For instance, the Netflix Prize was an open competition for the best CF algorithm to predict user ratings for movies [72]. On September 21, 2009, the grand prize was given to the BellKor’s Pragmatic Chaos team, which bested Netflix’s own algorithm for predicting ratings by 10.06% [73]. During this competition MF became widely known for its effectiveness, and important steps were taken in later years towards some very successful algorithms that share the same basis of latent factors and user/item representations. Unfortunately, no article suggested an implementation of these modern techniques for CR. A potential barrier facing researchers who try to close this gap is the available data. Figure 6 shows that more than 50% of the reviewed papers were published by Asian researchers, and more precisely Indian researchers, for whom SP parameters, CPR parameters and parameters covering a very long period of time and different states/districts are available on official government websites. However, these data are inaccessible to foreign scientists.
Furthermore, the proposed structure of the data presents another challenge. The vast majority of well-performing RSs are fitted on a user/item rating matrix, complemented by the users’ demographic data for hybrid systems, where ratings are either explicit or implicit (number of clicks, number of page visits, number of times a song was played, etc.). Different complex preprocessing techniques are therefore required, and even then sparsity remains a potential challenge. Finally, the high degree of correlation observed between performance metrics and model evaluations based on anonymous experts raises a very important question about the reliability of these results, and hence of the proposed algorithms.
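To make the rating-matrix structure and its sparsity concrete, the following is a minimal sketch of MF trained by stochastic gradient descent on a hypothetical farm/crop matrix. The ratings, dimensions and hyperparameters (`k`, `lr`, `reg`, `epochs`) are assumptions chosen for illustration; none of the reviewed studies proposes this implementation.

```python
import math
import random

def sparsity(ratings, n_users, n_items):
    """Fraction of the user/item matrix with no observed rating."""
    return 1.0 - len(ratings) / (n_users * n_items)

def predict(P, Q, u, i):
    """Predicted rating: dot product of user and item latent factors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))

def rmse_observed(ratings, P, Q):
    """RMSE over the observed entries only."""
    sq = sum((r - predict(P, Q, u, i)) ** 2 for (u, i), r in ratings.items())
    return math.sqrt(sq / len(ratings))

def init_factors(n_users, n_items, k, seed):
    """Small positive random latent factors, reproducible via the seed."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_items)]
    return P, Q

def train_mf(ratings, P, Q, lr=0.01, reg=0.02, epochs=1000):
    """SGD on squared error with L2 regularisation; updates P and Q in place."""
    k = len(P[0])
    for _ in range(epochs):
        for (u, i), r in ratings.items():
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

# Hypothetical suitability ratings: (farm index, crop index) -> score from 1 to 5.
ratings = {
    (0, 0): 5.0, (0, 1): 3.0, (0, 3): 1.0,
    (1, 0): 4.0, (1, 3): 1.0,
    (2, 0): 1.0, (2, 1): 1.0, (2, 3): 5.0,
    (3, 1): 1.0, (3, 2): 5.0,
}
n_farms, n_crops = 4, 4

P, Q = init_factors(n_farms, n_crops, k=2, seed=0)
rmse_before = rmse_observed(ratings, P, Q)
train_mf(ratings, P, Q)
rmse_after = rmse_observed(ratings, P, Q)

print(sparsity(ratings, n_farms, n_crops))
print(rmse_before, rmse_after)
print(predict(P, Q, 1, 1))  # estimate for an unobserved farm/crop pair
```

Only observed entries drive the updates, which is why a farm or crop without any rating (the cold start case) keeps its random initial factors and cannot receive a meaningful prediction.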