Material and methods
Study area
The city of Hamadan is located between latitudes 23°59’ and 25°45’ N and longitudes 47°34’ and 49°36’ E of the Greenwich Meridian. It lies in western Iran on the slopes of the Alvand mountain range, has a mountain climate, and offers many ancient, historical, natural, cultural, recreational, and sporting attractions for tourists.
Hamadan is the first Persian capital mentioned by Herodotus, the famous Greek historian (Jahanpour, 2018). Its tourist attractions include the Ganj Nameh inscriptions (500 BC); the stone lion statue dating from the Medes, the first dynasty of Iran (728 to 549 BC); the tomb of Esther and Mordechai, a holy site for Jews whose story is told in the Old Testament; the tomb of Avicenna, the renowned physician and philosopher (about 1000 years ago); the tomb of the famous poet Baba Tahir (about 1000 years ago); Lalehjin, the handicrafts city registered with UNESCO as the World Pottery City in 2017; Ali-Sadr Cave, reputedly the largest water cave in the world; the Alvand mountain range, whose highest peak reaches 3574 meters above sea level, with its very beautiful valleys; a pleasant climate; a ski resort; and large sports and recreational complexes (Fatemi, 2017). These conditions give Hamadan considerable potential for attracting tourists.
Research method
In this study, both documentary sources and survey instruments were used to collect data. From the documentary sources, the factors were extracted and arranged into a questionnaire. The content validity ratio (CVR) was used to assess the validity of the questionnaire based on the opinions of 15 experts (CVR = 0.65 > 0.49) (Lawshe, 1975). The reliability of the questionnaire was evaluated using Cronbach’s alpha, which was 0.72. Table 1 shows the variables and factors affecting tourism satisfaction (TS) in Hamadan city. Finally, 300 tourists were selected at random.
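The two instrument checks above can be sketched as follows. This is a minimal illustration, not the authors' computation: it uses Lawshe's formula CVR = (n_e − N/2)/(N/2), where n_e is the number of panelists rating an item "essential" and N the panel size, and the standard Cronbach's alpha formula. The item counts and the function names `lawshe_cvr` and `cronbach_alpha` are hypothetical.

```python
import numpy as np


def lawshe_cvr(n_essential: int, n_panel: int) -> float:
    """Lawshe's content validity ratio for a single item.

    CVR = (n_e - N/2) / (N/2); ranges from -1 (nobody rates the item
    essential) to +1 (every panelist does).
    """
    half = n_panel / 2.0
    return (n_essential - half) / half


def cronbach_alpha(scores) -> float:
    """Cronbach's alpha for a (respondents x items) score matrix."""
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]                             # number of items
    item_var = x.var(axis=0, ddof=1).sum()     # sum of per-item variances
    total_var = x.sum(axis=1).var(ddof=1)      # variance of respondents' total scores
    return float(k / (k - 1) * (1.0 - item_var / total_var))


# Hypothetical panel of 15 experts: how many rated each of 4 items "essential".
essential_counts = [12, 13, 14, 11]
cvrs = [lawshe_cvr(n, 15) for n in essential_counts]
mean_cvr = sum(cvrs) / len(cvrs)
```

With N = 15, an item needs at least 12 "essential" ratings to reach Lawshe's critical value of 0.49, since 11 ratings give a CVR of about 0.47.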
Statistical Analysis
This study used two widely applied non-parametric methods, random forest and K-nearest neighbors, to capture possible nonlinear relationships between the inputs and the output.
Random Forest
Random forest (RF) is an ensemble learning method that builds many (e.g., 1000) regression trees (Grömping, 2009). Within each tree, the predictor that yields the smallest prediction error is chosen for the top split, and the data are divided into two parts; for a continuous predictor, the cutoff point is chosen to minimize the prediction error. This partitioning is repeated recursively to grow the tree. In the regression setting, a tree predicts by averaging the response values in each leaf of the final tree. To improve on the predictive performance of a single regression tree, RF introduces randomness in two ways: each tree is grown on a random sample drawn from the original observations (bootstrap sampling), and only a random subset of candidate variables (inputs) is considered for splitting (Breiman, 2001; Grömping, 2009). This randomness addresses the instability of individual trees because it makes their predictions differ from one another (Barnett, Dümenil, Schlese, Roeckner, & Latif, 1989). The prediction of the final forest is obtained by averaging the predictions of all individual trees (Barnett et al., 1989; Breiman, 2001).
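The procedure above can be illustrated with scikit-learn's `RandomForestRegressor`. This is a sketch under stated assumptions: the paper does not say which software was used, and the data below are synthetic, hypothetical stand-ins for the Table 1 predictors and the TS score. `bootstrap=True` gives each tree its own bootstrap sample, and `max_features="sqrt"` restricts each split to a random subset of inputs. The last lines confirm that the forest prediction is the average of the individual tree predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical data: 300 respondents, 5 Likert-like predictors (1-5 scale).
rng = np.random.default_rng(0)
X = rng.uniform(1, 5, size=(300, 5))
y = 0.6 * X[:, 0] + 0.3 * np.sin(X[:, 1]) + rng.normal(0, 0.2, 300)

# 1000 trees, each on a bootstrap sample, with a random feature subset per split.
rf = RandomForestRegressor(
    n_estimators=1000, max_features="sqrt", bootstrap=True, random_state=0
)
rf.fit(X, y)

# Predictions of every individual tree (shape: n_trees x n_samples) ...
tree_preds = np.stack([tree.predict(X) for tree in rf.estimators_])
# ... and the forest prediction, which averages them.
forest_pred = rf.predict(X)
```

Averaging many decorrelated trees is what reduces the variance of a single, unstable tree; the bootstrap and feature subsampling are the two sources of decorrelation described above.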
K-nearest neighbors
The K-nearest neighbor (K-NN) method is one of the most popular non-parametric regression methods. In this method, the distribution of predicted values is estimated nonparametrically using a kernel function. The model predicts future observations by finding situations in the historical record that resemble the present one; that is, it assumes that the outcome of the present situation will resemble the outcomes of those similar past situations. The probability assigned to each historical case depends on the similarity between the vector of independent variables observed at present and the corresponding vector in the historical series (Karamuz & Araghinejad, 2014).
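A minimal sketch of kernel-weighted K-NN regression follows. It assumes Euclidean distance and a rank-based kernel in which the j-th nearest neighbor receives weight proportional to 1/j; this kernel is a common choice in hydrological K-NN models, but it is an assumption here, since the paper does not state which kernel was used. The function name `knn_predict` is hypothetical.

```python
import numpy as np


def knn_predict(X_train, y_train, x_new, k=5):
    """Kernel-weighted K-NN regression for one new observation.

    Assumed design: Euclidean distance, and weights w_j = (1/j) / sum_i(1/i)
    for the j-th nearest neighbor (j = 1..k), so closer cases count more.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    if not 1 <= k <= len(X_train):
        raise ValueError("k must be between 1 and the number of training cases")

    # Similarity = small distance between predictor vectors.
    dist = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    nearest = np.argsort(dist, kind="stable")[:k]

    # Rank-based kernel weights that sum to 1 (the "probability" of each case).
    inv_rank = 1.0 / np.arange(1, k + 1)
    weights = inv_rank / inv_rank.sum()
    return float(np.dot(weights, y_train[nearest]))


# Usage: the two nearest cases (responses 0 and 10) get weights 2/3 and 1/3.
pred = knn_predict([[0.0], [1.0], [2.0], [3.0]], [0, 10, 20, 30], [0.1], k=2)
```

In practice the predictors would usually be standardized first, so that variables measured on larger scales do not dominate the distance.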
Implementation
To implement the models, the variables in Table 1 were used as predictors and TS as the output. Both RF and K-NN were then applied to identify the important variables affecting TS. To assess goodness of fit, the data set was divided into training (80%) and testing (20%) sets; the two models were trained on the training set and evaluated on the testing set. The evaluation criteria were root mean square error (RMSE), coefficient of residual mass (CRM), Nash-Sutcliffe efficiency (NSE) (Lindström, 2016), Pearson correlation coefficient (R), and the Morgan-Granger-Newbold (MGN) statistic (Ghorbani & Afgheh, 2017).
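The split and the first four criteria can be sketched as below. This is an illustrative sketch, not the authors' code: the function names are hypothetical, the random split is one simple way to obtain an 80–20 partition, and CRM is assumed to denote the coefficient of residual mass, (ΣO − ΣP)/ΣO, its usual meaning in model-evaluation studies, where values near 0 indicate little overall bias.

```python
import numpy as np


def split_80_20(n, seed=0):
    """Randomly partition indices 0..n-1 into training (80%) and testing (20%) sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    cut = int(round(0.8 * n))
    return idx[:cut], idx[cut:]


def rmse(obs, pred):
    """Root mean square error."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


def crm(obs, pred):
    """Coefficient of residual mass (assumed meaning of CRM).

    Positive: the model under-predicts in total; negative: it over-predicts.
    """
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float((obs.sum() - pred.sum()) / obs.sum())


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the mean."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2))


def pearson_r(obs, pred):
    """Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(obs, pred)[0, 1])


# With 300 respondents the split yields 240 training and 60 testing cases.
train_idx, test_idx = split_80_20(300)
```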
Results and Discussion
Table 2 shows the demographic characteristics of the participants. The majority of participants were male (65.4%), aged between 20 and 45 (54.3%), single (67%), and had a university education (78.4%).
Both the RF and K-NN models were optimized, and the evaluation criteria were computed on the testing set. Table 3 shows the performance criteria of RF and K-NN. Although the RF model produced a smaller error (RMSE = 0.34) and a higher correlation between observed and predicted responses (R = 0.90) than the K-NN model, the K-NN model had smaller bias (CRM = 0.0005) (see Fig S1) and lower skewness of the errors (0.32) (see Fig S2). In other words, RF was more precise, whereas K-NN was less biased. To test the significance of the difference in accuracy between the two models, the MGN statistic was used. This statistic was not significant (P-value = 0.509), indicating no statistically significant difference between the accuracy of the two models.
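The MGN test can be sketched as follows. With model errors e_a and e_b, let s = e_a + e_b and d = e_a − e_b; because cov(s, d) = var(e_a) − var(e_b), a zero correlation between s and d means the two models have equal error variance. The statistic r_sd / sqrt((1 − r_sd²)/(n − 1)) is compared with a t distribution with n − 1 degrees of freedom. The function name `mgn_test` and the test data are hypothetical; this is not the authors' code.

```python
import numpy as np
from scipy import stats


def mgn_test(obs, pred_a, pred_b):
    """Morgan-Granger-Newbold test of equal accuracy for two sets of predictions.

    Returns (statistic, two-sided p-value). A positive statistic means
    model A has the larger error variance.
    """
    obs = np.asarray(obs, dtype=float)
    e_a = obs - np.asarray(pred_a, dtype=float)
    e_b = obs - np.asarray(pred_b, dtype=float)

    s, d = e_a + e_b, e_a - e_b
    r = np.corrcoef(s, d)[0, 1]          # correlation between sum and difference of errors
    n = obs.size
    stat = r / np.sqrt((1.0 - r ** 2) / (n - 1))
    p_value = 2.0 * stats.t.sf(abs(stat), df=n - 1)
    return float(stat), float(p_value)
```

A large p-value, such as the 0.509 reported above, means the hypothesis of equal error variance cannot be rejected.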
Fig S3 shows the variable importance (VIMP) of the predictors of satisfaction. Host-society behavior and municipal equipment were the two top-ranked variables, and cost of services was the least important variable in predicting tourist satisfaction.
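The paper does not state which VIMP measure underlies Fig S3. One common option for random forests is permutation importance: shuffle one predictor at a time and record how much the model's score drops. The sketch below assumes scikit-learn's `permutation_importance` evaluated on a held-out 20% test set, with hypothetical synthetic data in which only the first predictor matters strongly and the third is pure noise. Breiman's original VIMP permutes out-of-bag samples instead, and impurity-based `feature_importances_` is another alternative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical data: predictor 0 is strong, predictor 1 weak, predictor 2 pure noise.
rng = np.random.default_rng(1)
X = rng.uniform(1, 5, size=(300, 3))
y = 1.0 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0, 0.2, 300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)

# Mean drop in R^2 on the test set when each predictor is shuffled (20 repeats).
result = permutation_importance(rf, X_te, y_te, n_repeats=20, random_state=0)
vimp = result.importances_mean
ranking = np.argsort(vimp)[::-1]      # predictor indices, most important first
```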
Discussion and conclusion
The tourism industry diversifies economic activity and employment and distributes revenues to a wider range of individuals and groups in society. Amid the changes brought about by globalization, only cities and regions with strategic, forward-looking programs can make optimal use of the advantages of this industry. These are mostly creative cities.
The results of this research showed that the behavior of the host community, an index including variables such as the honesty of the host community, its spirit of hospitality, and citizens’ attitudes towards tourists, had the greatest effect on tourist satisfaction. The way citizens treat tourists reflects the host society’s understanding of tourists in cultural, social, and economic exchanges. This in turn leads to recognition of the attractions and financial benefits of tourism and improves the conditions for peace and lasting security among communities. Therefore, deeper interaction between host and guest communities in the tourism sector leads to greater satisfaction and, consequently, to a thriving tourism industry.
The second-ranked variable affecting tourism was municipal facilities and equipment, including items such as access to accommodation centers, fuel stations, drinking water, and ATMs, as well as urban traffic. This index had a strong influence on tourism, with an importance of 0.81. Municipal facilities and equipment indicate the convenience, comfort, and usability of tourist spaces. They reduce the hardship of travel and the fatigue of moving from place to place, and bring pleasure and tranquility to tourists. The model outputs also reflect tourists’ favorable assessment of the various types of urban equipment in Hamadan.
Environmental quality, with an importance of 0.488, was the third most important factor affecting tourism. The main elements of this construct are climatic comfort, environmental tranquility, visual quality, and the urban landscape, with special attention paid to elements of native-oriented tourism. Environmental quality underlies efforts to protect the identity of natural and human-made environments. The urban landscape presents the city as a whole as a text that the reader, the tourist, interprets. From the tourists’ perspective, Hamadan has good environmental quality owing to the design and morphology of the city and its location on the scenic slopes of Mount Alvand, both of which attract tourists to the city. The remaining factors, quality of services (importance = 0.314) and cost of services (importance = 0.167), had less impact on tourism.
Given the capacity of the ancient city of Hamadan for tourism, which led to its selection as the Asian urban tourism pilot and capital in 2018, policy-makers working in the field of tourism clearly need to pay more attention to tourism policies and programs. Such broad participation would sustain the prosperity of the tourism industry.