This applied, descriptive study was performed in two steps, as follows:
1. Predictor features
Socio-demographic variables
This category consists of age, gender (male/female), literacy level, marital status, occupation type, source of income, monthly income, and insurance status.
Chronic diseases
This class of variables covers cancer, cerebrovascular accident (CVA), diabetes, high blood pressure, liver, renal, eye, bone, and muscle diseases, other functional diseases, depression, and convalescence.
Behavioral and psychosocial factors
Based on the factors considered in this research, a successfully aging (SA) person has a satisfactory level of personal independence (a score of 90 to 100 on the Barthel Index), high life satisfaction (a score of 20 to 35 on the Diener Satisfaction with Life Scale, which ranges from 5 to 35), and a good quality of life (a score above 70 on the SF-36 questionnaire, which ranges from 0 to 100). These determinants are described below.
Quality of life: This instrument, designed by Ware and Sherbourne in 1992, evaluates quality of life and health. It comprises 36 items covering eight domains: physical functioning, social functioning, role limitations due to physical and emotional problems, mental health, vitality, bodily pain, and general health. The SF-36 also yields two summary measures of physical and mental health, known as the physical and mental component scores. Each domain is scored from 0 to 100, with higher scores indicating a better quality of life for the elderly. The reliability and validity of this questionnaire have been confirmed in Iranian samples (31, 32).
Individual independence
Independence is measured with the Barthel Index, which poses ten questions and quantifies a person's physical ability across various domains of daily functioning on a scale from 0 to 100. Scores of 0 to 20 indicate total dependence, 21 to 60 severe dependence, 61 to 90 moderate dependence, 91 to 99 slight dependence, and 100 full independence (33).
Satisfaction with Life Scale (SWLS)
This scale was introduced by Diener et al. and includes five items assessing the cognitive component of well-being. Each item is rated on a seven-point scale from 1 (strongly disagree) to 7 (strongly agree), so a higher score indicates greater life satisfaction. The validity of this questionnaire was confirmed by Bayani et al. in 2007 (34).
Lifestyle
Lifestyle is determined from the total score obtained: 42 to 98 indicates an unhealthy lifestyle, 99 to 155 a moderate lifestyle, and 156 to 211 a healthy lifestyle. The instrument evaluates exercise, healthy eating, stress management, social and interpersonal relationships, physical activity, and recreation (35).
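For illustration, the three cut-offs in the SA definition above can be combined into a single output label. The following is a minimal sketch, assuming one record per participant and the hypothetical column names barthel, swls, and sf36; the actual dataset fields may differ.

```python
import pandas as pd

def label_successful_aging(df: pd.DataFrame) -> pd.Series:
    """Combine the three instrument cut-offs into a single SA label."""
    independent = df["barthel"].between(90, 100)  # Barthel Index: 90-100
    satisfied = df["swls"].between(20, 35)        # SWLS: 20-35 (scale 5-35)
    good_qol = df["sf36"] > 70                    # SF-36: above 70 (scale 0-100)
    return (independent & satisfied & good_qol).map({True: "SA", False: "non-SA"})

# Hypothetical usage:
# df["sa_class"] = label_successful_aging(df)
```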
2. Data analysis: After obtaining the essential factors affecting SA and preparing our dataset, we performed the data analysis. This step included preprocessing the dataset, selecting and implementing the ML models, and evaluating them to find the best model for predicting SA. In the preprocessing phase, we first resolved the inconsistencies that arose when integrating the databases of the various elderly centers. Second, we cleaned the data: we identified and smoothed noise by examining the quantiles of the data-point distributions; records with a missing value in the output class were removed; missing values in the predictor features were imputed with regression methods, which predict a missing value from the other attribute values; and for numerical data, the trimmed mean was used to replace missing values with minimal bias. Third, because the numbers of samples in the output classes were unequal, we used the Synthetic Minority Oversampling Technique (SMOTE) to balance the classes and prevent bias when evaluating the algorithms' performance. Fourth, to analyze the dataset more effectively and reduce its dimensionality, we applied Feature Selection (FS), i.e., selecting the best variables. This process has potential benefits such as removing irrelevant data, preventing overfitting, increasing ML training speed, reducing data redundancy, and improving learning accuracy (36–38). In this study, we used the Chi-square (χ²) test of independence to assess the relationship between each candidate factor and SA; variables related at P < 0.05 were considered statistically significant, and factors that did not meet this level were removed from the study.
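The preprocessing steps above can be illustrated with common Python libraries. This is a minimal sketch, not the authors' actual workflow: the column name sa_class is hypothetical, regression-based imputation is approximated with scikit-learn's IterativeImputer, class balancing uses imbalanced-learn's SMOTE, and the χ² screen uses scipy. It assumes the predictor columns are categorical or already binned and numerically encoded.

```python
import pandas as pd
from scipy import stats
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from imblearn.over_sampling import SMOTE

def preprocess(df: pd.DataFrame):
    # 1) Remove records whose output class is missing.
    df = df.dropna(subset=["sa_class"])

    # 2) Impute missing predictor values with a regression-based method:
    #    IterativeImputer regresses each feature on the others.
    features = df.columns.drop("sa_class")
    df[features] = IterativeImputer(random_state=0).fit_transform(df[features])
    # Trimmed-mean alternative for a single numerical column:
    # df["age"] = df["age"].fillna(stats.trim_mean(df["age"].dropna(), 0.1))

    # 3) Balance the output classes with SMOTE.
    X, y = SMOTE(random_state=0).fit_resample(df[features], df["sa_class"])

    # 4) Chi-square screen: keep features significantly related to the class
    #    (index [1] of the chi2_contingency result is the p-value).
    selected = [
        col for col in X.columns
        if stats.chi2_contingency(pd.crosstab(X[col], y))[1] < 0.05
    ]
    return X[selected], y

# Hypothetical usage: X, y = preprocess(df)
```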
To identify the most suitable model for predicting success among older adults from the factors retained by the statistical analysis, we used the Weka V3.9 software. Seven ML algorithms were selected and implemented because they are the most commonly used and have shown higher performance than other algorithms in recent studies (an illustrative comparison of analogous models is sketched after the algorithm descriptions below).
• RF: Random forest (RF) is a well-known machine learning algorithm for building decision models. It consists of many subtrees that jointly classify the dataset samples. The trees are built with the Classification and Regression Tree (CART) technique without pruning, and each split is made on a randomly selected subset of variables. This randomness gives RF a strong classification capability, especially on high-dimensional datasets with many variables, and the final class is decided by majority voting across the subtrees. Because the samples are distributed over subtrees built from different attributes, RF remains accurate even with noisy data and resists overfitting. In general, its advantages include fast and accurate training, resistance to noisy data, and flexibility (39–42).
• Ada-boost: Adaptive boosting (Ada-boost) is a boosting algorithm from the ensemble category that combines weak classifiers, sequentially reweighting the cases each classifier misclassifies so that later classifiers correct earlier errors. Its advantages include good generalizability with high accuracy, efficient computation, flexibility across tasks with complicated data, easy adaptability, and the capability to be integrated with other algorithms (43, 44).
• J-48: This algorithm is a newer model similar to ID3. Its generic name is C4.5, but in the Weka software it is named J-48. It uses information gain to split the tree on the variable with the highest gain, so, in contrast to RF, its splitting process is not random: the tree is built on the features that best discriminate the samples, as identified during training. J-48 has more pruning features than other models, which makes it less prone to overfitting than other ML models. Its useful features include a confidence factor for setting the tree size and preventing overfitting, the capability of handling continuous variables, in contrast to some other decision-tree algorithms, the extraction of rules with maximum discrimination between the output classes, and the ability to work with missing values (45–47).
• ANN: The artificial neural network (ANN) tries to imitate human information processing; its structure consists of neurons and weighted connections that transfer messages across the network. An ANN generally comprises three kinds of layers: an input layer, which receives the input data such as signals, images, or any other data type; hidden layers, which contain the computation nodes that process the data received from the input layer; and an output layer, which presents the results of the hidden-layer computations to the user. ANNs have found many applications in solving highly complex computational problems in medicine (48–51).
• SVM: The support vector machine (SVM) is a classification and regression algorithm that separates classes with a hyperplane. It maps low-dimensional data points into a higher-dimensional space through kernel functions so that the cases become separable; different kernel types suit datasets of different complexity, so SVMs can be categorized as linear or non-linear (52, 53).
• NB: The Naive Bayes (NB) algorithm is a statistical classifier based on Bayes' theorem and probabilistic reasoning. It determines the probability that each sample belongs to each class using Eq. 1 and assigns a new sample to the class with the highest probability. The algorithm assumes that each variable's occurrence is independent of the others in determining the dependent variable (54).
Equation 1: \(P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}\)
• BLR: Binary logistic regression (BLR) (Eq. 2) is a statistical method that predicts the probability of each state of the output class from the input variables. In contrast with NB, the independent variables are not treated as separate: they jointly affect the prediction of the output class and may be correlated. BLR suits datasets with a two-valued output class, for example, having a disease or not.
Equation 2: \(P = \frac{e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \cdots}}{1 + e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \cdots}}\)
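The study implemented these seven algorithms in Weka V3.9. Purely as an illustrative parallel, the sketch below uses scikit-learn stand-ins (its DecisionTreeClassifier is CART-based rather than a true C4.5/J-48 implementation, and MLPClassifier stands in for the ANN) and compares them by ten-fold cross-validated AUC; X and y are assumed to be the balanced feature matrix and binary class labels from the preprocessing sketch above.

```python
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Approximate scikit-learn counterparts of the seven Weka classifiers.
models = {
    "RF": RandomForestClassifier(random_state=0),
    "Ada-boost": AdaBoostClassifier(random_state=0),
    "J-48 (CART stand-in)": DecisionTreeClassifier(criterion="entropy", random_state=0),
    "ANN": MLPClassifier(max_iter=1000, random_state=0),
    "SVM": SVC(probability=True, random_state=0),
    "NB": GaussianNB(),
    "BLR": LogisticRegression(max_iter=1000),
}

# Ten-fold cross-validated AUC for each candidate model.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10, scoring="roc_auc")
    print(f"{name}: mean AUC = {scores.mean():.3f}")
```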
To find the best model for predicting success among the elderly, we evaluated the performance of the selected ML algorithms using the confusion matrix (Table 1) and the sensitivity (Eq. 3), specificity (Eq. 4), accuracy (Eq. 5), F-measure (Eq. 6), and AUC (area under the Receiver Operating Characteristic (ROC) curve) derived from it. In Table 1, True Positives (TP) and True Negatives (TN) are the successful and unsuccessful cases correctly classified by the decision models, while False Negatives (FN) and False Positives (FP) are the successful and unsuccessful adults incorrectly classified by the algorithms. In this study, 70%, 20%, and 10% of the samples were assigned to training, testing, and validation of the algorithms, respectively, and ten-fold cross-validation was used to estimate the error of the performance criteria.
Table 1
Confusion matrix of predicted versus real cases.

|             | Real + | Real − |
|-------------|--------|--------|
| Predicted + | TP     | FP     |
| Predicted − | FN     | TN     |
Equation 3: \(\text{Sensitivity} = \frac{TP}{TP + FN}\)
Equation 4: \(\text{Specificity} = \frac{TN}{TN + FP}\)
Equation 5: \(\text{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN}\)
Equation 6: \(\text{F-Measure} = 2 \times \frac{\text{Sensitivity} \times \text{Specificity}}{\text{Sensitivity} + \text{Specificity}}\)
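As a worked example, Eqs. 3–6 can be computed directly from the confusion-matrix counts. Note that the F-measure here follows the paper's formulation over sensitivity and specificity (the more common variant uses precision and recall); the counts in the usage comment are hypothetical.

```python
def evaluate(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute Eqs. 3-6 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                # Eq. 3
    specificity = tn / (tn + fp)                # Eq. 4
    accuracy = (tp + tn) / (tp + fn + fp + tn)  # Eq. 5
    # Eq. 6, as defined in the text (over sensitivity and specificity):
    f_measure = 2 * (sensitivity * specificity) / (sensitivity + specificity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "f_measure": f_measure}

# Hypothetical usage:
# evaluate(tp=80, tn=70, fp=10, fn=20)
# -> sensitivity 0.80, specificity 0.875, accuracy 0.833, F-measure ~0.836
```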