Efficient Feature Extraction Based on Optimized Gated Recurrent Unit Recurrent Neural Network for Automatic Thyroid Prediction

doi:10.21203/rs.3.rs-2319272/v1

Download PDF

Research Article

Efficient Feature Extraction Based on Optimized Gated Recurrent Unit Recurrent Neural Network for Automatic Thyroid Prediction

https://doi.org/10.21203/rs.3.rs-2319272/v1

This work is licensed under a CC BY 4.0 License

Version 1

posted

You are reading this latest preprint version

In this paper, develop Efficient Feature Extraction Based Recurrent Neural Network (EFERNN). Initially, the databases are gathered from the open-source system. After that, the pre-processing technique is developed for correcting missing values by the normalization technique of min-max normalization. The pre-processed data is utilized for feature extraction by using feature extraction techniques such as Two-Level Feature Extraction (TLFE) techniques. In level1, the ranked filter feature set technique is utilized to rank the features based on doctor recommendations. In order to execute the label-driven validation, ranking measures are used. In level 2, features are ranked and selected using a variety of metrics, including info gain, gain ratio, chi-square, and relief. In level 2, the effective features are chosen from the feature set using a fuzzy-based composite measure. In order to categorise thyroid disease from the databases, the Optimized Gated Recurrent Unit - Recurrent Neural Network (GRU-RNN) is used. In the GRU-RNN, the weight is selected with the assistance of the COOT Optimization Algorithm. The suggested method is put into practise in MATLAB, and its effectiveness is assessed by taking into account statistical measurements like kappa, accuracy, precision, recall, sensitivity, specificity and F Measure. To validate the proposed technique, it is compared with conventional techniques such as Deep Belief Neural Network (DBN). Artificial Neural Network (ANN) and Support Vector Machine (SVM).

Automatic thyroid prediction

Recurrent neural network

gated recurrent unit

fuzzy based composite measure

Info gain

Gain ratio

Chi-square

and relief

The thyroid is a sizable organ with a butterfly-like shape. It is placed in the lower part of the neck and helps regulate the body's digestion [1]. This organ produces two dynamic thyroid chemicals, triiodothyronine (shortened T3) and levothyroxine (shortened T4). These substances are crucial for the production of proteins, controlling internal temperature, and managing energy in general [2]. The thyroid organ is prone to many obscure diseases, some of which are particularly normal, such as hypothyroidism and hyperthyroidism [3]. A lack of thyroid chemical secretion origin hypothyroidism, and an overproduction of thyroid chemical secretion causes hyperthyroidism. The former case refers to a condition of hypothyroidism that governs a deficiency or underproduction of thyroid chemicals [4]. Side effects of this condition can affect a person's weight gain, enlarged neck, and low heart rate, although hyperthyroidism refers to an excess of thyroid chemicals by the thyroid gland, in which a person can experience the adverse effects of increased blood flow strain. and heart rate during weight loss [5].

Thyroid issues interfere with the thyroid gland's regular operation and result in aberrant chemical synthesis, which leads to hyperthyroidism [6]. In the developed world, hypothyroidism is thought to affect 4–5% of people. Hypothyroidism, elevated cholesterol levels, expansion of the circulatory strain [7], cardiovascular problems, reduced adolescent development, and depression are among its potential side effects. It is crucial to educate the general population on the symptoms, classifications, and causes of this disease [8]. A commonly used technique to differentiate thyroid problems is the use of a blood test that can measure TSH, T3, and T4 levels [9]. The medical care industry offers a great deal of mind-blowing information in the medical field that is exceptionally hard to do. Several AI approaches are being used to look for and identify late mutated diseases [10]. Various classification techniques are used by experts such as Neural network, K-nearest neighbour (KNN), ANN, decision tree (DT), Naive Bayes, SVM and Bayesian network (BNN), and many more.

Contribution of the work

The primary goal of this study is to demonstrate how the classification of more significant features from the available raw medical dataset can assist the physician in making an accurate thyroid diagnosis for the general population.
Developing a min-max pre-processing technique will help to remove unwanted values and correct the missing values from the input data.
The categorization system will be more accurate if a two-level feature extraction technique is developed.
Developing an optimized RNN-GRU classifier will help to classify the thyroid classes from the input data.

The rest of the article is designed as follows; section 2 lists similar studies on the use of deep learning to detect thyroid disease. The entire architecture of the suggested system model is provided in Section 3. Section 4 provides the outcome evaluation of the projected technique. Finally, the conclusion is presented in section 5.

In recent years, many deep learning techniques are developed by authors for automatic thyroid detection from databases. Few research works are reviewed in this section.

Tahir Alyas et al., [11] have introduced an empirical technique for thyroid infection prediction using the AI approach. In this article, various AI computations such as ANN, KNN, random forest algorithm, and decision tree on the dataset have experimented to predict the disease more easily in light of the boundaries defined in the dataset. Similarly, the dataset is controlled for correct expectations for classification. In order to properly examine the dataset, the classification was done on both sampled and unsampled datasets. After limiting the dataset, we were able to calculate random forest with 94.8% accuracy and 91% specificity, which is the maximum accuracy.

Hafiz Abbad Ur Rehman et al., [12] introduced K-nearest neighbours (KNN) with different distance capabilities for thyroid disease diagnosis. Three levels of the proposed algorithm—KNN without highlight selection, KNN with L1-based selection, and KNN with chi-square-based feature selection procedures—are included. In this review, thyroid datasets from the KEEL Dataset repository and one more from a Pakistani clinic that was advertised were both used. The new dataset could be distinguished from the old datasets because it had three extra elements: pulse rate, body mass index, and blood pressure.

Mehdi Hosseinzadeh et al., [13] proposed a neural network with back-propagation error capability called a multiple multi-layer perceptron (MMLP) for thyroid recognition. The near-minimum complexity of the local minima problem and back-propagation error computation were further managed by an adaptive learning rate computation. The proposed MMLP improved the classification of thyroid diseases overall. An accuracy improvement of 0.7% was made using the 6-entity MMLP in comparison to a stand-alone system.

Rajasekhar Chaganti et al., [14] have introduced a method of examining feature design for AI and deep learning models. Includes robustness employing extra AI-based tree classifiers, bilateral feature decision, forward feature selection, inverse feature removal, and robustness. The suggested method can forecast immune system thyroiditis (compensatory hypothyroidism), enlarged regulatory protein, Hashimoto's thyroiditis (essential hypothyroidism), and non-thyroid disorder (NDIS) (simultaneous non-thyroid disease). Numerous investigations have demonstrated that when employing an arbitrary forest classifier, the selected additive tree classifier produces the best results, with an accuracy of 0.94 and an F1 score.

K. Shankar et al., [15] proposed a productive feature-based multi-kernel SVM technique for identifying thyroid illness. The proposed model was to order this thyroid information using fine component determination and a region-based classifier procedure. Here, the arrangements of information sequence using "Multi Kernel Support Vector Machine" are sampled and its collection program. The interest and purpose of this proposed model were to improve the presentation of the characterization process with the help of further developed dark wolf development as element selection. The justification for this finer element selection was that it computationally increases the representation of non-significant features and models from exceptional datasets.

In India, one in ten persons has hypothyroidism, according to a recent article published by the daily Seasons in India. Based on research done by the Thyroid Association of India, this estimate was made. When compared to other prevalent conditions including asthma, obesity, diabetes, heart disease, and sleep disorders, concentration ranks tenth and also represents memory for thyroid disease. Thyroid side effects are similar to other disorders, according to medical professionals. Nevertheless, half of the study population knew about thyroid disorders; Realize that symptom tests are available to diagnose this infection. This situation became an impetus for developing study models for the thyroid disease group with extraordinary accuracy. The major goal of this research project is to create an effective and efficient system for the management of thyroid disease utilising various preparation processes and to identify the group with the best results. AI has become a remarkable living soul that provides knowledge and reasoned answers to various problems. Therefore, medicine attracts the eyes of many scientists because society depends on sound and efficient people for its fair work. It is clear that a sick person will invest a lot of his energy in worrying about his well-being, and therefore does not leave useful time to complete the duties performed, not to mention performing well. Hence, this paper, develop EFERNN for semiautomatic sensing and categorization of thyroid disease from the databases. Figure 1 shows the block diagram of the suggested approach.

Initially, the databases are gathered from the open-source system. After that, the pre-processing technique is developed for correcting missing values by the normalization technique. The pre-processed data is utilized for feature extraction by using feature extraction techniques such as TLFE techniques. In level 1, the ranked filter feature set technique is utilized to rank the features based on doctor recommendations. In order to execute the label-driven validation, ranking measures are used. In level 2, features are ranked and selected using a variety of metrics, including info gain, gain ratio, chi-square, and relief-F. In level 2, the effective features are chosen from the feature set using a fuzzy-based composite measure. Finally, thyroid illness is categorised using the databases using the GRU-RNN. With the help of the COOT Optimization Algorithm, the weight is chosen in the GRU-RNN.

4.1. Pre-processing

Normalization is used at the pre-processing step to clean up extraneous data and background noise from the gathered databases. The Min-Max normalisation technique is taken into consideration to normalise the data [16]. This technique is given linear transformation on the normal period of data. This method manages the relationship between normal data. It is a simple technique that specifically fits the data into a pre-established border. The min-max normalization technique is presented below,

$$X=\left(\frac{X-MIN parameter of X}{MAX parameter of X-MIN parameter of X}\right)*\left(d-c\right)+c \left(1\right)$$

Here, $X$ can be defined as an input database, $d,c$ can be defined as a pre-defined boundary. Based on the min-max normalization technique, the data is normalized. The normalized data is sent to the feature extraction technique.

4.2. Feature extraction ranked filter feature set technique (level 1)

This study is designed to detect the category and existence of thyroid disease in a patient. The collected database is consisting of 29 various features such as query information, gender, age, various test measures, symptoms, patient information, and physiological features. The query information feature gathers the patient history related to the previous infections with hyperthyroid, hypothyroid, antithyroid medication, and thyroxine. The precise information is gathered to create high- and low-level symptoms. The low-level symptoms contain the dimension of lithium, infection of 1131, and goitre. The high-level symptoms contain physiological variations detected in the patient, pregnancy, and sickness. The general patient data are shown to contain a negligible desire to diagnose the condition by consultation with a doctor and analysis of a deep thyroid examination [17]. Table 1 lists the low-level and high-level searches, clinical features with the relevant clinician, and useful recommendations. This table showed that thyroid hormones are significantly influenced by lithium and 1131 illness. This doctor's recommendation can be analysed as the weighted factor in addition to utilized feature matrices to create the ranked filtered feature pairs.

Table 1

Level 1 Feature extraction based on doctor recommendations (symptoms)
Doctor advice	Description/Effect	Features
Low	Affects emotionally, physically, and mentally. Varied to detect it is due to thyroid	Psych
Low	Sickness exists in thyroid issues, it cannot a single reason	Sick
High	This is affecting the thyroid operation. It can design hyperthyroidism, hypothyroidism, and goitre	lithium
Medium	Growth, affect blood pressure and rare disorder. Outcomes hypopituitarism	hypopitutitar
Low	Lesser visible symptoms, in the later age of a specific and less presented cancer	Tumor
Medium	Thyroid gland enlargement	Goiter
Low	Variations in size and hormones during pregnancy	Pregnancy
Low	Risk complication is less than 2 percent	Thyroid surgery
High	Destroy thyroid cells	1131

Thyroid disease cannot be identified by physiological changes such as blood pressure, hair loss, or weakness. The doctor can manage many blood tests like FTI, T3, TSH, T4, and TBG to examine low-level evaluations and discover thyroid problems. The type and abnormalities of thyroid illness are found using this blood test. Table 2 lists the abnormality influence, related symptoms, kind of disorder, and abnormality influence.

Table 2

Level 1 Feature extraction based on doctor recommendations (test)
Doctor advice	Symptoms	Features
Low	Sensitivity increased, hair loss, fatigue, and dryness	TBG (thuroxine binding globulin)
High	-	FT1 (free T4)
High	Blood test	TSH
Medium	dryness, hair loss, sensitivity increased, weight loss, fatigue, and weakness	T3 Test
High	Blood pressure, hair loss, eye issue, and dryness	TT4 test

The above table explains the functional characteristics in addition to its scope related to low and high abnormality. This kind of test is utilized to detect various types of thyroid disease. Based on the doctor's recommendations of tests and symptoms-based level 1 feature are extracted for the detection of the thyroid. The extracted features are sent to the feature ranking and selection measure with level 2 for selecting essential features.

4.3. Feature ranking and selection measures level 2 (Fuzzy based composite measure)

Info gain

It is a ranking technique utilized to analyse the feature related to information theory in addition it's connected to the respective class [18]. The identified combined evaluation metric as data achieved from the training set is presented as follows,

$$iv\varnothing o\left({\Sigma }\right)=-\sum _{J=1}^{k}P\left({A}_{J}\right){Log}_{2}P\left({A}_{J}\right) \left(2\right)$$

Here, $P\left({A}_{J}\right)$ can be defined as a quantity derivation of a count of variables of a specific sample set to class ${A}_{J}$, $A$ can be defined as a pair of classes and $k$ can be defined as a count of classes of thyroid disease disorder. The data collected is then used to assign particular independent attributes in the ranking process. One can select the higher features as the high-value features. Information gain parameters are attained for several thyroid dataset features. The properties of the proposed model can then be improved using the fuzzy-based composite measure.

ReliefF

It is an effective feature selector for binary classification problems. The links between the features may be important to the filter. For feature scoring, the filter computes the closest neighbour variations. It creates miss and hit-based computations related to class specification. This filter is utilized to validate the weighted score.

Chi-Square

It is a measure of two parameters. This coefficient is applied to compute the similarities.

$${\varkappa }^{2}=\sum _{I=1}^{K}\sum _{J=1}^{C}\frac{{\left({N}_{IJ}-{\mu }_{IJ}\right)}^{2}}{{\mu }_{IJ}} \left(3\right)$$

Here, ${N}_{IJ}$ can be defined as the count of samples, $K$can be defined as various parameters and $C$ can be defined as several classes.

Gain ratio

It is an enhanced metric of information gain which is utilized in the normalized information. It is capable of decreasing the biasness with the basis of higher attributes.

$$SInfo=-\sum _{I=1}^{V}\frac{\left|{d}_{I}\right|}{d}\times {Log}_{2}\left(\frac{\left|{d}_{I}\right|}{d}\right) \left(4\right)$$

Here, d can be defined as a data parameter and $V$ can be defined as an attribute parameter. The gain ratio is presented as follows,

$$GR=\frac{Gain}{SInfo} \left(5\right)$$

In this research, composite-based two-level feature extraction with the consideration of fuzzy rules is developed for detecting the most contributing features. Based on the above measures, the significance of features is identified. The fuzzy rules are generated for selecting the significant features from the different measures which are presented in the below equations,

$$INFOgain=\begin{array}{cc}INFOgain\ge a& Significant\\ INFOgain<a& Nn significant\end{array} a=0.005 \left(6\right)$$

$$GAIN ratio=\begin{array}{cc}GAIN ratio\ge a& Significant\\ GAIN ratio<a& Nn significant\end{array} a=0.2 \left(7\right)$$

$$Chi Square=\begin{array}{cc}Chi Square\ge a& Significant\\ Chi Square<a& Nn significant\end{array} a=50 \left(8\right)$$

Here, $a$ can be defined as an Optimized fuzzy decision parameter. The combined fuzzy measure composite-based feature selection formulation is presented as follows,

$$CfMeasure=\begin{array}{cc}(sg\cap sg\cap sg\cap sg)\cup (nsg\cap sg\cap sg\cap sg)& High\\ (nsg\cap sg\cap sg\cap sg)& Medium \\ otherwise& Low\end{array} \left(9\right)$$

Based on the above formulation, the features are ranked and selected for thyroid detection and classification. The selected features are sent to the final classifier for detecting thyroid from the collected databases. The complete architecture of the proposed classifier is presented in the below section.

4.4. GRU-RNN with COOT Optimization Algorithm

In the proposed system, the GRU-RNN is utilized to identify thyroid disease from the databases. Normally, the dataset is utilized to train the network and test the network based on the network architecture. In the GRU-RNN, the optimal weighting parameter is selected with the assistance of the COOT algorithm. The detailed explanation of the GRU-RNN and COOT algorithms is explained in this portion.

4.4.1. GRU-RNN

In the RNN, neurons in a similar layer are utilized to transmit data to other neurons compared with the conventional neural network. Hence, RNN is defined as superior. RNN is operated related to the time sequence and designing it is an advantageous technique for operating time series functionalities [19]. Figure 2 depicts the recurrent neural network's design architecture.

From the above figure, $W$ can be defined as hidden-hidden weight matrices, $V$ can be defined as hidden-output weight matrices, $U$ can be defined as the input-hidden weight matrices, $Y$ can be defined as the predicted outcome, $S$ can be defined as hidden state and $X$ can be defined as input. From the RNN time series model, the characteristics and state of the network can be validated in measure of time. Combination of previous time state ${S}_{t-1}$ and present input ${X}_{i}$ is utilized to compute the neuron state $S$ with the specified period $t$ which is computed as follows,

$${S}_{t}=M\left({UX}_{t-1}+{WS}_{t-1}+{B}_{H}\right) \left(10\right)$$

Here, ${B}_{H}$ can be defined as a bias term, $M$ can be defined as an activation function. The neuron state is utilized as outcomes in the specified time $t$ and the network state input in the next time $t+1$ at a similar period.

Hence, ${S}_{t}$ does not connect directly to the output, it is required to be multiplied with the coefficient Z, and after that added with the offset function. This procedure can be defined with the below mathematical formulation,

$${Y}_{t}=ACT\left(Z{S}_{t}+{B}_{Y}\right) \left(11\right)$$

Here, ${B}_{Y}$ can be defined as a bias parameter and $ACT$ can be defined as the activation function. A GRU can be described from LSTM it does not contain an output gate. One gate contains both the input gate and the forget gate. Additionally, it unites the cell phase's concealed state into one phase. So, the GRU is very simple when compared with the LSTM and becomes preferable because of its simplicity and faster training state. The GRU cell architecture is illustrated in Fig. 3. The presented hidden state $h\left(K\right)$ can be computed as in this section.

If the data of the previous input parameter or hidden state is required to be discarded, after that the reset gate $r\left(K\right)$ can be utilized. The count of data which required to be saved and sent to the next process is managed by the update gate $z\left(k\right)$. The unnecessary data from the last phase can be forgotten by multiplying the reset dates output of the last phase. In remaining words, if the output of the update gate $Z$ can be near zero, the present state will consist of more novel data [20]. Moreover, if the output of the update gate $Z$ is near one, the present data is saved from the last period of iteration. The below computations formulate the details described above in the specified sampling period $k$,

$$h\left(k\right)=\left({1}_{n}N\times 1-z\left(k\right))\times g(k)+z(k)\times h(k-1)\right) \left(12\right)$$

$$g\left(k\right)-tanh\left({w}_{g}x\left(k\right)+z\left(k\right)\times {R}_{g}h\left(k-1\right)+{b}_{g}\right) \left(13\right)$$

$$z\left(k\right)=\sigma \left({w}_{z}x\left(k\right)+{R}_{z}h\left(k-1\right)+{b}_{z}\right) \left(14\right)$$

$$r\left(k\right)=\sigma \left({w}_{r}x\left(k\right)+{R}_{r}h\left(k-1\right)+{b}_{r}\right) \left(15\right)$$

Here, $R$ and $W$ can be defined as learned weight matrices, $\times$ can be defined as element-wise multiplication, $\sigma$ can be defined as a logistic sigmoid function, $h$ can be defined as candidate activation, $g$ can be defined as activation function, $z$ can be defined as update gate and $r$ can be defined as reset gate. The GRU-RNN was created to predict thyroid illness. The GRU module unit that obtains the information in the thyroid data can be the most crucial design in the GRU-RNN network model. The COOT optimization algorithm aids in the selection of the weighting parameter in the GRU-RNN. The section below provides a thorough explanation of the COOT algorithm.

4.4.2. COOT Optimization Algorithm

In the GRU-RNN, the weight parameter is selected with the assistance of the COOT optimization. The random weight updating process of the GRU-RNN maybe reduce the accuracy level of the system. Hence, the COOT optimization is utilized to select the efficient weight parameter which enables a high accuracy level in the identification of thyroid disease problems. A detailed explanation of the COOT optimization is presented in this section [21].

The Coots can be a small water bird that is a member of the rail family and it is named Rallidae. The name Fulica, which refers to this kind of bird, comes from the Latin for coot. Coot can have various collective characteristics, the main objective is to simulate gathering movements. The complete cluster can be directed towards the final target by a few coots in front of the cluster which considers cluster leaders. Hence, four various coot moves of the water surface are considered which are presented as follows,

Random variation to both sides
Chain variations
Managing the position related to cluster leaders.
Improving the cluster through the leaders to the optimal location.

Mathematical model of the algorithm

The normal design of the complete optimization algorithm is similar to another meta-heuristic algorithm. The positions of the coot are biased and the weight of GRU-RNN networks. The algorithm initiates with an initial random population which is presented as follows,

$$\left(\overrightarrow{X}\right)=\left\{\overrightarrow{{X}_{1}, }\overrightarrow{{X}_{2}, }\dots .,\overrightarrow{{X}_{N} }\right\} \left(16\right)$$

The random population can be continuously validated with the consideration of the final function in addition a final value is computed as follows,

$$\left(\overrightarrow{O}\right)=\left\{{O}_{1},{O}_{2},\dots ,{O}_{N}\right\} \left(17\right)$$

The set of guidelines that make up an optimization method's core can be used to improve it. There is no assurance that a solution will be computed in one iteration due to population-related optimization methods searching for the ideal number of optimization problems. The likelihood of computing the global optimal also rises with necessary numbers of random solution and optimization phases. By using the formulation below, the population can be generated at random in a beautiful area,

$$CootPOS\left(I\right)=RAND\left(1,D\right).*\left(UB-LB\right)+LB \left(18\right)$$

Here, $UB$ and $LB$ can be defined as an upper and lower bound of search space, $D$ can be defined as problem variables or number of variables, $CootPOS\left(I\right)$ is defined as coot position. Every parameter may contain various upper and lower-bound problems,

$$UB=\left[{UB}_{1},{UB}_{2},\dots ,{UB}_{D}\right],LB=\left[{LB}_{1},{LB}_{2},\dots ,{LB}_{D}\right] \left(19\right)$$

The position of the search agent should be determined after constructing the starting population, and each solution's fitness should be calculated using,

$${O}_{I}=f\left(\overrightarrow{X}\right) \left(20\right)$$

This equation is an objective function. The cluster leader in this case should be the NL number of coots. Leaders are chosen at random. The method is updated based on the coots' four motions.

Fitness evaluation

For each coot, the fitness function parameters can be computed. The fitness function parameters are computed related to the difference among the detection and its related observations with the consideration of the formula,

$${MSE}_{J}=\frac{1}{N}\sum _{T=1}^{N}{\left({x}_{T}-\underset{{x}_{T}}{⏞}\right)}^{2}, J=\text{1,2},..,pn \left(21\right)$$

Random changes to both sides

Consider a random place in the search space that is connected to the following formula to examine these changes, then move the coot in that direction,

$$q=RAND\left(1,D\right).*\left(UB-LB\right)+LB \left(22\right)$$

This coot variant looks into different areas of the search space. These modifications will make the technique escape from the local optimal if they have an impact on the local optimal. Coots's novel position can be computed based on the below formulation,

$$CootPOS\left(I\right)=CootPOS\left(I\right)+a\times {r}_{2}\times \left(q-CootPOS\left(I\right)\right) \left(23\right)$$

Here, $a$ can be defined as a random variable, ${r}_{2}$ can be defined as a random number in the specified range [0,1].

$$a=1-l\times \left(\frac{1}{Iteration}\right) \left(24\right)$$

Here, $Iteration$ can be defined as maximum iteration and $l$ can be defined as current iteration.

Chain variation

Chain variation is carried out using the dual coots' average location. Another method for verifying a chain modification is to first calculate the distance between two coots, and then transfer one of them to the other with about half of the distance [22]. Here, using the original method and the newly discovered location of the coot, the following formula can be used to compute,

$$CootPOS\left(I\right)=0.5\times (CootPOS\left(I-1\right)+CootPOS\left(I\right) (25)$$

Here, the second coot is represented as $CootPOS\left(I-1\right)$.

Managing the position related to the cluster leaders

The cluster is typically led by a few coots who stand in front of it. The other coots must regulate their proximity to the cluster leaders and migrate toward them. The remaining coots have changed their positions in relation to the dominant one. Additionally, the coots improve their position in relation to the average position of the leaders, which is also taken into consideration. Considering the average positions creates the problem of convergence. To compute these variations, the below formula is presented as follows,

$$k=1+\left(I mod NL\right) \left(26\right)$$

Here, $k$ can be defined as the leader index number, $NL$ can be defined as the count of leaders and $I$ can be defined as the index number of the present coot. The coot should upgrade its location related to leaders $k$. Based on the selected leader, the coot's future location is calculated.

$$CootPOS\left(I\right)=LeaderPOS\left(K\right)+2\times {r}_{1}\times cos\left(2r\pi \right)\times (LeaderPOS\left(K\right)-CootPOS\left(I\right) (27)$$

Here, $r$ can be defined as the arbitrary number in the period [-1,1], $\pi$ can be defined as the pi parameter as 3.14, ${r}_{1}$ can be defined as a random number in the interval [0,1], $LeaderPOS\left(K\right)$ can be defined as chosen leader position and $CootPOS\left(I\right)$ can be defined as the present position of coot.

Algorithm 1: pseudo code of the proposed algorithm

Initialize the first population of coot’s weight and biases of GRU-RNN

Initialize the variables P = 0.5 and NL

Number of coots

Random selection of coots as leaders

Determine the leaders' and coots' fitness

Calculate the global optimum, or the best leader or coot.

Condition is satisfied

Compute a and b variables

If$RAND<P$

$r,r1 and r3$ can be random vectors

Else

$r,r1 and r3$ are random variables

End

For $I=1$ to the coot numbers

compute the variable of k

$IF RANDOM>0.5$

Update the position

Else

$IF RANDM<0.5$

Update the position

Else

Update the position

End

Calculate the coot fitness

If the coot's fitness is greater than the leader's,

Coot = Temp, coot = leader(k), Leader(k) = Temp

End

For the number of leaders

IF RAND < 0.5

Update the position of the leader

Else

Update the position of the leader

End

If the fitness of$gbest> leader$

Save the best parameter

End

$Iteration=Iteration +1$

End

Improving the cluster through the leaders the optimal location

The cluster should be directed to an optimal location, so leaders are required to upgrade their location to the goal. It the recommended upgrading the location of leaders. The formulation looks for the best locations around this present optimal rate. Leaders can compute the best places by moving away from the current optimal location. The finest method for obtaining the ideal position and escaping from it is provided by this formula.

$$LeaderPOS\left(I\right)=\left\{\begin{array}{cc}b\times r3\times cos\left(2\pi r\right)\times \left(gbest-LeaderPOS\left(I\right)\right)+gbest& r3<0.5\\ b\times r3\times cos\left(2\pi r\right)\times \left(gbest-LeaderPOS\left(I\right)\right)-gbest& r4\ge 0.5\end{array}\right\} \left(28\right)$$

Here, $r$ can be defined as the interval [-1,1], $\pi$ can be defined as the pi parameter i.e., 3.14, $r3,$ and $r4$ is defined as a random variable in the interval [0,1] and $gbest$ is defined as best position ever found.

$$b=2-l\times \left(\frac{1}{Iteration}\right) \left(29\right)$$

Here, $Iteration$ can be defined as maximum iteration and $l$ can be defined as present iteration. With the assistance of the coot algorithm, the optimal weighting parameter of the proposed classifier is achieved. Finally, the proposed classifier is utilized for the identification of thyroid from the databases.

In this section the effectiveness of the suggested automatic thyroid detection is evaluated. The standard database is gathered from the open-source system in order to validate the suggested thyroid detection. The three classes and 7201 databases are both included in the gathered database. 80% of the data in the database is used to train the network. The remaining 20% of the data is additionally used to test the network. The performance of the suggested technique is assessed using MATLAB and statistical measurements including accuracy, precision, recall, specificity, kappa, F Measure, AUC, ROC, and sensitivity. The proposed method is put to the test by being put up against more established methods like ANN, SVM, and DBNN.The simulation parameters of the projected technique are presented in Table 4. The projected technique is validated by utilizing statistical measures with the consideration of the confusion matrix which is illustrated in Table 5.

Table 4

Simulation parameters
S. No	Variable name	Parameters
1	Number of search agents	2
2	lower bound	200
3	Upper bound	500
4	Dimension	10
5	Maximum iteration	100
6	Dropout rate	0.1
7	Learning rate	0.001
8	Regularization parameter	0.01
9	Activation function	ReLU
10	Loss function	MSE
11	Batch size	50
12	Epoch	50
13	Number of neurons	32 for the GRU layer

Table 5

Confusion matrix
N = 7201		Predicted
Actual		Class-1	Class − 2	Class − 3
	Class- 1	2995	2	3
	Class − 2	3	2798	2
	Class- 3	2	2	1401

Table 6

Feature evaluation using the proposed technique
S. No	Features	Info gain	Gain ratio	Chisq
1	On thyroxine	0.0029	0.0001	0.529
2	Query_on_thyroxine	0.0011	0.0006	5.5118
3	Query_on_thyroxine	0.0008	0.0087	6.6543
4	On_antithyroid_medication	0.0993	0.0359	7556.4196
5	Sick	0.0065	0.0742	922.4819
6	Pregnant	0.0388	0.0152	2315.9042
7	Thyroid surgery	0.0065	0.0633	3248.1218
8	I131_treatment	0.1293	0.0441	249.9336
9	Query hypothyroid	0.0123	0.0698	76.7466
10	Query hyperthyroid	0.0642	0.1231	1485.5599
11	Goitre	0.1680	0.0261	71.4091
12	Tumor	0.2365	0.0224	587.2441
13	Hypopituitary	0.0203	0.1554	71.7620
14	Psych	0.0084	0.0205	1891.5441
15	TSH	0.0993	0.0105	384.4196
16	T3	0.0874	0.0854	124.254
17	TT4	0.3548	0.0487	68.548
18	T4U	0.1548	0.0687	125.654
19	FTI	0.0485	0.0154	55.654

Table 7

Feature selection using the proposed technique
S. No	Features	Info gain	Gain ratio	Chisq	Selected
1	On thyroxine	H	L	H	YES
2	Query_on_thyroxine	M	L	H	YES
3	Query_on_thyroxine	H	L	H	YES
4	On_antithyroid_medication	H	H	M	YES
5	Sick	H	L	H	YES
6	Pregnant	L	L	M	NO
7	Thyroid surgery	L	M	L	NO
8	I131_treatment	H	H	H	YES
9	Query hypothyroid	L	L	L	NO
10	Query hyperthyroid	L	M	M	YES
11	Goitre	L	L	L	NO
12	Tumor	H	H	H	YES
13	Hypopituitary	L	M	L	NO
14	Psych	L	L	M	NO
15	TSH	H	L	H	YES
16	T3	L	L	L	NO
17	TT4	H	H	L	YES
18	T4U	L	M	H	YES
19	FTI	H	H	H	YES

The proposed methodology uses two-level feature extraction techniques to identify the key features while taking doctor recommendations into account. The features are also chosen based on other metrics, including info gain, gain ratio, and chi-square measure. Tables 6 and 7 exhibit the retrieved characteristics and chosen features based on the measure. Performance indicators including accuracy, sensitivity, specificity, F measure, kappa, ROC, AUC, precision, and recall are used to evaluate how well the given approach performs.

To justify the presentation of the proposed thyroid detection model, accuracy is used. This accuracy measure is given in Fig. 5. The projected method is contrasted with traditional techniques such as DBN, ANN, and SVM. The projected technique has achieved 0.998 accuracies. Additionally, the DBN, ANN, and SVM have achieved 0.95, 0.91, and 0.909. From Fig. 5, the projected technique attained the best outcomes. To justify the presentation of the proposed thyroid detection model, the F_measure is used. This F_measure measure is given in Fig. 6. The projected method is contrasted with traditional techniques such as DBN, ANN, and SVM. The projected technique has achieved 0.94 F_measure. Additionally, the DBN, ANN, and SVM have achieved 0.915, 0.91, and 0.89. From Fig. 6, the projected technique is attaining the best outcomes. To justify the presentation of the proposed thyroid detection model, precision is used. This precision measure is given in Fig. 7. The projected method is contrasted with traditional techniques such as DBN, ANN, and SVM. The projected technique has achieved 0.93 precision. Additionally, the DBN, ANN, and SVM have achieved 0.928, 0.918, and 0.905. From Fig. 7, the projected technique is attaining the best outcomes.

Table 8

Comparison analysis
S. No	References	Method	Accuracy	Precision
1	Tahir Alyas et al., [11]	empirical technique	0.948	0.89
2	Hafiz Abbad Ur Rehman et al., [12]	KNN	0.91	0.91
3	Mehdi Hosseinzadeh et al., [13]	multiple multi-layer perceptron (MMLP)	0.7	0.85
4	Rajasekhar Chaganti et al., [14]	deep learning method	0.94	0.92
5	K. Shankar et al., [15]	multi-kernel SVM approach	0.97	0.90
6	-	Proposed approach	0.998	0.93

To justify the presentation of the proposed thyroid detection model, recall is used. This precision measure is given in Fig. 8. The projected method is contrasted with traditional techniques such as DBN, ANN, and SVM. The projected technique has achieved 0.91 recall. Additionally, the DBN, ANN, and SVM have achieved 0.908, 0.88, and 0.87. From Fig. 8, the projected technique is attaining the best outcomes. To justify the presentation of the proposed thyroid detection model, sensitivity is used. This sensitivity measure is given in Fig. 9. The projected method is contrasted with traditional techniques such as DBN, ANN, and SVM. The projected technique has achieved 0.99 sensitivity. Additionally, the DBN, ANN, and SVM have achieved 0.918, 0.89, and 0.88. From Fig. 9, the projected technique is attaining the best outcomes. To justify the presentation of the proposed thyroid detection model, specificity is used. This specificity measure is given in Fig. 10. The projected method is contrasted with traditional techniques such as DBN, ANN, and SVM. The projected technique has achieved 0.95 specificity. Additionally, the DBN, ANN, and SVM have achieved 0.94, 0.92, and 0.885. From Fig. 9, the projected technique is attaining the best outcomes. To justify the presentation of the proposed thyroid detection model, the kappa is used. This kappa measure is given in Fig. 11. The projected method is contrasted with traditional techniques such as DBN, ANN, and SVM. The projected technique has achieved 0.93 kappas. Additionally, the DBN, ANN, and SVM have been achieved at 0.91, 0.918, and 0.889. From Fig. 11, the projected technique is attaining the best outcomes.

In this paper, EFERNN has been developed for automatic thyroid detection. Initially, the databases have been gathered from the open-source system. After that, the pre-processing technique has been developed for correcting missing values by the normalization technique of min-max normalization. The pre-processed data has been utilized for feature extraction by using feature extraction techniques such as TLFE techniques. In level 1, the ranked filter feature set technique has been utilized to rank the features based on doctor recommendations. The label-driven validation has been performed by utilizing ranking measures. In level 2, feature ranking and selection measures have been considered to feature ranking by different measures such as info gain, gain ratio, chi-square, and relief. In level 2, the effective features are chosen from the feature set using a fuzzy-based composite measure. Finally, the GRU-RNN has been utilized to classify thyroid disease from the databases. With the help of the COOT Optimization Algorithm, the weight is chosen in the GRU-RNN. The performance of the suggested method has been assessed using MATLAB and statistical measurements like accuracy, precision, recall, specificity, kappa, F Measure, and sensitivity. To validate the proposed technique, it is compared with the conventional techniques such as ANN, SVM, and DBNN.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the Work reported in this paper.

Acknowledgments

The authors thank the anonymous reviewers for their helpful comments and suggestions.

Funding

Not applicable

Ethical approval

Not applicable.

Consent to participate

Not applicable.

Consent to publish

Not applicable.

Conflict of interest

The authors declare no competing interests.

Data Availability

No data availability

Abbad Ur Rehman, Hafiz, Chyi-Yeu Lin, Zohaib Mushtaq, and Shun-Feng Su. "Performance analysis of machine learning algorithms for thyroid disease." Arabian Journal for Science and Engineering 46, no. 10 (2021): 9437–9449.
Suresh Kumar, Vidhushavarshini, Sathiyabhama Balasubramanian, Vijayakumar Ravi, and Ajay Arunachalam. "A hybrid optimization algorithm-based feature selection for thyroid disease classifier with rough type‐2 fuzzy support vector machine." Expert Systems 39, no. 1 (2022): e12811.
Abbad Ur Rehman, Hafiz, Chyi-Yeu Lin, and Zohaib Mushtaq. "Effective K-Nearest Neighbour Algorithms Performance Analysis of Thyroid Disease." Journal of the Chinese Institute of Engineers 44, no. 1 (2021): 77–87.
Chaubey, Gyanendra, Dhananjay Bisen, Siddharth Arjaria, and Vibhash Yadav. "Thyroid disease prediction using machine learning approaches." National Academy Science Letters 44, no. 3 (2021): 233–238.
Li, Wenjun, Siyi Cheng, Kai Qian, Keqiang Yue, and Hao Liu. "Automatic recognition and classification system of thyroid nodules in CT images based on CNN." Computational Intelligence and Neuroscience 2021 (2021).
Mishra, Sushruta, Yeshihareg Tadesse, Anuttam Dash, Lambodar Jena, and Piyush Ranjan. "Thyroid disorder analysis using random forest classifier." In Intelligent and cloud computing, pp. 385–390. Springer, Singapore, 2021.
Yadav, Dhyan Chandra, and Saurabh Pal. "Prediction of thyroid disease using decision tree ensemble method." Human-Intelligent Systems Integration 2, no. 1 (2020): 89–95.
Ahmad, Waheed, Ayaz Ahmad, Chuncheng Lu, Barkat Ali Khoso, and Lican Huang. "A novel hybrid decision support system for thyroid disease forecasting." Soft Computing 22, no. 16 (2018): 5377–5383.
Poudel, Prabal, Alfredo Illanes, Elmer JG Ataide, Nazila Esmaeili, Sathish Balakrishnan, and Michael Friebe. "Thyroid ultrasound texture classification using autoregressive features in conjunction with machine learning approaches." IEEE Access 7 (2019): 79354–79365.
Yadav, Dhyan Chandra, and Saurabh Pal. "Prediction of thyroid disease using decision tree ensemble method." Human-Intelligent Systems Integration 2, no. 1 (2020): 89–95.
Alyas, Tahir, Muhammad Hamid, Khalid Alissa, Tauqeer Faiz, Nadia Tabassum, and Aqeel Ahmad. "Empirical Method for Thyroid Disease Classification Using a Machine Learning Approach." BioMed Research International 2022 (2022).
Abbad Ur Rehman, Hafiz, Chyi-Yeu Lin, and Zohaib Mushtaq. "Effective K-Nearest Neighbour Algorithms Performance Analysis of Thyroid Disease." Journal of the Chinese Institute of Engineers 44, no. 1 (2021): 77–87.
Hosseinzadeh, Mehdi, Omed Hassan Ahmed, Marwan Yassin Ghafour, Fatemeh Safara, Saqib Ali, Bay Vo, and Hsiu-Sen Chiang. "A multiple multilayer perceptron neural network with an adaptive learning algorithm for thyroid disease diagnosis in the internet of medical things." The Journal of Supercomputing 77, no. 4 (2021): 3616–3637.
Chaganti, Rajasekhar, Furqan Rustam, Isabel De La Torre Díez, Juan Luis Vidal Mazón, Carmen Lili Rodríguez, and Imran Ashraf. "Thyroid Disease Prediction Using Selective Features and Machine Learning Techniques." Cancers 14, no. 16 (2022): 3914.
Shankar, K., S. K. Lakshmana Prabu, Deepak Gupta, Andino Maseleno, and Victor Hugo C. De Albuquerque. "Optimal feature-based multi-kernel SVM approach for thyroid disease classification." The journal of supercomputing 76, no. 2 (2020): 1128–1143.
Patro, S., and Kishore Kumar Sahu. "Normalization: A pre-processing stage." arXiv preprint arXiv:1503.06462 (2015).
Juneja, Kapil. "Expanded and Filtered Features Based ELM Model for Thyroid Disease Classification." Wireless Personal Communications (2022): 1–38.
Juneja, Kapil. "Expanded and Filtered Features." Information Technology 95, no. 9 (2017): 1996–2005.
Alrasheedi, Abdullah, and Abdulaziz Almalaq. "Hybrid Deep Learning Applied on Saudi Smart Grids for Short-Term Load Forecasting." Mathematics 10, no. 15 (2022): 2666.
Aldallal, Ammar. "Toward Efficient Intrusion Detection System Using Hybrid Deep Learning Approach." Symmetry 14, no. 9 (2022): 1916.
Naruei, Iraj, and Farshid Keynia. "A new optimization method based on COOT bird natural life model." Expert Systems with Applications 183 (2021): 115352.
Huang, Yihui, Jing Zhang, Wei Wei, Tao Qin, Yuancheng Fan, Xuemei Luo, and Jing Yang. "Research on Coverage Optimization in a WSN Based on an Improved COOT Bird Algorithm." Sensors 22, no. 9 (2022): 3383.

No competing interests reported.

Download PDF

Version 1

posted

You are reading this latest preprint version

Efficient Feature Extraction Based on Optimized Gated Recurrent Unit Recurrent Neural Network for Automatic Thyroid Prediction

Status:

Version 1

Abstract

Figures

1. Introduction

2. Related Works

3. Proposed System Architecture

4.1. Pre-processing

4.2. Feature extraction ranked filter feature set technique (level 1)

4.3. Feature ranking and selection measures level 2 (Fuzzy based composite measure)

4.4. GRU-RNN with COOT Optimization Algorithm

4.4.1. GRU-RNN

4.4.2. COOT Optimization Algorithm

4. Performance Evaluation

5. Conclusion

Declarations

References

Additional Declarations

Status:

Version 1