In the proposed system, the GRU-RNN is used to identify thyroid disease from the databases. The dataset is used to train and test the network according to its architecture. The optimal weighting parameters of the GRU-RNN are selected with the COOT algorithm. The GRU-RNN and the COOT algorithm are explained in detail in this section.
4.4.1. GRU-RNN
Unlike in a conventional neural network, neurons in the same layer of an RNN pass information to one another, which gives the RNN an advantage on sequential data. The RNN operates on time sequences and is therefore well suited to time-series tasks [19]. Figure 2 depicts the architecture of the recurrent neural network.
In Figure 2, \(W\) is the hidden-hidden weight matrix, \(V\) is the hidden-output weight matrix, \(U\) is the input-hidden weight matrix, \(Y\) is the predicted output, \(S\) is the hidden state, and \(X\) is the input. In the RNN time-series model, the characteristics and state of the network evolve over time. The neuron state \({S}_{t}\) at time \(t\) is computed from the previous state \({S}_{t-1}\) and the present input \({X}_{t}\) as follows,
$${S}_{t}=M\left(U{X}_{t}+W{S}_{t-1}+{B}_{H}\right) \left(10\right)$$
Here, \({B}_{H}\) is a bias term and \(M\) is an activation function. The neuron state serves both as the output at time \(t\) and as the state input to the network at the next time step \(t+1\).
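As a minimal sketch of the recurrence in Eq. (10), the snippet below applies one state update per time step in plain Python. It assumes tanh for the activation \(M\) and feeds the present input at each step, as described in the text. The helper names (`matvec`, `rnn_state_step`) and the toy weights are illustrative only and are not values from the proposed system.

```python
import math

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def rnn_state_step(U, W, b_h, x_t, s_prev):
    """One update S_t = M(U x_t + W S_{t-1} + B_H), with M assumed to be tanh."""
    ux = matvec(U, x_t)
    ws = matvec(W, s_prev)
    return [math.tanh(u + w + b) for u, w, b in zip(ux, ws, b_h)]

# Toy dimensions: 2 inputs, 2 hidden units (illustrative values only).
U = [[0.5, -0.2], [0.1, 0.3]]
W = [[0.0, 0.4], [-0.3, 0.2]]
b_h = [0.0, 0.1]

state = [0.0, 0.0]                      # initial hidden state S_0
for x_t in ([1.0, 0.0], [0.0, 1.0]):    # a two-step input sequence
    state = rnn_state_step(U, W, b_h, x_t, state)
```

The state returned at one step is passed back in as `s_prev` at the next step, which is the feedback path that distinguishes the RNN from a feed-forward network.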
Because \({S}_{t}\) is not connected directly to the output, it is multiplied by the coefficient \(Z\) and then added to an offset term. This step is expressed as follows,
$${Y}_{t}=ACT\left(Z{S}_{t}+{B}_{Y}\right) \left(11\right)$$
Here, \({B}_{Y}\) is a bias parameter and \(ACT\) is the activation function. A GRU is derived from the LSTM but has no output gate. A single gate takes the place of both the input gate and the forget gate, and the cell state and hidden state are merged into one state. The GRU is therefore simpler than the LSTM and is often preferred for its simplicity and faster training. The GRU cell architecture is illustrated in Fig. 3. The present hidden state \(h\left(k\right)\) is computed as described below.
The reset gate \(r\left(k\right)\) is used when information from the previous input or hidden state needs to be discarded. The update gate \(z\left(k\right)\) controls how much information is kept and passed on to the next step. Unneeded information from the previous step is forgotten by multiplying the previous state by the output of the reset gate. If the output of the update gate \(z\) is near zero, the present state consists mostly of new information [20]. If the output of the update gate \(z\) is near one, the information from the previous step is retained. These relations are formulated for the sampling instant \(k\) as follows,
$$h\left(k\right)=\left({1}_{N\times 1}-z\left(k\right)\right)\times g\left(k\right)+z\left(k\right)\times h\left(k-1\right) \left(12\right)$$
$$g\left(k\right)=\tanh\left({w}_{g}x\left(k\right)+r\left(k\right)\times {R}_{g}h\left(k-1\right)+{b}_{g}\right) \left(13\right)$$
$$z\left(k\right)=\sigma \left({w}_{z}x\left(k\right)+{R}_{z}h\left(k-1\right)+{b}_{z}\right) \left(14\right)$$
$$r\left(k\right)=\sigma \left({w}_{r}x\left(k\right)+{R}_{r}h\left(k-1\right)+{b}_{r}\right) \left(15\right)$$
Here, \(R\) and \(w\) are learned weight matrices, \(\times\) denotes elementwise multiplication, \(\sigma\) is the logistic sigmoid function, \(h\) is the hidden state, \(g\) is the candidate activation, \(z\) is the update gate, and \(r\) is the reset gate. The GRU-RNN is built to predict thyroid disease. The GRU unit, which captures the information in the thyroid data, is the most important component of the GRU-RNN model. The COOT optimization algorithm selects the weighting parameters of the GRU-RNN and is explained in the section below.
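The gate computations in Eqs. (12) to (15) can be sketched as a single GRU step in plain Python. This is an illustration under stated assumptions rather than the implementation used in the proposed system: the candidate activation applies the reset gate \(r(k)\) to \({R}_{g}h(k-1)\), which is the standard GRU form and matches the description of the reset gate in the text, and the parameter dictionary, function names, and toy values are hypothetical.

```python
import math

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def gru_step(x, h_prev, p):
    """One GRU update; p holds w_*, R_*, b_* for the gates z, r and candidate g."""
    def pre_activation(gate):
        wx = matvec(p["w_" + gate], x)
        rh = matvec(p["R_" + gate], h_prev)
        return wx, rh, p["b_" + gate]

    wx, rh, b = pre_activation("z")
    z = [sigmoid(a + c + d) for a, c, d in zip(wx, rh, b)]        # Eq. (14)
    wx, rh, b = pre_activation("r")
    r = [sigmoid(a + c + d) for a, c, d in zip(wx, rh, b)]        # Eq. (15)
    wx, rh, b = pre_activation("g")
    # Assumed standard form: the reset gate scales R_g h(k-1) elementwise.
    g = [math.tanh(a + ri * c + d) for a, ri, c, d in zip(wx, r, rh, b)]  # Eq. (13)
    # Eq. (12): the update gate blends the candidate with the previous state.
    return [(1.0 - zi) * gi + zi * hi for zi, gi, hi in zip(z, g, h_prev)]

# Toy parameters for 2 inputs and 2 hidden units (illustrative values only).
params = {
    "w_z": [[0.1, 0.2], [0.0, -0.1]], "R_z": [[0.3, 0.0], [0.0, 0.3]], "b_z": [0.0, 0.0],
    "w_r": [[0.2, 0.0], [0.1, 0.1]], "R_r": [[0.1, 0.0], [0.0, 0.1]], "b_r": [0.0, 0.0],
    "w_g": [[0.5, -0.5], [0.3, 0.2]], "R_g": [[0.4, 0.1], [-0.2, 0.3]], "b_g": [0.0, 0.0],
}
h_prev = [0.2, -0.4]
h_new = gru_step([1.0, 0.5], h_prev, params)
```

Pushing the update-gate bias `b_z` to a large positive value drives \(z\) toward one, so the step returns a state that is practically equal to `h_prev`, in line with the behavior described for Eq. (12).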
4.4.2. COOT Optimization Algorithm
In the GRU-RNN, the weight parameters are selected with the COOT optimization algorithm. Updating the weights of the GRU-RNN at random may reduce the accuracy of the system. Hence, COOT optimization is used to select effective weight parameters, which enables high accuracy in the identification of thyroid disease. A detailed explanation of the COOT optimization is presented in this section [21].
Coots are small water birds that belong to the rail family, Rallidae. Their genus name, Fulica, is the Latin word for coot. Coots show various collective behaviors, and the main objective of the algorithm is to simulate their collective movements. A few coots at the front of the cluster, regarded as cluster leaders, direct the whole cluster toward the final target. Four coot movements on the water surface are considered:

Random changes to both sides

Chain variation

Managing the position related to the cluster leaders

Improving the cluster through the leaders to the optimal location
Mathematical model of the algorithm
The general structure of the optimization algorithm is similar to that of other metaheuristic algorithms. The position of each coot encodes the weights and biases of the GRU-RNN. The algorithm starts with an initial random population,
$$\left(\overrightarrow{X}\right)=\left\{{\overrightarrow{X}}_{1},{\overrightarrow{X}}_{2},\dots ,{\overrightarrow{X}}_{N}\right\} \left(16\right)$$
The population is evaluated repeatedly with the objective function, and the corresponding objective values are computed as follows,
$$\left(\overrightarrow{O}\right)=\left\{{O}_{1},{O}_{2},\dots ,{O}_{N}\right\} \left(17\right)$$
The candidate solutions are improved by a set of rules that forms the core of the optimization method. Because population-based optimization methods search for the optimum stochastically, there is no assurance that a solution will be found in a single iteration. The likelihood of finding the global optimum rises with a sufficient number of random solutions and optimization steps. The population is generated at random within the search space using the formulation below,
$$CootPOS\left(I\right)=RAND\left(1,D\right).*\left(UB-LB\right)+LB \left(18\right)$$
Here, \(UB\) and \(LB\) are the upper and lower bounds of the search space, \(D\) is the number of problem variables, and \(CootPOS\left(I\right)\) is the position of coot \(I\). Each variable may have its own upper and lower bound,
$$UB=\left[{UB}_{1},{UB}_{2},\dots ,{UB}_{D}\right],LB=\left[{LB}_{1},{LB}_{2},\dots ,{LB}_{D}\right] \left(19\right)$$
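The initialization in Eqs. (18) and (19) can be sketched as follows. The function name, the fixed seed, and the example bounds are hypothetical; in the proposed system each position would hold the GRU-RNN weights and biases, which are not specified here.

```python
import random

def init_population(n_coots, lb, ub, rng):
    """Draw n_coots positions uniformly inside [lb, ub] per dimension (Eq. 18)."""
    return [
        [rng.random() * (u - lo) + lo for lo, u in zip(lb, ub)]
        for _ in range(n_coots)
    ]

rng = random.Random(0)      # fixed seed so the sketch is repeatable
lb = [-1.0, -1.0, 0.0]      # per-variable lower bounds, as in Eq. (19)
ub = [1.0, 1.0, 5.0]        # per-variable upper bounds, as in Eq. (19)
population = init_population(5, lb, ub, rng)
```

Each inner list is one coot position with \(D=3\) variables, and every variable respects its own bound pair.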
The position of the search agent should be determined after constructing the starting population, and each solution's fitness should be calculated using,
$${O}_{I}=f\left(\overrightarrow{X}\right) \left(20\right)$$
Here, \(f\) is the objective function. A total of \(NL\) coots are designated as cluster leaders, and they are chosen at random. The positions are then updated according to the four coot movements.
Fitness evaluation
The fitness of each coot is computed from the difference between the predicted values and the corresponding observations, using the mean squared error,
$${MSE}_{J}=\frac{1}{N}\sum _{T=1}^{N}{\left({x}_{T}-{\widehat{x}}_{T}\right)}^{2}, J=1,2,\dots ,pn \left(21\right)$$
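A minimal sketch of the fitness in Eq. (21) is given below. The argument names are assumptions: one sequence holds the observed values and the other the values predicted by the network for one coot; because the difference is squared, the order of the two does not affect the result.

```python
def mse(observed, predicted):
    """Mean squared error between two equally long, non-empty sequences."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - p) ** 2 for x, p in zip(observed, predicted)) / len(observed)

# Toy labels and predictions for one coot (illustrative values only).
fitness = mse([1.0, 0.0, 1.0, 1.0], [0.9, 0.2, 0.7, 1.0])
```

A lower value indicates a better coot, so the optimizer treats this fitness as a quantity to minimize.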
Random changes to both sides
To model this movement, a random position \(q\) is generated in the search space with the formula below, and the coot moves toward it,
$$q=RAND\left(1,D\right).*\left(UB-LB\right)+LB \left(22\right)$$
This movement explores different regions of the search space. If the algorithm is trapped in a local optimum, the movement helps it escape. The new position of the coot is computed as follows,
$$CootPOS\left(I\right)=CootPOS\left(I\right)+a\times {r}_{2}\times \left(q-CootPOS\left(I\right)\right) \left(23\right)$$
Here, \(a\) is a coefficient that decreases as the iterations proceed and \({r}_{2}\) is a random number in the range [0,1]. The coefficient \(a\) is computed as follows,
$$a=1-l\times \left(\frac{1}{Iteration}\right) \left(24\right)$$
Here, \(Iteration\) is the maximum number of iterations and \(l\) is the current iteration.
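The random movement in Eqs. (22) to (24) can be sketched as below. The function name and the example values are hypothetical, and \({r}_{2}\) is drawn here as a single number rather than a vector, which is one of the two options listed in Algorithm 1.

```python
import random

def random_move(coot, lb, ub, l, max_iter, rng):
    """Move a coot toward a random point q in the search space."""
    a = 1.0 - l * (1.0 / max_iter)                               # Eq. (24)
    q = [rng.random() * (u - lo) + lo for lo, u in zip(lb, ub)]  # Eq. (22)
    r2 = rng.random()                                            # in [0, 1)
    return [c + a * r2 * (qi - c) for c, qi in zip(coot, q)]     # Eq. (23)

rng = random.Random(1)
coot = [0.5, 2.0]
lb, ub = [-1.0, 0.0], [1.0, 5.0]
moved = random_move(coot, lb, ub, l=10, max_iter=100, rng=rng)
```

Since \(a\) shrinks from nearly one to zero over the run, the step toward \(q\) is large in early iterations and vanishes at the last iteration, which shifts the search from exploration to exploitation.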
Chain variation
The chain movement is implemented using the average position of two coots. An alternative way to implement it is to first compute the distance between the two coots and then move one of them toward the other by about half of that distance [22]. Using the first method, the new position of the coot is computed as follows,
$$CootPOS\left(I\right)=0.5\times \left(CootPOS\left(I-1\right)+CootPOS\left(I\right)\right) \left(25\right)$$
Here, the second coot is represented as \(CootPOS\left(I-1\right)\).
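A short sketch of Eq. (25) is given below, with a hypothetical function name. The averaging form and the alternative of moving half of the distance toward the other coot give the same position, which the usage lines check on a toy pair.

```python
def chain_move(coot, previous_coot):
    """Place the coot at the midpoint between itself and the previous coot."""
    return [0.5 * (p + c) for p, c in zip(previous_coot, coot)]

current = [2.0, 4.0]
previous = [0.0, 0.0]
midpoint = chain_move(current, previous)

# Alternative description: move half of the distance toward the previous coot.
half_step = [c + 0.5 * (p - c) for c, p in zip(current, previous)]
```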
Managing the position related to the cluster leaders
The cluster is typically led by a few coots at its front. The other coots must adjust their positions relative to the cluster leaders and move toward them. One option is for each coot to update its position relative to the average position of the leaders, but using the average position leads to convergence problems. To implement this movement, the leader that each coot follows is selected as follows,
$$k=1+\left(I \bmod NL\right) \left(26\right)$$
Here, \(k\) is the index of the leader, \(NL\) is the number of leaders, and \(I\) is the index of the present coot. The coot updates its position relative to leader \(k\). Based on the selected leader, the next position of the coot is calculated as follows,
$$CootPOS\left(I\right)=LeaderPOS\left(k\right)+2\times {r}_{1}\times \cos\left(2r\pi \right)\times \left(LeaderPOS\left(k\right)-CootPOS\left(I\right)\right) \left(27\right)$$
Here, \(r\) is a random number in the interval [-1,1], \(\pi\) is the constant pi (approximately 3.14), \({r}_{1}\) is a random number in the interval [0,1], \(LeaderPOS\left(k\right)\) is the position of the chosen leader, and \(CootPOS\left(I\right)\) is the present position of the coot.
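The leader selection in Eq. (26) and the position update in Eq. (27) can be sketched as follows. The function names are hypothetical, the coot index is taken as 1-based as in the text, and the random numbers are drawn as single values rather than vectors.

```python
import math
import random

def leader_index(i, n_leaders):
    """Eq. (26): 1-based coot index i mapped to a 1-based leader index k."""
    return 1 + (i % n_leaders)

def follow_leader(coot, leader, rng):
    """Eq. (27): new coot position relative to its assigned leader."""
    r1 = rng.random()               # random number in [0, 1)
    r = rng.uniform(-1.0, 1.0)      # random number in [-1, 1]
    step = 2.0 * r1 * math.cos(2.0 * math.pi * r)
    return [ld + step * (ld - c) for ld, c in zip(leader, coot)]

rng = random.Random(3)
k = leader_index(5, 3)              # coot 5 with 3 leaders
new_pos = follow_leader([0.0, 0.0], [1.0, 2.0], rng)
```

Because the step factor lies between -2 and 2, the new position stays within twice the current coot-to-leader distance of the leader, on either side of it.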
Algorithm 1: Pseudocode of the proposed algorithm

Initialize the first population of coots (the weights and biases of the GRU-RNN)
Initialize the variables P = 0.5, NL (number of leaders), and the number of coots
Randomly select NL coots as leaders
Determine the fitness of the leaders and the coots
Find the global optimum gbest, i.e., the best leader or coot
While the end condition is not satisfied
    Compute the variables a and b using Eqs. (24) and (29)
    If \(RAND<P\)
        \(r\), \({r}_{1}\) and \({r}_{3}\) are random vectors
    Else
        \(r\), \({r}_{1}\) and \({r}_{3}\) are random numbers
    End
    For \(I=1\) to the number of coots
        Compute the variable k using Eq. (26)
        If \(RAND>0.5\)
            Update the position of the coot using Eq. (27)
        Else
            If \(RAND<0.5\)
                Update the position of the coot using Eq. (25)
            Else
                Update the position of the coot using Eq. (23)
            End
        End
        Calculate the fitness of the coot
        If the fitness of the coot is better than that of leader(k)
            Temp = leader(k), leader(k) = coot, coot = Temp
        End
    End
    For each leader
        If \(RAND<0.5\)
            Update the position of the leader using the first case of Eq. (28)
        Else
            Update the position of the leader using the second case of Eq. (28)
        End
        If the fitness of the leader is better than that of gbest
            Save the leader as the new gbest
        End
    End
    \(Iteration=Iteration+1\)
End

Improving the cluster through the leaders to the optimal location
The cluster must be directed toward an optimal location, so the leaders need to update their positions toward the goal. The formula below is recommended for updating the positions of the leaders. It searches for better positions around the current optimal point. Leaders sometimes have to move away from the current optimal position to find better ones, so the formula provides a way both to approach the optimal position and to move away from it.
$$LeaderPOS\left(I\right)=\left\{\begin{array}{ll}b\times {r}_{3}\times \cos\left(2\pi r\right)\times \left(gbest-LeaderPOS\left(I\right)\right)+gbest& {r}_{4}<0.5\\ b\times {r}_{3}\times \cos\left(2\pi r\right)\times \left(gbest-LeaderPOS\left(I\right)\right)-gbest& {r}_{4}\ge 0.5\end{array}\right. \left(28\right)$$
Here, \(r\) is a random number in the interval [-1,1], \(\pi\) is the constant pi (approximately 3.14), \({r}_{3}\) and \({r}_{4}\) are random numbers in the interval [0,1], and \(gbest\) is the best position found so far.
$$b=2-l\times \left(\frac{1}{Iteration}\right) \left(29\right)$$
Here, \(Iteration\) is the maximum number of iterations and \(l\) is the present iteration. With the COOT algorithm, the optimal weighting parameters of the proposed classifier are obtained. Finally, the proposed classifier is used to identify thyroid disease from the databases.
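The four movements can be combined into one loop that follows Algorithm 1. The sketch below is illustrative only and rests on several assumptions: a sphere function stands in for the GRU-RNN training error, positions are clamped to the bounds, the chain movement is skipped for the first coot because it has no predecessor, and lower fitness is treated as better. All function and parameter names are hypothetical.

```python
import math
import random

def coot_minimize(f, lb, ub, n_pop=20, n_leaders=4, max_iter=100, p=0.5, seed=0):
    """Minimize f over the box [lb, ub]; returns (gbest, gbest_fit, history)."""
    rng = random.Random(seed)
    dim = len(lb)

    def rand_pos():  # Eqs. (18) and (22): uniform point inside the bounds
        return [rng.random() * (u - lo) + lo for lo, u in zip(lb, ub)]

    def clip(x):  # assumption: positions are clamped to the bounds
        return [min(max(v, lo), u) for v, lo, u in zip(x, lb, ub)]

    def draw(lo, hi, as_vector):  # a random vector, or one number repeated
        if as_vector:
            return [rng.uniform(lo, hi) for _ in range(dim)]
        return [rng.uniform(lo, hi)] * dim

    leaders = [rand_pos() for _ in range(n_leaders)]
    coots = [rand_pos() for _ in range(n_pop - n_leaders)]
    leader_fit = [f(x) for x in leaders]
    coot_fit = [f(x) for x in coots]
    gbest_fit, gbest = min(zip(leader_fit + coot_fit, leaders + coots))
    gbest = list(gbest)
    history = [gbest_fit]

    for l in range(1, max_iter + 1):
        a = 1.0 - l / max_iter  # Eq. (24)
        b = 2.0 - l / max_iter  # Eq. (29)
        vec = rng.random() < p  # random vectors or random numbers this round
        for i in range(len(coots)):
            k = (i + 1) % n_leaders  # Eq. (26), shifted to 0-based indexing
            x = coots[i]
            if rng.random() > 0.5:  # follow leader k, Eq. (27)
                r1, r = draw(0.0, 1.0, vec), draw(-1.0, 1.0, vec)
                x = [ld + 2.0 * r1[d] * math.cos(2.0 * math.pi * r[d]) * (ld - x[d])
                     for d, ld in enumerate(leaders[k])]
            elif rng.random() < 0.5 and i > 0:  # chain movement, Eq. (25)
                x = [0.5 * (u + v) for u, v in zip(coots[i - 1], x)]
            else:  # random movement, Eq. (23)
                q, r2 = rand_pos(), draw(0.0, 1.0, vec)
                x = [x[d] + a * r2[d] * (q[d] - x[d]) for d in range(dim)]
            coots[i] = clip(x)
            coot_fit[i] = f(coots[i])
            if coot_fit[i] < leader_fit[k]:  # the better coot becomes leader k
                coots[i], leaders[k] = leaders[k], coots[i]
                coot_fit[i], leader_fit[k] = leader_fit[k], coot_fit[i]
        for j in range(n_leaders):  # leader movement, Eq. (28)
            r3, r = draw(0.0, 1.0, vec), draw(-1.0, 1.0, vec)
            sign = 1.0 if rng.random() < 0.5 else -1.0  # r4 picks the case
            leaders[j] = clip([
                b * r3[d] * math.cos(2.0 * math.pi * r[d]) * (gbest[d] - leaders[j][d])
                + sign * gbest[d] for d in range(dim)])
            leader_fit[j] = f(leaders[j])
            if leader_fit[j] < gbest_fit:  # save the best solution so far
                gbest, gbest_fit = list(leaders[j]), leader_fit[j]
        history.append(gbest_fit)
    return gbest, gbest_fit, history

def sphere(x):
    """Stand-in objective; the proposed system would use the MSE of Eq. (21)."""
    return sum(v * v for v in x)

best, best_fit, history = coot_minimize(sphere, [-5.0] * 3, [5.0] * 3)
```

To tune a network with this loop, `f` would decode a position vector into GRU-RNN weights and biases and return the training error; that decoding step depends on the network layout and is left out of the sketch.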