This section discusses the findings of our experimental comparison using two scenarios, one real and one artificial. All tests reported in this section were conducted on an Intel Core i7 3.6 GHz processor with 32 GB of RAM, running 64-bit Windows 10 Pro, using the value iteration, iterative policy evaluation, and policy iteration algorithms.
7.1 Real Scenario. The first experiment comprises two classes of web services. The first class contains weather services that can be used to obtain the current temperature in a region. The second class contains temperature conversion services, for instance Celsius to Fahrenheit, which convert a temperature value from one unit to another. We considered three separate web services in the weather services class: (a) GlobalWeather Web service, available at https://www.ibm.com/docs/en/api-connect/saas?topic=api-example-wsdl-file-globalweatherwsdl (b) National Oceanic and Atmospheric Administration (NOAA) Web service, available at https://www.noaa.gov/weather (c) Weather Channel Web service, available at https://www.weather.gov/documentation/services-web-api. We considered four web services in the temperature conversion services category:
(a) A basic calculator Web service, such as the one available at http://www.dneonline.com/calculator.asmx. Since
C = 5(F − 32)/9, (8)
temperature conversion can be realized by composing the calculator's multiplication, subtraction, and division operations, as sketched in the example after this list.
(b) Convert Temperature Web service, available at http://www.webservicex.net/ConvertTemperature.asmx.
(c) Temperature Conversions Web service, available at http://webservices.daehosting.com/services/TemperatureConversions.wso.
(d) Temp Convert Web service, available at http://www.w3schools.com/webservices/tempconvert.asmx.
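To make the composition in item (a) concrete, the following sketch chains the three calculator operations to implement Equation (8). The local methods subtract, multiply, and divide are hypothetical stand-ins for the corresponding remote calculator operations, not the actual SOAP client stubs.

// A minimal sketch of Equation (8) composed from the three calculator
// operations; the three methods stand in for remote service calls.
public class FahrenheitToCelsius {

    static double subtract(double a, double b) { return a - b; } // "Subtract" operation
    static double multiply(double a, double b) { return a * b; } // "Multiply" operation
    static double divide(double a, double b)   { return a / b; } // "Divide" operation

    // C = 5 * (F - 32) / 9, composed as three successive calls
    static double toCelsius(double fahrenheit) {
        return divide(multiply(5.0, subtract(fahrenheit, 32.0)), 9.0);
    }

    public static void main(String[] args) {
        System.out.println(toCelsius(212.0)); // prints 100.0
    }
}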
The QoS attribute values of all seven web services were measured by means of a Java program that computes the following attributes:
Availability = CS/CT, (9)
where CS is the number of successful Web service calls and CT is the total number of calls, and
Execution time = T/CT, (10)
where T is the total execution time of all the CT calls, with CT = 50.
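As an illustration, the following sketch shows how Equations (9) and (10) can be computed over CT = 50 calls. The invoke() method is a hypothetical placeholder for the actual SOAP/HTTP invocation performed by our Java program.

// A minimal sketch of the QoS measurement of Equations (9) and (10).
public class QosProbe {

    static final int CT = 50; // total number of calls, as in the experiments

    public static void main(String[] args) {
        int cs = 0;          // CS: number of successful calls
        long totalNanos = 0; // T: accumulated execution time of all calls

        for (int i = 0; i < CT; i++) {
            long start = System.nanoTime();
            try {
                invoke(); // hypothetical Web service call
                cs++;
            } catch (Exception e) {
                // failed call: counts toward CT but not CS
            }
            totalNanos += System.nanoTime() - start;
        }

        double availability = (double) cs / CT;       // Equation (9)
        double executionMs = (totalNanos / 1e6) / CT; // Equation (10), in ms
        System.out.printf("availability=%.2f, execution time=%.1f ms%n",
                availability, executionMs);
    }

    static void invoke() throws Exception {
        // placeholder for the actual service invocation
    }
}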
To obtain representative QoS values for the web services, we repeated the measurements over several days and at different times of day, and then averaged the QoS values obtained for each attribute and measurement. Once the QoS attribute data had been collected, all three dynamic programming algorithms were used to learn the best composite Web service. With seven Web services belonging to two separate classes, there are twelve possible compositions. All these choices are shown in the graph in Fig. 1.
In the real-scenario graph, each layer corresponds to one class of web services, and each node represents a single web service. Node S is the initial state, in which no Web service has been chosen yet, and node G is the goal state, in which the composition of the web service is complete. Any path from S to G corresponds to the generation of a valid composite Web service.
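One possible Java representation of this layered graph is sketched below. The WebService and CompositionGraph classes are illustrative assumptions; in particular, the reward field stands for a scalar derived from the measured QoS attributes (for instance, a weighted combination of availability and execution time).

// A minimal sketch of the layered composition graph: each layer holds
// the candidate services of one class, S precedes the first layer, and
// G follows the last one.
import java.util.List;

class WebService {
    final String name;
    final double reward; // QoS-based reward for selecting this service
    WebService(String name, double reward) { this.name = name; this.reward = reward; }
}

class CompositionGraph {
    final List<List<WebService>> layers; // layers.get(i): services of class i
    CompositionGraph(List<List<WebService>> layers) { this.layers = layers; }
}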
Figure 2 shows the outcomes of the real Web services scenario. All three algorithms found the optimal web service composition very easily, in less than 0.01 seconds.
7.2 Artificial Scenario.
In this second experiment, we simulated data on three QoS attributes, execution time, availability, and throughput, to test all three dynamic programming algorithms. The study scales up to 1,000,000 hypothetical web services, divided into 1000 classes. We assumed that any web service of class i can be followed by all web services of class i + 1, so the graph consists of 1000 layers, with the individual web services of one class in each layer. As in the first case, node S is the graph's initial state and represents the situation in which no web service has yet been chosen, and node G is reached when a valid composition is complete. The nodes between S and G represent the available Web services, and an S-to-G path yields a potential composite web service. Figures 3, 4 and 5 show the findings for this second series of experiments for µ = 0.7, µ = 0.8 and µ = 0.9, respectively.
In this graph, each layer contains 100 web services of the same class; with 10,000 layers of nodes from which a valid web service composition is chosen, we are solving a problem with 100 × 10,000 = 1,000,000 web services. The learning curves show that the time taken to solve the composition problem increases with the number of nodes. Again, all three algorithms found the best solutions, but policy iteration found them in less time. For µ = 0.8 and µ = 0.9, iterative policy evaluation and value iteration need less than 130 seconds to obtain the optimum composition, while policy iteration, the best-performing of the algorithms, needs less than 100 seconds.
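For illustration, a minimal value iteration sketch for such a layered graph is given below, building on the CompositionGraph representation sketched earlier. It assumes deterministic transitions (choosing one service of the next class) and a discount factor mu, as in the experiments; it is an illustrative solver, not our actual implementation.

// Value iteration on the layered graph: sweep over ALL states until the
// values stabilize. v[i][j] is the value of service j in layer i; v[L][0]
// is the goal node G.
import java.util.List;

class ValueIterationSolver {
    static double[][] solve(List<List<WebService>> layers, double mu, double eps) {
        int L = layers.size();
        double[][] v = new double[L + 1][];
        for (int i = 0; i < L; i++) v[i] = new double[layers.get(i).size()];
        v[L] = new double[] { 0.0 }; // goal state G

        double delta;
        do { // one sweep backs up every state
            delta = 0.0;
            for (int i = L - 1; i >= 0; i--) {
                for (int j = 0; j < v[i].length; j++) {
                    double best = Double.NEGATIVE_INFINITY;
                    for (int k = 0; k < v[i + 1].length; k++) {
                        // reward for entering the next service (0 when entering G)
                        double r = (i + 1 == L) ? 0.0 : layers.get(i + 1).get(k).reward;
                        best = Math.max(best, r + mu * v[i + 1][k]);
                    }
                    delta = Math.max(delta, Math.abs(best - v[i][j]));
                    v[i][j] = best;
                }
            }
        } while (delta > eps);
        return v;
    }
}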
7.3 Comparison with Sarsa and Q-Learning
Several related works have suggested reinforcement learning algorithms for solving the web service composition problem [47–49]. In this section, we compare the learning time needed by Q-learning and Sarsa with that of policy iteration, value iteration, and iterative policy evaluation.
7.3.1 Sarsa. Sarsa is an on-policy temporal-difference control algorithm that continually estimates the action-value function Qπ for the behavior policy π and, at the same time, moves π toward greediness with respect to Qπ. The Sarsa algorithm, taken from [50], is shown in the Algorithm. If the policy is such that every state-action pair is visited infinitely often and the policy converges in the limit to the greedy policy with respect to the current action-value function, then, with a decaying step size α, the algorithm converges to Q* [52].
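For reference, a minimal tabular Sarsa sketch is shown below. The Env interface and the epsilon-greedy helper are hypothetical abstractions of the composition MDP; the update itself follows the standard formulation in [50].

// Tabular Sarsa (on-policy TD control) over a generic environment.
import java.util.Random;

interface Env {
    int reset();                     // returns the initial state
    int numStates();
    int numActions();
    boolean isTerminal(int state);
    double reward(int state, int action);
    int next(int state, int action); // deterministic transition for simplicity
}

class Sarsa {
    static final Random RNG = new Random(0);

    static int epsilonGreedy(double[][] q, int s, double eps, int nActions) {
        if (RNG.nextDouble() < eps) return RNG.nextInt(nActions); // explore
        int best = 0;
        for (int a = 1; a < nActions; a++) if (q[s][a] > q[s][best]) best = a;
        return best;                                              // exploit
    }

    static void episode(Env env, double[][] q, double alpha, double gamma, double eps) {
        int s = env.reset();
        int a = epsilonGreedy(q, s, eps, env.numActions());
        while (!env.isTerminal(s)) {
            double r = env.reward(s, a);
            int s2 = env.next(s, a);
            int a2 = epsilonGreedy(q, s2, eps, env.numActions()); // on-policy choice of a'
            // Sarsa update: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a))
            q[s][a] += alpha * (r + gamma * q[s2][a2] - q[s][a]);
            s = s2;
            a = a2;
        }
    }
}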
7.3.2 Q-Learning. Q-learning [53] is an off-policy temporal-difference control algorithm that directly approximates the optimal action-value function, independently of the policy being followed. It is one of the most common reinforcement learning algorithms. The Q-learning algorithm is adapted from [50]. If the values of all state-action pairs continue to be updated indefinitely, then, with a decaying step size α, the algorithm converges to Q* with probability 1 [51, 53].
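Analogously, a minimal Q-learning sketch is given below, reusing the hypothetical Env interface and epsilonGreedy() helper from the Sarsa sketch. The only difference is that the update bootstraps from the greedy value of the next state, independently of the action actually taken there.

// Tabular Q-learning (off-policy TD control).
class QLearning {
    static void episode(Env env, double[][] q, double alpha, double gamma, double eps) {
        int s = env.reset();
        while (!env.isTerminal(s)) {
            int a = Sarsa.epsilonGreedy(q, s, eps, env.numActions());
            double r = env.reward(s, a);
            int s2 = env.next(s, a);
            double maxNext = q[s2][0]; // max over a' of Q(s', a')
            for (int a2 = 1; a2 < env.numActions(); a2++)
                maxNext = Math.max(maxNext, q[s2][a2]);
            // Q-learning update: bootstrap from the greedy value of s'
            q[s][a] += alpha * (r + gamma * maxNext - q[s][a]);
            s = s2;
        }
    }
}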
7.3.3 Learning Time Analysis. We implemented the Q-learning and Sarsa algorithms to solve the scenario problem described in the experimental setting. Figure 6 presents the measurements on a logarithmic scale; we can clearly see from this graph that Sarsa and Q-learning needed two or more orders of magnitude more time to find the optimum composition. In addition, we performed experiments with a second artificially constructed scenario with 3 layers of 20 Web services each. Again, the reinforcement learning approaches took much more time than the dynamic programming algorithms, and in some experiments they failed to find the optimal solution, settling on suboptimal compositions. Dynamic programming methods converge more rapidly than reinforcement learning methods simply because dynamic programming updates the value of every state in every iteration, whereas reinforcement learning techniques modify only the values of the states they happen to visit, given their exploration strategy, that is, epsilon-greedy.
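To make this difference concrete, the following sketch shows one dynamic programming sweep, which backs up every state of the environment, in contrast to the TD episodes sketched above, which update only the states along the sampled trajectory; it reuses the hypothetical Env interface.

// One dynamic programming sweep: every state is backed up, visited or not.
class DpSweep {
    static void sweep(Env env, double[] v, double gamma) {
        for (int s = 0; s < env.numStates(); s++) {
            if (env.isTerminal(s)) continue;
            double best = Double.NEGATIVE_INFINITY;
            for (int a = 0; a < env.numActions(); a++)
                best = Math.max(best, env.reward(s, a) + gamma * v[env.next(s, a)]);
            v[s] = best;
        }
    }
}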
In addition, regarding the implementation of an automatic Web service composition system, it should be noted that the collection of QoS information can be carried out at fixed time intervals by a dedicated module of such a system. Once this data, which is central to the evaluation of the reward function, has been gathered, there is no need to explore the state space of web services as reinforcement learning methods do. We can simply run a dynamic programming algorithm to estimate the web service value function and determine the optimum web service composition.