Nature inspired algorithms are the most popular algorithms to solve the optimization problem. Meta-heuristic algorithms are designed to generate good solutions to an optimization problem by making a few assumptions. Firefly algorithm is one such meta-heuristic algorithm developed by Yang [13] for solving optimization problems. The algorithm was configured such that it inspires the behaviour of a firefly. The three behaviours of fireflies include the following:
- Every firefly creates a centre of attention on another firefly using its brightness.
- Fireflies having superior brightness have elevated attractiveness to the other fireflies.
- Fireflies having inferior brightness move to other fireflies with superior brightness.
The second behaviour is analogous to the fact that newly produced solutions are based on previous solutions having an enhanced fitness function. Theoretically, for each query, there may be either one or more new possible query plans.. This depends on the query cost opinion between solution and other solutions among the existing population of query plans.
In the standard firefly algorithm the best query plan generated is the global optimal solution. If there is a movement of this firefly as in standard firefly algorithm, the brightness of the firefly decreases which affects the performance of generating the query plan with less execution time. In conventional firefly algorithm, the distance is calculated by considering the current solution and another better solution. Instead of doing this, MFA uses the best solution value in the previous iteration to calculate the radius. This modification is done to make the firefly move towards the global optimal solution.
Suppose that for every solution i, Xi is a location of i th firefly at the present iteration. When the objective function value of solution i is superior than that of solution j, the distance between the two fireflies i and j is obtained using the following equation:
rij = sqrt(Xi – Xj)2 (1)
Then, the updated distance is used into Equation (2) to work out a new attractiveness. The new position for the ith firefly can be calculated matching to the generation of a new solution of the ith solution. The formula of generating a new solution is as follows:
β = β0e−γrij2 (2)
Xijnew = Xi + β rand ΔXij + rand (3)
Where r and is any random number of solution i; β0 is the attractiveness at zero distance and normally set to 1; γ is set to 0 which indicates high visibility of query plans in this case. Xj is a solution having lower fitness function than Xi; and ΔXij is a revised step size, which is calculated by the following equation:
ΔXij = Xj − Xi (4)
The flowchart for the proposed work is given below. In Figure 1 Flowchart of proposed MFA and Figure 2 Algorithm for proposed MFA.The steps for the proposed MFA are as follows.
3.1 RDF Query Plans
Semantic Web facts stored in the RDF version can be retrieved using the popular SPARQL language. In the context of query optimization, every SPARQL query must be imagined by means of a query tree. The leaf nodes in the query tree stand for the triples. The internal nodes are meant to join the triples. Query trees can be represented in different formats that include bushy trees, left-deep trees, and right-deep trees. The nodes in a single query tree can be restructured in many dissimilar ways to turn out the same results. The sequence in which methodologies are executed to regain the requested data is known as query plan. In this line of investigation, the trees used to represent the query paths are left-deep trees [14]. Left-deep trees are chosen because these trees can process the joins by applying cost-reducing pipelining methods.
3.2 Solution Space
Generally, solution space consists of a set of all processing trees that can produce a result for a given query. In the context of query optimization problem, the solution space contains query execution plans as solutions. There are n! feasible means to assign n triples to the leaf of the tree. The leaves of the tree consist of triples and the inner nodes are used to join these triples. The n! solutions are obtained by applying the conversion set of laws.
3.3 Encoding Methodology
For every optimization algorithm, we need to choose an encoding methodology, a suitable encoding for the solutions in the solution space. In this research, the solutions are query plans and they are symbolized using left-deep trees. To encode left-deep trees, we choose ordered list. Solutions are represented using an ordered list [14] of leaves. After generating all possible query plans for the given SPARQL query, the trees are encoded using the ordered list format. For example, consider the query tree with nodes and joins in the following order: ((((R1⋈ R2) ⋈R3) ⋈ R4) ⋈R5) is encoded as “12345”.
3.4 Objective Function
To determine RDF query path, we must first come to a decision on the fitness function. In this research work, the fitness function refers to the cost model of the left-deep tree. The following formula is used to define the cost model of the query tree:
Cost of single query tree = Sum of the costs of each operator involved in the tree
The cost of a particular operator depends on the cardinality and selectivity estimation.
The intention of this research is to reduce the query execution time to retrieve data from a large volume of Semantic Web data. After deciding the aforementioned parameters such as encoding methodology, cost model, and solution space, they are given as input to MFA. The algorithm takes different query paths as input, converts the query plans to some encoding format, and applies cost function to determine the cost of the query plans. After executing MFA with all the inputs defined, the algorithm returns the best query plan that reduces the execution time.