A genetic algorithm for subcontractors selection and allocation in multiple building projects

In the construction industry, general contractors (GCs) need to manage and conduct numerous projects simultaneously. To this end, they usually have subcontractors carry out the available tasks and projects, so subcontractor management is becoming a major challenge. To deal with this challenge, GCs aim to reduce the total costs of projects, including employment/subcontracting costs, indirect costs, tardiness penalties, and the money which must be paid for the movement of the workforce from one project to another. The aim of the current research study is to propose a model for selecting subcontractors and assigning the available tasks in the projects to them in order to reduce the costs of the GC. Then, a genetic algorithm is proposed to solve a real problem. The proposed algorithm is innovative from three points of view: (1) generation of the initial population; (2) the subcontractor assignment approach; and (3) the objective function. The problem was also solved by means of an exact method, and the results of the proposed algorithm were compared to the outcomes of the exact method. This comparison shows that the proposed algorithm can efficiently help GCs select subcontractors and assign available tasks to each subcontractor when several projects must be carried out simultaneously.


Introduction
One of the initial steps in project planning is to identify the right subcontractors and assign the available work packages of a project to them. In this research study, the word subcontractor means the contractors who work as groups at the lowest level in the hierarchy of contractors in a project, for example, those who are involved in concreting or reinforcing operations. Pells (1993) mentioned that "the selection of subcontractors is a difficult and time-consuming process." General contractors (GCs) usually have limited information about the real cost and duration of a project because these parameters depend on the capabilities of the subcontractors who carry out the work packages of the project.
Many researchers have studied subcontractor selection in the literature to introduce quantitative approaches to help GCs in this regard (Monghasemi et al. 2015). Most of these approaches have been based on multi-criteria decision-making approaches (Afshar et al. 2020a, b; Afshar et al. 2017; Fachrurrazi and Munirwansyah 2017; Ulubeyli and Kazaz 2016; Polat 2016). Recent research studies show that a significant proportion of them did not consider the impact of subcontractor selection on the overall features of the project, including project duration, cost, and feasibility, even though these features are closely linked to the performance of subcontractors. Hence, mathematical models were increasingly applied to consider the effect of subcontractor selection on the overall features of a project (Afshar et al. 2020a, b; Sonmez and Gürel 2016; Polat et al. 2015; Beşikci et al. 2015; Mungle et al. 2013; Mokhtari and Abadi 2013; Pollack-Johnson and Liberatore 2006; Józefowska et al. 2001). However, most of these models have studied subcontractor selection when the GC had to conduct only one project.
In practice, however, most GCs must carry out several projects simultaneously (El-Abbasy et al. 2017). To make the problem more realistic, it is necessary to consider the condition in which GCs need to conduct numerous projects at the same time. Taking these realistic situations into consideration makes the model and the formulation more complicated. To deal with these realistic problems, Afshar et al. (2020a, b) investigated the problem of subcontractor selection when the GC must conduct numerous projects at the same time. However, their model did not fully reflect the assumptions appropriate to projects in the construction industry. Additionally, other aspects of the projects still need to be considered, including the cost of employed skilled workers and subcontractors and the time which employed skilled workers and subcontractors need to move from one project to another. So, a more practical approach is needed.
Another drawback is that Afshar et al. (2020a, b) used the CPLEX solver of the general algebraic modeling system (GAMS) for solving the problem. However, exact techniques would fail to solve the problem in a reasonable computing time because the problem itself is non-deterministic polynomial-time hard (NP-hard) and real construction projects include multiple work packages and subcontractors (Afshar et al. 2020a, b). Thus, in practice, metaheuristic algorithms are required for large construction projects to generate near-optimal solutions in reasonable computational time.
In order to fill the mentioned gaps, the current study proposes a realistic model for subcontractor selection and allocation in multiple building projects. In this regard, first the required assumptions are discussed and then the model is proposed. In this model, the different locations, times, and effort required for the movement of employed skilled workers and subcontractors from one project to another are considered. Then, a GA is developed to solve the problem. The innovation of the proposed algorithm is threefold: it proposes a heuristic method to generate a part of the initial population; it presents an infeasible tackling procedure to convert infeasible solutions to feasible ones; and it proposes a simple fitness function to handle feasible and infeasible solutions. To validate the efficiency and feasibility of the proposed model and the proposed GA, a real case study from the building industry is included.
The remainder of the paper is organized as follows. Section 2 presents the problem description. The proposed procedure for solving the subcontractor selection and allocation problem is stated in the third section. The computational experiments are performed in Sect. 4 in order to prove the effectiveness of the proposed model and metaheuristic algorithm. In Sect. 5, managerial implications are discussed. In the end, some conclusions are presented in Sect. 6.

Problem statement
The list of notations that are used in this research study is presented in this part.
Sets and indices
$i, h$  Set of work packages
$j, g$  Set of projects
$k$  Set of subcontractors (SCs)

$K_j$  The early completion period of project j (integer variable)
$G_j$  Tardy period of project j (integer variable)
In order to propose a model for subcontractor selection when numerous projects must be done simultaneously, the 12 assumptions of the model are discussed first and then the model is presented.
Assumption 1 In order to prevent any possible dispute or interference between subcontractors or groups of skilled workers, each work package is assigned to exactly one subcontractor or one group of skilled workers. This assignment is made either by subcontracting to subcontractor k or by full-time employment of the group of skilled workers who work for subcontractor k. (From this point to the end of the manuscript, "group of skilled workers k" means the group of skilled workers who work for subcontractor k and who are employed on a full-time basis by the GC to carry out the related work packages.)
Equation 1 which is mentioned below guarantees this issue.
Assumption 2 In order to restrict the influence of subcontractors on overall process of the project and to restrict their capability to stop the project, the number of work packages that can be assigned to each subcontractor will be limited. The level of limitations will be determined by the GC. Equation 2 ensures this policy.
Assumption 3 When the group of skilled workers k is employed to carry out the projects, the GC has more capability to control them. So, no limitation is set on the number of work packages which the group of skilled workers k can conduct. Equation 3 guarantees this situation.
Assumption 4 When more than one work package is assigned to one subcontractor or one group of skilled workers, they might not be able to carry out the work packages simultaneously due to their limited capability. Equations 4 and 5 confirm this approach.
$$P_{ijhgk} + P_{hgijk} \le 1 \quad \forall i, j, h, g, k$$
Assumption 5 The movement of one subcontractor or one group of skilled workers from one work package to another within one project has been considered in Assumption 4. However, the time and money which must be spent on this movement have not been taken into account. This matters when the movement occurs between the work packages of different projects. Accordingly, the time which subcontractors or groups of workers require to move from one project to another must be considered. Equations 6 and 7 express these requirements. The cost of movement is also included in the objective function.
Assumption 6 Since most of the precedence relations among the activities in conventional building projects are finish-to-start relations with lag (Sonmez and Gürel 2016), the precedence relations in this study are of this type, as expressed in Equation 8.
Assumption 7 Starting time of each project must be after the official date of that project which is mentioned in the contract. Equation 9 confirms this point.
Assumption 8 When the group of workers k is employed by the GC, the duration of their collaboration with the GC is calculated based on the deadlines of the projects that they took part in. So, the duration of collaboration is calculated based on Eq. 10. Equation 11 calculates the time that the group of skilled workers collaborates with the GC.
Assumption 9 If more than one work package is assigned to one subcontractor, the subcontractor would offer a discount because its unemployment costs are reduced under this circumstance. In this research study, the discount is defined by different discount levels. Each discount level includes a discount percentage which is applied to the subcontractor's bid price. As the number of work packages assigned to a subcontractor through subcontracting increases, the discount percentage increases. Equations 12 and 13 activate one of the discount levels for subcontractor k. Here, Eq. 14 calculates the percentage of subcontractor k's bid price which must be paid to this subcontractor ($N_k$).
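The discount mechanism of Assumption 9 can be illustrated with a minimal sketch. The function name `payable_fraction`, the tuple layout of `levels`, and the numeric discount levels are hypothetical and chosen only for illustration; the sketch assumes that the highest discount level whose threshold is reached is the active one, and it returns the fraction of the bid price that remains payable (the role played by $N_k$).

```python
def payable_fraction(n_packages, levels):
    """Return the fraction of the bid price to be paid (the role of N_k).

    levels: list of (min_packages, discount_fraction) pairs sorted by
    min_packages; these values are illustrative assumptions.
    """
    discount = 0.0
    for min_packages, level_discount in levels:
        # The last level whose threshold is reached stays active.
        if n_packages >= min_packages:
            discount = level_discount
    return 1.0 - discount


# Hypothetical discount levels: no discount for one package,
# 5% from two packages, 10% from four packages.
LEVELS = [(1, 0.0), (2, 0.05), (4, 0.10)]
print(payable_fraction(3, LEVELS))
```

With these assumed levels, a subcontractor holding three work packages falls into the second level.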
Assumption 10 In order to improve the reputation of the GC with the main employer, all the projects must be finished within the specified deadline. Equation 15 ensures this point and Eq. 16 calculates the duration of project j.
Assumption 11 If the GC finishes the project sooner than the specified deadline, it will be rewarded. This point is calculated by Eq. 17.
Assumption 12 Indirect costs are directly linked to the duration of each project and are determined based on the size of the project in objective function (Eq. 18). Determination of these costs is not the subject of study in the current research project. In this research project, it is assumed that these costs are determined based on the experience of the GC.
Under the mentioned constraints, the GC as an economic organization seeks a way to minimize the total cost. Thus, the objective function (Eq. 18) is equal to the total cost of the projects including the following terms:
a) The cost of employing the group of skilled workers (k)
e) The money which the GC should pay to subcontractors or the groups of employed skilled workers for their movement from one project to another
It is clear that the proposed model is nonlinear. Solving such a problem is time-consuming even for small-scale instances. So, this nonlinear model is converted to a linear model in the next section.

Linearization of the proposed model
When decision variables are multiplied by each other ($x_{ijk} \times SW_{ijk}$, $y_{ijk} \times SW'_{ijk}$, $x_{ijk} \times N_k$), the model becomes nonlinear. In order to avoid this nonlinearity, the products are replaced by the auxiliary variables $SSW_{ijk}$, $SSW'_{ijk}$, and $nx_{ijk}$, respectively. In addition, the relationships between the auxiliary variables and the main variables (Eqs. 19 to 27) are defined in such a way that each auxiliary variable equals the product it replaces.
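As an illustration of how such a replacement can work, the following sketch checks the standard big-M linearization of a product of a binary variable and a bounded non-negative integer variable. It is not claimed to reproduce Eqs. 19 to 27; it assumes that $x$ is binary, that the second factor lies in $[0, M]$, and the names `feasible_products` and `BIG_M` are hypothetical. The check enumerates all small cases and confirms that the three linear constraints leave exactly one admissible value for the auxiliary variable, namely the product.

```python
def feasible_products(x, sw, big_m):
    """Return every integer z in [0, big_m] satisfying the big-M constraints."""
    return [
        z
        for z in range(big_m + 1)
        if z <= big_m * x                 # forces z = 0 when x = 0
        and z <= sw                       # z can never exceed sw
        and z >= sw - big_m * (1 - x)     # forces z = sw when x = 1
    ]


BIG_M = 5  # assumed upper bound on the bounded variable
all_match = all(
    feasible_products(x, sw, BIG_M) == [x * sw]
    for x in (0, 1)
    for sw in range(BIG_M + 1)
)
print(all_match)
```

The design choice here is exhaustive enumeration rather than a solver call, so that the equivalence can be verified without any optimization library.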

Genetic algorithm (GA) for solving the subcontractor selection and assignment problem
The problem of selecting subcontractors and assigning the available work packages to them is NP-hard (Afshar et al. 2020a, b). Accordingly, exact methods cannot solve it in a reasonable amount of time (Rehman et al. 2020; Khan et al. 2020). Therefore, GA, as a metaheuristic algorithm, is used for solving this problem. GA is generally utilized to generate high-quality solutions for optimization problems by means of mutation, crossover, and selection operators (Agrawal et al. 2021).

Solution representation
Before a metaheuristic algorithm starts to search, a suitable representation scheme must be determined. For this aim, in the current study, the random key representation (RKR) is employed to determine the work packages' priorities for scheduling. Also, the subcontractor list representation is used for assigning the subcontractors to the work packages. The subcontractor list assigns work package i to subcontractor k and it identifies whether the schedule is feasible (considering Eq. 2) or not. The employment method of each subcontractor is represented in binary form (when a subcontractor is not selected, its employment method is not considered).
Hence, three independent chromosomes (a, b, c) are considered. a and b constitute the first part of each solution, while c represents the second part of it. The number of genes in chromosomes a and b is set equal to the total number of work packages that the GC must plan. Additionally, the number of genes in chromosome c is set equal to the total number of subcontractors who submitted bids to execute the available work packages.
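The three-chromosome encoding can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names `random_solution` and `priority_order` are hypothetical, a smaller random key is assumed to mean a higher scheduling priority, and the binary gene in c is assumed to distinguish subcontracting from full-time employment.

```python
import random


def random_solution(n_packages, n_subcontractors, rng):
    """Build one random individual made of chromosomes a, b and c."""
    a = [rng.random() for _ in range(n_packages)]                   # random keys
    b = [rng.randrange(n_subcontractors) for _ in range(n_packages)]  # subcontractor list
    c = [rng.randint(0, 1) for _ in range(n_subcontractors)]        # employment method
    return a, b, c


def priority_order(keys):
    """Decode random keys into a work-package order (smaller key first, by assumption)."""
    return sorted(range(len(keys)), key=lambda i: keys[i])


a, b, c = random_solution(n_packages=5, n_subcontractors=3, rng=random.Random(0))
print(priority_order(a), b, c)
```

Because random keys are real numbers, standard crossover and mutation on chromosome a always yield a valid priority order after decoding, which is the usual motivation for this representation.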

Primary population
There are two methods for generating the initial population: (1) a random method and (2) a heuristic method. Each of the methods has advantages and disadvantages. The random method increases exploration while decreasing exploitation. On the contrary, a heuristic method increases exploitation while decreasing exploration. Moreover, generating the whole initial population randomly may lead to all individuals violating the deadline limitation. In the current study, in order to use the advantages of both methods and to avoid this violation, 50% of the initial population was generated by a heuristic method. The proposed heuristic (Fig. 1), which is applied to chromosome b, tries to distribute the work packages among all subcontractors in order to prevent delay in conducting work packages.
In other words, if more than one work package is assigned to one subcontractor, it is unable to perform them simultaneously due to its limited capacity. Therefore, one of the work packages will be delayed and the probability of exceeding the project deadline will increase. The proposed method can prevent this issue.
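The idea of spreading work packages over the subcontractors can be sketched as below. The procedure of Fig. 1 is not reproduced; this is an assumed round-robin variant in which the work packages are shuffled and dealt out cyclically, so that the loads differ by at most one. The name `balanced_assignment` is hypothetical, and the sketch assumes every subcontractor is able to perform every work package.

```python
import random
from collections import Counter


def balanced_assignment(n_packages, n_subcontractors, rng):
    """Return chromosome b with work packages spread evenly over subcontractors."""
    packages = list(range(n_packages))
    rng.shuffle(packages)  # randomize which packages each subcontractor receives
    assignment = [0] * n_packages
    for position, package in enumerate(packages):
        # Deal the shuffled packages out cyclically.
        assignment[package] = position % n_subcontractors
    return assignment


b = balanced_assignment(n_packages=7, n_subcontractors=3, rng=random.Random(1))
loads = Counter(b)
print(b, dict(loads))
```

The shuffle keeps diversity among the heuristic individuals while the cyclic dealing keeps each individual balanced.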

Infeasible tackling procedure
Some solutions may violate Eq. 2. These solutions are converted to feasible solutions by the infeasible tackling procedure. In this procedure, one work package which had been assigned to a subcontractor who violates Eq. 2 is selected and reassigned to another subcontractor.
This process continues until the violation of Eq. 2 is removed. Afterward, the obtained feasible solutions are passed to the fitness function assessment.
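A repair step of this kind can be sketched as follows. The sketch is an assumption-laden illustration: the name `repair` is hypothetical, every subcontractor is given an explicit limit, and the receiving subcontractor is chosen as the least loaded one with spare capacity, a rule that the text above does not prescribe.

```python
from collections import Counter


def repair(assignment, limits):
    """Reassign work packages until no subcontractor exceeds its limit.

    assignment[i] is the subcontractor of work package i;
    limits[k] is the maximum number of packages allowed for subcontractor k.
    """
    if len(assignment) > sum(limits):
        raise ValueError("limits are too tight for any feasible assignment")
    repaired = list(assignment)
    load = Counter(repaired)
    for i, k in enumerate(repaired):
        if load[k] > limits[k]:
            # Assumed rule: move the package to the least loaded
            # subcontractor that still has spare capacity.
            target = min(
                (s for s in range(len(limits)) if load[s] < limits[s]),
                key=lambda s: load[s],
            )
            repaired[i] = target
            load[k] -= 1
            load[target] += 1
    return repaired


fixed = repair([0, 0, 0, 1], limits=[2, 2])
print(fixed)
```

Since a package is only moved to a subcontractor with spare capacity, a single pass over the work packages is enough to remove all violations.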

Fitness function
To evaluate the population, the solutions should be decoded with the help of schedule generation schemes (SGS). SGS is capable of converting a solution into a schedule. A fitness value is accordingly calculated for each solution. Some solutions may violate Eq. 15.
In order to tackle the infeasible solutions, two approaches can be adopted. The first one is to discard the infeasible solutions and the other one is to penalize them (Lee and El-Sharkawi 2008). Since there might be solutions among the infeasible ones with a low degree of infeasibility, it is not reasonable to discard them all. So, in this research study the second approach is adopted, and to this aim the fitness function (Eq. 28) is considered.
For a feasible solution ($G_j \le 0$), the fitness function is equal to the total cost of the projects. For an infeasible solution, the fitness function is equal to the total project cost plus a fixed and a variable penalty.
In fact, two extra costs are considered for infeasible solutions. The role of the fixed penalty is to reduce the chance of infeasible solutions producing offspring and thereby to control the population. The variable penalty is designed to separate good and bad infeasible solutions.
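The structure of such a penalized fitness can be sketched as follows. Eq. 28 is not reproduced; the sketch assumes that the variable penalty is proportional to the total tardiness over all projects, and the names `fitness`, `fixed_penalty`, and `penalty_rate` are hypothetical.

```python
def fitness(total_cost, tardiness, fixed_penalty, penalty_rate):
    """Return the cost of a solution, penalized when any project is tardy.

    tardiness: list of tardy periods, one per project (the role of G_j).
    """
    total_tardy = sum(max(0, g) for g in tardiness)
    if total_tardy == 0:
        # Feasible solution: fitness equals the total cost.
        return total_cost
    # Infeasible solution: fixed penalty plus a penalty that grows
    # with the amount of tardiness (assumed to be linear here).
    return total_cost + fixed_penalty + penalty_rate * total_tardy


print(fitness(100, [0, 0], fixed_penalty=50, penalty_rate=10))
print(fitness(100, [2, 0], fixed_penalty=50, penalty_rate=10))
```

Under this assumed form, two infeasible solutions with the same cost are ranked by their tardiness, which is the separating role described for the variable penalty.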

Population updating mechanism
After trial and error among the possible crossover operators, two-point crossover was selected. The probabilities 2/3 and 1/3 are assigned to parts 1 and 2 of the solution, respectively (Fig. 2).
Then, mutation shall be applied randomly only on one part of each solution (Fig. 3).
In order to produce the evolved population, three populations (the population of the previous generation, the population produced by the crossover operator, and the population produced by the mutation operator) are merged.
Then the POP_size fittest individuals are moved to the next generation.
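The updating mechanism can be sketched for a single chromosome as below. This is a simplified illustration under assumptions: the function names are hypothetical, individuals are plain lists, cut points are drawn uniformly, and lower fitness values are taken to be better because the fitness is a cost.

```python
import random


def choose_part(rng):
    """Pick the part of the solution to cross: part 1 with chance 2/3, else part 2."""
    return 1 if rng.random() < 2 / 3 else 2


def two_point_crossover(p1, p2, rng):
    """Swap the segment between two random cut points of two parents."""
    lo, hi = sorted(rng.sample(range(len(p1) + 1), 2))
    child1 = p1[:lo] + p2[lo:hi] + p1[hi:]
    child2 = p2[:lo] + p1[lo:hi] + p2[hi:]
    return child1, child2


def next_generation(previous, crossed, mutated, fitness, pop_size):
    """Merge the three populations and keep the pop_size lowest-cost individuals."""
    merged = previous + crossed + mutated
    return sorted(merged, key=fitness)[:pop_size]


rng = random.Random(0)
c1, c2 = two_point_crossover([0, 0, 0, 0, 0], [1, 1, 1, 1, 1], rng)
survivors = next_generation([[3], [1]], [[2]], [[5]], fitness=lambda s: s[0], pop_size=2)
print(c1, c2, survivors)
```

Selecting survivors from the merged pool is an elitist scheme: the best individual found so far can never be lost between generations.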

Stopping criterion
In scheduling problems, generally, the termination condition is the limit of the number of generated schedules. The number of generated schedules is calculated by Eq. (30).
In the current research, the number of generated schedules is set to 5000. Thus, based on Eq. (30), the number of GA iterations is determined.

Computational experiments
The performance of the proposed mixed-integer linear programming (MILP) model and the GA is evaluated in this part. To this end, the model and the proposed algorithm are applied to a real case problem, and the results are compared to the outcomes of the other methods. MATLAB® software v2016a was used for programming the proposed algorithms. Also, a laptop with an Intel® Core i7 6500U 2.5 GHz processor was used for testing the proposed algorithm.

Case study
The case study includes three building projects which are assigned to a GC. Tables 1, 2, 3, 4 and 5 summarize the information of these projects. It must be mentioned that $d'_{ijk}$ and $d_{ijk}$ were considered the same in the case study problem.

The performance evaluation of the proposed model
The first evaluation performed in the current study is devoted to the proposed model. Figure 4 demonstrates the schedule results of the case study using the proposed model. In order to solve the model, the CPLEX solver of GAMS was employed.
In order to validate the obtained results, the model of Biruk et al. (2017) (which was proposed when only one project must be done) was also applied to the real case problem and its results were compared to the results of the proposed model. The results are summarized in Table 6. As is observed, the proposed model reduced the expenditure of the GC by around 14 percent.
Therefore, it can be concluded that the application of the model of Biruk et al. (2017) cannot result in an optimum solution in multiple projects. These outcomes were predictable because Biruk et al. did not consider the required assumptions when numerous projects need to be done by the GC simultaneously. As a result, the proposed model suits better when numerous projects need to be done at the same time.
[Figure: schedule chart, time axis Week 1 to Week 24]
In order to evaluate the proposed GA for solving the subcontractor selection and allocation problem, first, the parameters of the proposed GA must be tuned. For this aim, the Taguchi experimental design method (Montgomery 2005) is used. Three levels are considered for each of the four GA parameters (factors) (see Table 7), and they are tested according to the values given in Table 8. The response variable (RV) value (Eq. 31) is employed to compare the results of the experiments.
where $Total\ cost_{GA}$ represents the total cost of the case study obtained through the GA, and $OPT_C$ is the optimal total cost of the case study calculated by the CPLEX solver of GAMS. The average results for each factor level are reported in Fig. 5. As is observed, the optimum levels of the factors A, B, C, and D are A(1), B(3), C(1), and D(2).
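A response of this kind is often expressed as a relative deviation from the optimum. The sketch below assumes that form, which is not necessarily the exact definition in Eq. 31; the function name `relative_deviation` and the numbers used are hypothetical.

```python
def relative_deviation(total_cost_ga, opt_c):
    """Relative gap between the GA cost and the optimal cost (assumed form of RV)."""
    return (total_cost_ga - opt_c) / opt_c


def average_deviation(ga_costs, opt_c):
    """Mean relative gap over several GA runs."""
    return sum(relative_deviation(c, opt_c) for c in ga_costs) / len(ga_costs)


# Hypothetical costs from three GA runs against a hypothetical optimum of 100.
print(average_deviation([100.0, 105.0, 110.0], opt_c=100.0))
```

A value of zero means the GA matched the optimum, so smaller values indicate better parameter settings.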

Performance of the proposed heuristic to produce initial population
In order to test the performance of the proposed heuristic for the production of the initial population, we compare its results with the random generation method. For this purpose, each of the procedures was applied 10 times to the case study. The experiments show that the proposed heuristic improves the results by 10 percent, while it increases CPU time by 3 percent.

Performance of the proposed fitness function
In order to compare the proposed fitness function with a fitness function that works based on removing infeasible solutions, each of them was applied 10 times in the case study. The results in Table 9 show that the proposed fitness function improves the desirability of the solutions by 19%, while it increases CPU time slightly.

Performance of the proposed GA
Average deviation and run time are used as two performance indices to evaluate the proposed GA for solving the subcontractor selection and allocation problem in multiple building projects. As in the previous performance evaluations, the proposed GA was run 10 times to solve the case study problem. The results are presented in Table 10.
Optimum values for the case study, computed by the CPLEX solver of GAMS, are used as the basis for the calculation of the average deviations. It can be observed that the proposed algorithm reduces computing time by 8% in comparison with the exact method while still reaching desirable solutions. In fact, although the exact method can probe the search space completely, the searching time will
Table 6 Optimal cost using the model of Biruk et al. (2017)

Managerial implications
The current research study presents a practical tool for GCs to manage multiple projects in four ways:
• Cost optimization GCs as economic enterprises are always looking to reduce costs. The current study provides a practical tool for this purpose by considering the real conditions of building projects.
• Time management In order to maintain the credibility of the main contractor with the employer, the main contractor wants its projects to be completed within the deadlines specified in the contract. This paper presents software for scheduling multiple projects such that this goal is achieved.
• Portfolio management GCs generally conduct several projects simultaneously. They classify their projects into categories such that projects using the same resources are placed in the same category. Because the organization's resources are limited, these projects should be prioritized for resource allocation. This is performed based on the GC's goal, which is usually cost minimization. This study provides a practical tool for this purpose.
• Subcontractor selection This paper presents software for managing subcontractors. In this regard, it identifies the right subcontractors and assigns the available work packages of a project to them.

Conclusion
This research study proposes a realistic model for subcontractor selection and allocation in multiple building projects. In this regard, first, the required assumptions are discussed and then a mixed-integer linear programming (MILP) model is suggested. In this model, the different locations, times, and effort required for the movement of employed skilled workers and subcontractors from one project to another are considered. Then, the model is solved by means of an exact method and a metaheuristic method. The contributions of the proposed algorithm are threefold: (1) a heuristic method is presented for the generation of the initial population; (2) an infeasible tackling procedure is proposed in order to change infeasible solutions to feasible ones; and (3) a simple fitness function is suggested to handle feasible and infeasible solutions. The results show that:
1. When numerous projects need to be done at the same time, single-project planning criteria will not result in an optimum solution.
2. The proposed MILP model has better performance in multiple-project environments compared to the existing models.
3. The proposed heuristic method reaches better solutions (by approximately 10%).
4. The proposed fitness function improves the desirability of the produced feasible solutions by 19%, at approximately the same CPU time as the approach of removing infeasible solutions.
5. With a stopping criterion of 5000 generated schedules, the proposed GA requires less computing time (92% of that of the exact method) while still reaching desirable solutions.
Author contributions MRA, VSV, and MHS contributed to the design and implementation of the research, to the analysis of the results and to the writing of the manuscript.
Funding The authors have no relevant financial disclosures.

Declarations
Conflict of interest The authors declare that they have no conflict of interest.