Mathematical Modeling and Optimization of Flexible Job Shop Problem with Sequence-Dependent Setup and Transportation Energy Consumptions

As environmental awareness grows, energy-aware scheduling is attracting increasing attention. This paper investigates the flexible job shop scheduling problem with sequence-dependent setup times and transportation times (FJSP-SDST-T), with the objective of minimizing total energy consumption. First, the total energy consumption of the workshop is analyzed and a novel mixed integer linear programming (MILP) model is formulated. Because FJSP-SDST-T is NP-hard, an effective hybrid algorithm (HGA) that combines the genetic algorithm (GA) with the variable neighborhood search (VNS) algorithm is proposed to solve the problem, particularly for large-sized instances. HGA takes advantage of the good global search ability of GA and the powerful local search ability of VNS, and thus achieves a good balance between intensification and diversification. Then, four energy-conscious decoding methods are designed, in which two energy-saving strategies, namely the postponing strategy and the Turn Off/On strategy, are designed according to the characteristics of FJSP-SDST-T. Finally, experiments are carried out, and the results show the effectiveness of the MILP model, the energy-conscious decoding methods and HGA.


I. INTRODUCTION
The flexible job shop scheduling problem (FJSP), an extension of the classical job shop scheduling problem (JSP), is of great significance in modern manufacturing systems. In FJSP, an operation may be processed by any of several eligible machines, so two sub-problems must be solved: machine selection for all operations and operation sequencing on all machines. In addition, FJSP has been proven to be NP-hard [1].
Most previous studies on FJSP aim at minimizing makespan [2]. In recent years, however, under the dual pressure of environmental issues and energy costs, more and more attention is being paid to energy-conscious scheduling, which has been proved effective in reducing energy consumption with little or no capital investment [3-6]. In actual production, some machine tools stay idle for a long time, and they can be turned off and then back on to save energy. This energy-saving strategy is called the Turn Off/On strategy [7], and it has been widely implemented in different scheduling problems [3,4,8-14].
In the classical FJSP, setup and transportation times are overlooked, which does not conform to the actual production situation. In real cases, jobs cannot be processed on the next machine immediately after their completion on the previous machine; instead, they must be transported between the machines by transportation systems [15]. Moreover, when an operation is completed on a machine, the machine has to be equipped with appropriate tools for the next operation, which incurs sequence-dependent setup times. The FJSP with sequence-dependent setup and transportation times (FJSP-SDST-T) is closer to the actual situation and deserves study.
Our work focuses on minimizing the total energy consumption of FJSP-SDST-T, and the Turn Off/On strategy is implemented to save energy. The aim of our work is to solve FJSP-SDST-T with both an exact method, a MILP model, and a hybrid meta-heuristic algorithm named HGA.
Compared with previous studies, the contributions of our work can be summarized as follows:
(1) To the best of our knowledge, this paper is the first attempt to study the energy-conscious FJSP-SDST-T with the Turn Off/On strategy.
(2) A novel MILP model is formulated for FJSP-SDST-T to minimize total energy consumption.
(3) An effective hybrid algorithm, HGA, that integrates GA with VNS is proposed. Moreover, in HGA, four energy-conscious decoding methods are designed specifically for FJSP-SDST-T.
The rest of the paper is organized as follows. Section II reviews the related work on FJSP-SDST-T. Section III formulates the energy-efficient MILP model. Section IV elaborates the hybrid algorithm HGA. Section V gives the experimental evaluations. Conclusions and future work are presented in Section VI.

II. LITERATURE REVIEW
Scheduling problems can be solved by two classes of methods: exact methods and approximation methods. Exact methods mainly include the branch-and-bound algorithm [16] and mixed integer programming (MIP) [1,2,4,17-20], among others. Approximation methods include heuristic rules and meta-heuristic algorithms. A MIP model can solve small-sized problems to optimality and can describe all the characteristics of a scheduling problem. Besides, it is very important for designing new dispatching rules. In recent years, MIP solvers such as CPLEX and Gurobi have improved considerably, and MIP modeling of scheduling problems is attracting more and more attention [9,19,21-24]. Based on different modeling ideas, Mehrabad and Fattahi [25] and Shen et al. [22] each proposed a MILP model for FJSP-SDST with the objective of minimizing makespan. Karimi and Ardalan [15] designed two MILP models (a sequence-based model and a position-based model) for FJSP with transportation times. Aiming at minimizing total tardiness, Mousakhani [26] developed a MILP model for FJSP-SDST. Depending on the objectives, constraints and modeling ideas considered, the decision variables and constraint sets of MILP models vary greatly. Compared with the existing research [15,22,25,26], our paper considers the energy consumption objective with the Turn Off/On energy-saving strategy, which makes the model more complex and novel. Zhang et al. [4] first developed a MILP model for FJSP with the Turn Off/On strategy, and then Meng et al. [17] proposed five more efficient MILP models based on different modeling ideas. Moreover, Meng et al. [19] designed a MILP model for FJSP considering worker flexibility and the Turn Off/On strategy. However, a MILP model consumes more time and computer memory as the problem size increases [27-29]. Therefore, it is not suitable for solving large-scale problems [18].
Approximation methods, especially meta-heuristic algorithms such as the genetic algorithm (GA) [3,6,30,31], the tabu search (TS) algorithm [22], the grey wolf optimization algorithm [32] and the virus optimization algorithm (VOA) [33], have proven effective for solving scheduling problems, particularly large-sized ones. To optimize the makespan of FJSP-SDST, Shen et al. [22] proposed a tabu search algorithm, and Zhang et al. [34] proposed an improved genetic algorithm. To minimize the makespan and the total setup costs of FJSP-SDST, Li et al. [35] designed an elitist non-dominated sorting hybrid algorithm. To minimize the energy consumption of FJSP, Zhang et al. designed a gene expression programming (GEP) algorithm to mine dispatching rules. Meng et al. [19] designed a VNS for FJSP considering worker flexibility and the Turn Off/On strategy. For FJSP with controllable processing times (FJSP-CPT), Gong et al. [36] proposed a hybrid GA to simultaneously minimize makespan, worker cost and a green objective, Luo et al. [32] designed a multi-objective grey wolf optimization algorithm to simultaneously minimize makespan and total energy consumption, and Wu and Sun [3] designed an NSGA-II to simultaneously optimize makespan, energy consumption and the number of times the Turn Off/On strategy is applied.
A review of the relevant literature shows that energy-efficient scheduling is attracting growing attention, based on both MILP formulations [2,4,8,17,19,26] and meta-heuristic algorithms [3,4,6,8]. FJSP-SDST-T is closer to the actual production workshop; therefore, this research is significant in both theory and application.

A. PARAMETERS
Parameters that are used in this paper are presented as below: [...] i are processed successively on machine k. When an operation of a job is finished, the job needs to be moved to another machine by a transporting vehicle. Therefore, transportation times should be taken into consideration. With the objective of minimizing energy consumption, the Turn Off/On and postponing strategies (described in detail in Sections IV.C.4), IV.C.5) and IV.C.6)) are adopted to reduce idle energy consumption.
The problem is to assign each operation to an appropriate machine (machine selection subproblem), to sequence the operations on the machines (operation sequencing subproblem) and to decide whether to apply the Turn Off/On strategy when a machine is idle (Turn Off/On strategy decision subproblem). Moreover, we make the following assumptions:
(1) The number of transporters is considered to be infinite.
(2) No more than one operation can be processed on a machine at a time.
(3) The different operations of one job cannot be processed simultaneously and must follow the given processing route.
(4) The transportation times depend on the distances between the machines.

C. TOTAL ENERGY CONSUMPTION OF THE WORKSHOP
In this paper, we divide the energy consumed by the workshop into five parts, namely processing energy consumption (PE), setup energy consumption (SE), idle energy consumption (IE), transportation energy consumption (TE) and common energy consumption (CE) [37]. PE, SE and IE are the energy consumed by the machine tools in the processing, setup and idle modes respectively, TE is the energy consumed by the transportation system, and CE is the energy consumed by auxiliary equipment in the workshop. The total energy consumption (TotalE) is the sum of PE, SE, IE, TE and CE.

1) PROCESSING ENERGY CONSUMPTION
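Processing energy consumption is the energy consumed by the machines in the processing state. It is calculated as the processing power of each operation on its selected machine multiplied by the corresponding processing time, summed over all operations.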

2) SETUP ENERGY CONSUMPTION
Setup energy consumption is the energy consumed by the machine tool for adjustments such as changing tools and replacing fixtures [38]. It is computed as the setup power of each machine multiplied by the sequence-dependent setup time, summed over all setups.

3) IDLE ENERGY CONSUMPTION
Idle energy consumption is the energy consumed by the machines in idle mode. In actual production, a machine often waits for jobs because of their late arrival. This energy is wasted and should be reduced as far as possible; the two proposed energy-saving strategies serve this purpose. Without energy-saving strategies, IE is the idle power of each machine multiplied by its idle time, summed over all idle periods. When a machine is in the setup state, it must be kept on so that the adjustments can be made. However, if a machine stays idle for a long time, it is economically justifiable to turn it off and then turn it back on. The energy consumed in that idle period is then reduced to the energy consumed by turning the machine off and on. A power curve that includes the idle, setup, processing and Turn Off/On states is displayed in Fig. 1. To model the Turn Off/On strategy decision subproblem, we introduce a binary variable $Z_{k,t}$ for the idle period between positions $t$ and $t+1$ of machine $k$. If the Turn Off/On strategy is implemented, $Z_{k,t} = 1$; otherwise, $Z_{k,t} = 0$. With the Turn Off/On strategy, an idle period contributes the Turn Off/On energy to IE when $Z_{k,t} = 1$, and the idle power multiplied by the idle time otherwise.
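The Turn Off/On decision for a single idle period can be illustrated with a minimal Python sketch. It assumes, as the experimental settings later suggest, that each machine has an idle power, a fixed Turn Off/On energy and a breakeven time; the function name and argument names are hypothetical, and the limit on the number of Turn Off/On operations per machine is ignored here.

```python
def idle_period_energy(idle_time, idle_power, onoff_energy, breakeven_time):
    """Return (energy, turned_off) for one idle period of one machine.

    Assumption: the machine may be switched off only if the idle period is at
    least the breakeven time, and it is switched off only if that saves energy.
    """
    keep_on_energy = idle_power * idle_time
    if idle_time >= breakeven_time and onoff_energy < keep_on_energy:
        return onoff_energy, True      # corresponds to Z = 1
    return keep_on_energy, False       # corresponds to Z = 0
```

For example, a 30-unit idle period at idle power 2 costs 60 if the machine stays on, so a Turn Off/On energy of 30 with a breakeven time of 15 makes switching off worthwhile, whereas a 5-unit idle period is too short to switch off.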

4) TRANSPORTATION ENERGY CONSUMPTION
Transportation energy consumption is the energy consumed by the transporters when moving the jobs between different machines. It is defined as the transporter power multiplied by the transportation time, summed over all transports.

5) COMMON ENERGY CONSUMPTION
In actual production, auxiliary and supporting equipment such as lighting, air conditioning, ventilation and heating is needed to maintain the workshop environment; the energy it consumes is the common energy consumption. CE is calculated as the common power multiplied by the makespan $C_{\max}$. To summarize, the total energy consumption of the workshop with the Turn Off/On strategy is calculated by equation (11), i.e., as the sum of PE, SE, IE, TE and CE.
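The five-part decomposition can be sketched in Python as follows. This is only an illustration under simple assumptions: each component is a power multiplied by a duration, the idle energy is supplied already evaluated (with or without Turn Off/On), and all argument names are hypothetical rather than the paper's notation.

```python
def total_energy(processing, setups, idle_energy, transport_times,
                 transporter_power, common_power, makespan):
    """Sum the five energy components of a finished schedule.

    processing, setups: lists of (power, duration) pairs.
    idle_energy: idle energy already evaluated over all idle periods.
    transport_times: durations of all job transports.
    """
    pe = sum(power * time for power, time in processing)
    se = sum(power * time for power, time in setups)
    te = transporter_power * sum(transport_times)
    ce = common_power * makespan
    parts = {"PE": pe, "SE": se, "IE": idle_energy, "TE": te, "CE": ce}
    parts["TotalE"] = sum(parts.values())
    return parts
```

Because CE grows with the makespan, a schedule that saves idle energy by postponing operations must not lengthen the makespan if the total is to decrease.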

D. MILP MODEL
As can be seen from objective function (11), there are four non-linear terms, which involve products of decision variables. Thus, the model is highly non-linear and non-convex. Non-linear models are much more difficult to solve than linear ones: because many local optimal solutions exist in the feasible solution space of a non-convex model, it is NP-hard to find its optimal solution. Therefore, we linearize the non-linear model by introducing two intermediate variables [...]. In the linearized objective function, the terms represent, in order, the total processing energy consumption, the total idle energy consumption, the total transportation energy consumption and the common energy consumption.
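For reference, the generic way to linearize such product terms is sketched below. The symbols $x$, $y$, $z$, $s$, $w$, $u$ and $M$ are illustrative only and are not the paper's intermediate variables, which are not reproduced here; the sketch merely recalls the standard technique for a product of two binary variables and for a product of a binary variable with a bounded non-negative continuous variable.

```latex
% Product of two binary variables x, y \in \{0,1\}, replaced by w = x y:
w \le x, \qquad w \le y, \qquad w \ge x + y - 1, \qquad w \ge 0 .

% Product of a binary variable z and a continuous variable 0 \le s \le M,
% replaced by u = z s:
u \le M z, \qquad u \le s, \qquad u \ge s - M (1 - z), \qquad u \ge 0 .
```

In both cases the inequalities force the new variable to equal the product at every feasible point, so the objective becomes linear at the cost of extra variables and constraints.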

3) CONSTRAINT SETS
Constraint set (15) requires that each operation $O_{i,j}$ is performed exactly once and is assigned to exactly one position of one eligible machine.
Constraint set (16) states that each position of each machine can hold no more than one operation.
Constraint set (17) ensures that earlier positions on a machine are filled first; that is, a position cannot be assigned an operation if any of the preceding positions is free.
Constraint set (18) enforces that each operation can start only after its preceding operation has been finished and transported to the current machine.
Constraint sets (19) and (20) [...]
Constraint sets (21), (23) and (24) jointly model the Turn Off/On strategy. Constraint set (21) requires that, for two adjacent operations assigned to machine $k$, the succeeding operation can start only after the preceding operation has been completely finished and the setup has been completed. Specifically, if $Z_{k,t} = 1$, constraint set (21) ensures that the idle time between positions $t$ and $t+1$ of machine $k$ is no less than the breakeven time. If a position of a machine is not assigned to any operation, constraint set (22) guarantees that its starting time is no less than that of its preceding position; otherwise, constraint set (22) is a valid inequality and constraint set (21) applies. Constraint sets (23) [...]
The Turn Off/On method may significantly reduce energy consumption. However, applying it frequently could shorten the service life of a machine tool, so constraint set (25) limits the maximum number of times it may be applied.
Transportation energy consumption constraint set: [...]
Constraint set (27) defines the makespan to be no less than the completion times of the last operations of all jobs.

IV. THE PROPOSED HYBRID ALGORITHM FOR FJSP-SDST-T
Meta-heuristic algorithms can be divided into two categories. The first category comprises swarm intelligence algorithms, which are often inspired by nature, especially biological systems. For example, GA is inspired by the process of natural selection, the shuffled frog-leaping algorithm (SFLA) imitates the foraging behavior of a group of frogs, the artificial bee colony (ABC) algorithm is modeled on the intelligent foraging behavior of a honey bee swarm, and particle swarm optimization (PSO) is inspired by the social behavior of bird flocks. The common feature of these algorithms is that they solve a problem with a population of candidate solutions. Therefore, they commonly have good global search ability but weaker local search ability. The second category, local search algorithms, is quite different: such an algorithm starts from a single solution and generates new solutions by searching specific neighborhood structures. The most commonly used local search algorithms are VNS, the simulated annealing (SA) algorithm and the tabu search (TS) algorithm. In contrast to swarm intelligence algorithms, local search algorithms possess good local search ability but weaker global search ability. Therefore, many researchers propose hybrid algorithms that combine a swarm intelligence algorithm with a local search algorithm to solve a problem better. For example, Li et al. [39] proposed a hybrid algorithm that combines GA and TS to solve the classical FJSP with makespan minimization, Dai et al. [12] combined GA and SA to solve the hybrid flow shop scheduling problem (HFSP) with energy consumption minimization, Li et al. [40] combined GA and VNS to solve integrated process planning and scheduling (IPPS) with makespan minimization, and Gao et al. [54] combined GA and VNS to solve the classical FJSP with makespan minimization. In view of the good performance of GA and VNS in solving FJSP [3,19,39,41-43], in this paper we propose the hybrid algorithm HGA, which combines GA and VNS, to solve the energy-conscious FJSP-SDST-T. We embed VNS into GA to improve its local search ability. The proposed HGA exploits the advantages of both GA and VNS and can therefore balance intensification and diversification well. Fig. 2 shows the flowchart of the HGA. Eight components of HGA are particularly important, namely encoding, decoding, selection, crossover, mutation, VNS, initiation and termination criteria, which are described in the subsequent sections.

A. WORKFLOW OF THE PROPOSED HGA
The overall procedure of HGA is given as below:
Step 1: Set the parameters of HGA, such as the population size ($p_s$), the crossover probability ($p_c$), the mutation probability ($p_m$), the maximum number of iterations (maxIter), the maximum CPU time (maxTime) and the maximum number of evaluations (maxEval).
Step 2: Initialize the population based on the encoding method and set Gen = 1.
Step 3: Evaluate all the fitness values of individuals by using the decoding method.
Step 4: Check whether the termination criterion is satisfied. If yes, go to Step 8; otherwise, go to Step 5.
Step 5: Generate the new population by using selection, crossover and mutation operators.
Step 6: Apply the VNS to the top $0.1 \times p_s$ individuals with the best fitness.
Step 7: Set Gen = Gen + 1 and go to Step 3.
Step 8: Output the best solution.
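The workflow can be sketched as a generic Python skeleton. This is a minimal illustration, not the authors' C++ implementation: the problem-specific pieces (`init`, `fitness`, `crossover`, `mutate`, `vns`) are passed in as functions, only an iteration limit is used as the stopping rule, and the `toy_*` functions are hypothetical stand-ins (a sum-of-squares objective with a simple hill climber in place of a real VNS) so that the skeleton can be run.

```python
import random

def hga(init, fitness, crossover, mutate, vns, pop_size=20, pc=0.8, pm=0.1,
        max_gen=30, top_frac=0.1, seed=0):
    """GA with elitism and binary tournament, plus local search on the best individuals."""
    rng = random.Random(seed)
    pop = [init(rng) for _ in range(pop_size)]            # initial population
    best = min(pop, key=fitness)                           # evaluation
    for _ in range(max_gen):                               # termination: iteration limit only
        def tournament():
            a, b = rng.sample(pop, 2)
            return a if fitness(a) <= fitness(b) else b
        new_pop = [best]                                   # elitist selection
        while len(new_pop) < pop_size:                     # selection, crossover, mutation
            child = tournament()
            if rng.random() < pc:
                child = crossover(child, tournament(), rng)
            if rng.random() < pm:
                child = mutate(child, rng)
            new_pop.append(child)
        new_pop.sort(key=fitness)                          # local search on the top fraction
        n_top = max(1, int(top_frac * pop_size))
        new_pop[:n_top] = [vns(x, rng) for x in new_pop[:n_top]]
        pop = new_pop
        best = min(pop + [best], key=fitness)
    return best

# Hypothetical toy problem: minimize the sum of squares of three integers.
def toy_init(rng):
    return [rng.randint(-5, 5) for _ in range(3)]

def toy_fitness(x):
    return sum(v * v for v in x)

def toy_crossover(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def toy_mutate(x, rng):
    y = list(x)
    y[rng.randrange(len(y))] += rng.choice([-1, 1])
    return y

def toy_vns(x, rng):
    """Stand-in local search: coordinate-wise hill climbing until no step improves."""
    y, improved = list(x), True
    while improved:
        improved = False
        for i in range(len(y)):
            for d in (-1, 1):
                z = list(y)
                z[i] += d
                if toy_fitness(z) < toy_fitness(y):
                    y, improved = z, True
    return y
```

Passing the operators as functions keeps the loop independent of the encoding, so the same skeleton could in principle be driven by OS/MS chromosomes and an energy-based decoder.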

B. ENCODING SCHEME
Encoding defines how a real solution is represented, and it is a very important part of GA. In this paper, we use the encoding method of [34]. A chromosome represents a solution of FJSP-SDST-T and includes two strings, namely the operation sequence (OS) string and the machine selection (MS) string. The OS string denotes all operations of a job by the same symbol and interprets them according to their order of appearance; its length is equal to the total number of operations. The genes of the MS string describe the machines selected for the corresponding operations, and its length is also equal to the total number of operations. It is important to note that each element of MS does not represent the actual machine number but the index in the alternative machine set of the operation [39,43]. Fig. 3 shows an example that illustrates the encoding method. In the MS string, for example, the machine index of operation $O_{1,4}$ is 3, which corresponds to the real Machine 6.
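How the two strings are interpreted can be shown with a short Python sketch. The function names, the fixed operation order used for the MS string and the alternative machine sets in the usage note are hypothetical; only the interpretation rules (order of appearance for OS, 1-based index into the alternative machine set for MS) come from the text.

```python
def decode_os(os_string):
    """Map each job symbol to (job, operation index) by its order of appearance."""
    count = {}
    operations = []
    for job in os_string:
        count[job] = count.get(job, 0) + 1
        operations.append((job, count[job]))
    return operations

def decode_ms(ms_string, op_order, alternatives):
    """Translate 1-based machine indexes into real machine numbers.

    op_order: the fixed list of operations that the MS genes refer to.
    alternatives: dict mapping each operation to its list of eligible machines.
    """
    return {op: alternatives[op][gene - 1]
            for op, gene in zip(op_order, ms_string)}
```

For instance, with a hypothetical alternative machine set `[2, 4, 6]` for an operation, the gene value 3 selects the real Machine 6.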

C. DECODING SCHEME
Decoding transforms a chromosome into a real schedule. For the same chromosome, different decoding methods yield different schedules.
Four types of decoding methods are considered below: semi-active decoding (SAD), active decoding (AD), energy-conscious greedy decoding (GD), and decoding methods that incorporate energy-efficient strategies. Among them, AD and SAD can be designed following the classical FJSP [39]. The GD method and the decoding methods with energy-efficient strategies are specifically designed for the objective of minimizing total energy consumption.

1) SEMI-ACTIVE DECODING (SAD)
In SAD, each operation is assigned to a machine in accordance with the OS and MS strings, and is placed after the last operation already scheduled on its selected machine. The procedure of SAD is as follows:
Step 1: Determine the set of operations for every machine.
Step 2: Allocate each operation $O_{i,j}$ after the last operation of its selected machine, at its earliest allowable starting time $B^{*}_{i,j}$, which is calculated as [...]
Step 3: Repeat Step 2 until all operations are scheduled.
Step 4: Generate the starting times and completion times of all operations.
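A minimal Python sketch of semi-active decoding is given below. It rests on assumptions that should be checked against the model: an operation starts at the later of (a) the completion of its job predecessor plus the transportation time to the selected machine and (b) the completion of the last operation on that machine plus the sequence-dependent setup time; no setup is charged for the first operation on a machine; and all data structures and names are hypothetical.

```python
def semi_active_decode(operations, machine, proc, setup, transport):
    """Schedule operations in OS order, each after the last one on its machine.

    operations: list of (job, op_index) in OS order.
    machine: dict (job, op_index) -> selected machine.
    proc: dict (job, op_index, machine) -> processing time.
    setup: dict (previous_job, job, machine) -> setup time (default 0).
    transport: dict (from_machine, to_machine) -> transportation time (default 0).
    Returns dict (job, op_index) -> (machine, start, end).
    """
    job_last = {}    # job -> (completion time, machine) of its latest operation
    mach_last = {}   # machine -> (completion time, job) of its latest operation
    schedule = {}
    for job, idx in operations:
        k = machine[(job, idx)]
        ready = 0
        if job in job_last:
            done, k_prev = job_last[job]
            ready = done + transport.get((k_prev, k), 0)
        free = 0
        if k in mach_last:
            done, prev_job = mach_last[k]
            free = done + setup.get((prev_job, job, k), 0)
        start = max(ready, free)
        end = start + proc[(job, idx, k)]
        schedule[(job, idx)] = (k, start, end)
        job_last[job] = (end, k)
        mach_last[k] = (end, job)
    return schedule
```

Active decoding would differ only in where an operation may be placed: instead of always appending after the last operation on the machine, it would also try earlier idle intervals that are long enough.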

2) ACTIVE DECODING (AD)
The AD method differs from SAD in that it fully utilizes the idle-time intervals between consecutive operations that have already been assigned: it shifts operations left into the idle-time intervals of a semi-active schedule without delaying other operations. The working steps of AD are as follows:
Step 1: The same as Step 1 of SAD.
Step 2: [...] where $i_1$ and $i_2$ represent the two jobs at the front and at the back of the idle-time interval respectively; $j_1$ and $j_2$ are the indexes of the operations of job $i_1$ and job $i_2$ respectively. Specifically, if operation [...]
Step 3: Repeat Step 2 until all operations are scheduled.
Step 4: Generate the starting times and completion times of all operations.

3) GREEDY DECODING (GD)
GD uses the OS string alone; the machine for each operation is selected during decoding by a greedy rule that minimizes energy consumption. Its detailed steps are as below: [...]

4) THE ENERGY-SAVING STRATEGIES FOR REDUCING IDLE ENERGY CONSUMPTION
The energy-saving strategies must be implemented after the starting and completion times of all operations have been decided by the SAD, AD or GD method. The two strategies, namely the postponing strategy and the Turn Off/On strategy, are described below. The first energy-saving strategy is the postponing strategy, whose procedure follows [19]. To implement this strategy, the critical operations must be identified. The method of identifying critical operations works as follows: let [...] C is achieved. The latest starting time and the latest completion time of each operation are obtained as follows:
Step 1: The last operation on each machine cannot be moved. Therefore, we set its latest completion time to its earliest completion time. The maximum latest completion time of the last operations of all machines is equal to $C_{\max}$.
Step 2: Proceeding from the last operation to the first operation according to the OS string, the latest completion time of each operation, except for the operations in Step 1, is computed as [...] where job $ii$ denotes the job that operation $SM^{k}_{i,j}$ belongs to, and machine $kk$ represents the machine that operation $SJ_{i,j}$ is assigned to.
Step 3: Repeat Step 2 until the latest completion times of all operations are decided.
Step 4: Identify the critical operations, i.e., those whose latest completion time is equal to their earliest completion time.

5) DECODING METHODS WITH CONSIDERING ENERGY-CONSCIOUS STRATEGIES
In this section, we propose four decoding methods that consider the postponing strategy and the Turn Off/On strategy, namely energy-conscious active decoding (ECAD), energy-conscious greedy decoding (ECGD), greedy hybrid decoding (GHD) and random hybrid decoding (RHD). Among them, ECAD is obtained by combining AD with the energy-saving strategies, and ECGD is obtained by combining GD with the energy-saving strategies. Taking ECAD as an example, it proceeds as follows:
Step 1: Decode a solution with the AD decoding method described in Section IV.C.2).
Step 2: Implement the two energy-saving strategies described in Section IV.C.4).
GHD is obtained by hybridizing ECAD and ECGD: for each chromosome it selects the better of the two methods, and its steps are given as below:
Step 1: Each chromosome is decoded by both ECAD and ECGD.
Step 2: The fitness values obtained by ECAD and ECGD are compared.
Step 3: The decoding method that yields the smaller fitness value is chosen.
RHD is obtained by randomly selecting ECAD or ECGD with equal probability.
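The two hybrid decoders reduce to a few lines once ECAD and ECGD are available. In the Python sketch below the two decoders are passed in as functions that return a `(fitness, schedule)` pair; this interface and the function names are assumptions made for illustration.

```python
import random

def greedy_hybrid_decode(chromosome, ecad, ecgd):
    """GHD: decode with both methods and keep the result with the smaller fitness."""
    result_a = ecad(chromosome)
    result_g = ecgd(chromosome)
    return result_a if result_a[0] <= result_g[0] else result_g

def random_hybrid_decode(chromosome, ecad, ecgd, rng):
    """RHD: pick ECAD or ECGD with equal probability and decode once."""
    decoder = ecad if rng.random() < 0.5 else ecgd
    return decoder(chromosome)
```

GHD costs two decodings per chromosome, while RHD costs one, which matters when the stopping criterion counts evaluations.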
As shown in Fig. 5(b), three non-critical operations, namely, [...]

D. SELECTION OPERATOR
In HGA, the role of the selection operator is to select individuals according to their fitness (total energy consumption in this paper). We adopt two selection operators, namely elitist selection and binary tournament selection [39,44]. Elitist selection copies the individual with the best fitness directly into the offspring population. In binary tournament selection, we randomly select two individuals from the population and keep the one with the better fitness.

E. CROSSOVER OPERATOR
1) CROSSOVER FOR OS STRING
In this paper, the precedence operation crossover (POX) is adopted for the OS string. It works as follows:
Step 1: Randomly divide the job set into two sets, Jset1 and Jset2.
Step 2: Copy the elements of parent P1 that belong to Jset1 to offspring O1, and the elements of parent P2 that belong to Jset2 to offspring O2, preserving their positions.
Step 3: Copy the elements of P2 that do not belong to Jset1 to the empty positions of O1, and the elements of P1 that do not belong to Jset2 to the empty positions of O2, preserving their order. Fig. 6(a) shows an example of the POX crossover.
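The POX steps can be sketched in Python as follows. The function names are hypothetical; `random_job_split` assumes at least two jobs, and `pox` takes Jset1 as an argument so that the result is reproducible.

```python
import random

def random_job_split(jobs, rng):
    """Randomly choose a non-empty proper subset of the jobs as Jset1."""
    jobs = sorted(jobs)
    return set(rng.sample(jobs, rng.randint(1, len(jobs) - 1)))

def pox(p1, p2, jset1):
    """Precedence operation crossover on two OS strings."""
    jset2 = set(p1) - set(jset1)

    def make_child(keep_parent, fill_parent, keep_set):
        # Genes of keep_set stay at their positions in keep_parent; the other
        # positions are filled with the remaining genes of fill_parent, in order.
        fill = iter(g for g in fill_parent if g not in keep_set)
        return [g if g in keep_set else next(fill) for g in keep_parent]

    o1 = make_child(p1, p2, set(jset1))
    o2 = make_child(p2, p1, jset2)
    return o1, o2
```

Both parents contain each job symbol the same number of times, so the fill sequence always has exactly as many genes as there are empty positions and each offspring remains a valid OS string.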

2) CROSSOVER OPERATOR FOR MS STRING
For the MS string, uniform crossover is adopted. First, $\sum_{i \in I} n_i$ binary numbers (one per operation) are generated randomly; then, the offspring are generated by swapping the elements of the two parents' strings at the positions whose binary numbers are equal to 1. Because the crossover only exchanges elements at the same position, and each position corresponds to the same operation in both parents, the offspring are feasible as long as the parents are feasible. Fig. 6(b) shows an example of the uniform crossover.
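A minimal Python sketch of this uniform crossover is shown below; the function name is hypothetical, and the binary mask is passed in explicitly so that the example is deterministic (in practice it would be drawn at random).

```python
def uniform_crossover(ms1, ms2, mask):
    """Swap the genes of two MS strings wherever the mask bit equals 1."""
    c1, c2 = list(ms1), list(ms2)
    for pos, bit in enumerate(mask):
        if bit == 1:
            c1[pos], c2[pos] = c2[pos], c1[pos]
    return c1, c2
```

Since a gene only ever moves between the same position of the two strings, each offspring gene is still a valid index into the alternative machine set of its operation.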

F. MUTATION OPERATOR
In this paper, swap mutation and one-point-reassign mutation are adopted for the OS string and the MS string respectively. Swap mutation randomly selects two different positions and exchanges their elements. One-point-reassign mutation randomly selects one position and changes its value to another eligible machine.
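Both mutation operators can be sketched in Python as follows. The names are hypothetical, and `n_alternatives` (the number of eligible machines of each operation, in MS order) is an assumed input; operations with a single eligible machine are skipped by the reassign mutation because they have no other machine to move to.

```python
import random

def swap_mutation(os_string, rng):
    """Exchange the elements at two different random positions of the OS string."""
    i, j = rng.sample(range(len(os_string)), 2)
    child = list(os_string)
    child[i], child[j] = child[j], child[i]
    return child

def reassign_mutation(ms_string, n_alternatives, rng):
    """Change one random MS gene to another eligible 1-based machine index."""
    candidates = [p for p, n in enumerate(n_alternatives) if n > 1]
    child = list(ms_string)
    if not candidates:
        return child
    pos = rng.choice(candidates)
    options = [m for m in range(1, n_alternatives[pos] + 1) if m != child[pos]]
    child[pos] = rng.choice(options)
    return child
```

Swapping two positions that carry the same job symbol leaves the OS string unchanged, which is harmless but means that not every call produces a new solution.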

G. VNS
VNS, a well-known local search method, works by systematically exploring several different neighborhood structures, thereby obtaining local optimal solutions in these neighborhoods. By comparing these local optima, a better solution, or even the global optimal solution, can be achieved. In general, VNS is based on three observations [45]: (1) A local optimum of one neighborhood structure may not be a local optimum of another neighborhood structure. (2) A global optimum is a local optimum with respect to all possible neighborhood structures. (3) For many optimization problems, the local optima of one or several neighborhoods are relatively close to each other. For VNS, the design of the neighborhoods is very important. In this study, four neighborhood structures are applied to produce new solutions. The first three neighborhood structures, namely Swap, Insertion and Reversion, are for the OS string. The fourth neighborhood structure, Reassign, is for the MS string.

N1(x)(Swap):
The same as the swap mutation.

N2(x)(Insertion):
First, randomly select two different positions; then, move the operation in the second position to just before the operation in the first position, and shift the operations between the two positions to the right accordingly.

N3(x)(Reversion):
Randomly select two different positions and reverse the operations between them.

N4(x)(Reassign):
One-point reassignment is used. Fig. 7 shows an example of the four neighborhood structures. The detailed steps of VNS are given as follows:
Step 1 (Initialization): Randomly generate the initial solution $x$ and define a set of neighborhood structures $N_k(x)$, $k = 1, \dots, k_{\max}$.
Step 2: Repeat the following Steps 3-6 until the stopping criterion ($k > k_{\max}$) is satisfied.
Step 4 (Shaking): Randomly produce a solution $x'$ from the $k$th neighborhood of $x$ ($x' \in N_k(x)$).
Step 5 (Local search): Apply a local search method with $x'$ as the initial solution. The local search method used in this paper is described as below:
Step 5.1: Set $t = 1$;
Step 5.2: Randomly produce a solution $x''$ from the $k$th [...]
Step 6 (Updating): If solution $x'$ is better than the incumbent solution $x$, set $x = x'$ and $k = 1$; otherwise, set $k = k + 1$.
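The shaking, local search and updating loop can be sketched in Python on the three OS-string neighborhoods. Several points are assumptions made for illustration: the local search is a fixed number of random improving moves in the current neighborhood (the paper's exact local search is not reproduced), the MS-string Reassign neighborhood is omitted, neighborhoods are indexed from 0, and the cost function in the usage note is a toy one.

```python
import random

def swap(x, i, j):
    y = list(x)
    y[i], y[j] = y[j], y[i]
    return y

def insertion(x, i, j):
    """Move the element at position j to just before position i (i < j)."""
    y = list(x)
    y.insert(i, y.pop(j))
    return y

def reversion(x, i, j):
    """Reverse the segment between positions i and j inclusive."""
    y = list(x)
    y[i:j + 1] = y[i:j + 1][::-1]
    return y

def vns(x, cost, neighborhoods, rng, tries=20):
    def random_neighbor(sol, move):
        i, j = sorted(rng.sample(range(len(sol)), 2))
        return move(sol, i, j)

    k = 0
    while k < len(neighborhoods):
        candidate = random_neighbor(x, neighborhoods[k])        # shaking
        for _ in range(tries):                                  # local search
            trial = random_neighbor(candidate, neighborhoods[k])
            if cost(trial) < cost(candidate):
                candidate = trial
        if cost(candidate) < cost(x):                           # updating
            x, k = candidate, 0
        else:
            k += 1
    return x
```

As a usage example, `vns([4, 2, 5, 1, 3], cost, [swap, insertion, reversion], random.Random(0))` with `cost = lambda p: sum(abs(v - (i + 1)) for i, v in enumerate(p))` returns a permutation that is never worse than the starting one, because a candidate replaces the incumbent only when it strictly improves the cost.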

H. INITIATION AND TERMINATION CRITERIA
The initial population is randomly generated based on the encoding principle. As for the termination criterion, the algorithm terminates when the number of evaluations reaches its maximum or the CPU time reaches the time limit. When the termination criterion is satisfied, the algorithm ends and the best solution is output.

V. COMPUTATIONAL RESULTS AND DISCUSSIONS
To test the MILP model and the HGA, two sets of instances, namely MFJS01-10 [46] and MK01-10 [47], are used. We adapt all the instances by adding energy consumption information, and we multiply the processing times of MK01-10 by 10. The setup time $s_{i,i',k}$ is equal to $i + i' + k$. The processing power is randomly generated within [4,8]. The idle power of each machine is randomly selected from {1, 2, 3}. The setup power can be obtained by [...] The Turn Off/On energy consumption of each machine is randomly selected from {10, 30, 60}. The breakeven time of Turn Off/On for each machine is randomly selected from {10, 15, 20}. The Turn Off/On time for each machine is randomly selected from {8, 12, 16}. The transporter power and the common power are set to 3 and 10 respectively. The maximum number of Turn Off/On operations for each machine is set to 3.
IBM ILOG CPLEX 12.7.1 is used to solve the MILP model, with a time limit of 3600s. The CPLEX solver uses the branch-and-cut method, which combines the cutting plane method with the branch-and-bound method. All the meta-heuristic algorithms (GA, VNS and HGA) are coded in C++. All the methods are run on a computer with the Windows 7 operating system, an Intel(R) Core(TM) 2 Duo CPU at 3.20 GHz and 8 GB of RAM.

A. THE EFFECTIVENESS OF THE PROPOSED MILP MODEL
In Table 1, Gap is the average optimality gap of the solution, calculated as (CS-BS)*100%/CS, where CS denotes the best feasible solution obtained within the time limit and BS represents the lower bound obtained within the time limit. A solution is optimal when its Gap is equal to 0. Therefore, the Gap value is commonly used to judge whether the optimal solution has been reached and to evaluate different solutions [23,48-50]. NBVs, NCVs and NCs represent the number of binary variables, the number of continuous variables and the number of constraints, respectively. The CPU time in Table 1 is the time at which optimality is proved or the time limit is reached.
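As a small worked example of the Gap formula, the following Python helper (with a hypothetical name) evaluates (CS-BS)*100%/CS for a minimization problem; the numbers in the usage note are made up for illustration and are not taken from Table 1.

```python
def optimality_gap(cs, bs):
    """Gap in percent between the best feasible value cs and the lower bound bs."""
    return (cs - bs) * 100.0 / cs
```

With a best feasible value of 200 and a lower bound of 180, the gap is 10%; when the lower bound reaches the feasible value, the gap is 0 and the solution is proved optimal.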
It can be seen from Table 1 that the MILP model can efficiently solve only very small-sized instances, such as MFJS01-02, to optimality. The solving time increases exponentially with the size of the instance. For MFJS04-10, MK01-02 and MK04-06, the model can only achieve feasible solutions within 3600s. For MK03 and MK07-10, it cannot obtain any feasible solution within 3600s. The reason is that NBVs, NCVs and NCs increase sharply as the problem scale becomes larger, and the solving efficiency of the MILP model decreases as NBVs, NCVs and NCs grow.

2) THE EFFECTIVENESS OF ENERGY-SAVING STRATEGIES
This section evaluates the effectiveness of the proposed two energy-saving strategies. For this purpose, firstly, 1000 solutions are randomly generated. Then, both AD and the ECAD methods are utilized to obtain the fitness of each solution. In addition, we set the relative percentage increase (RPI) value as the evaluating indicator [19].
where, D denotes the fitness archived by AD method, and DE is the fitness obtained by ECAD method. In Table 2, RPI_Min, RPI_Max and RPI_Ave represent for the minimum RPI, maximum RPI and average RPI of the 1000 individuals. Obviously, Table 2 indicates that the energy-saving strategies perform well for reducing energy consumption, especially for the large-sized problems. This is because the proportion of idle energy consumption increases with the increasing size of the problem.  Table 3. In Table 3, MRPD, ARPD and SRPD denote the minimum, average and standard variance of RPD in 20 runs, respectively.
Table 3 shows that HGAA outperforms HGAG in terms of both the overall MRPD and ARPD values. However, with regard to the overall SRPD, HGAG outperforms HGAA. This is because HGAA can search the whole solution space, which is larger than the space HGAG can search. A larger solution space offers more opportunity to find better solutions, but it may also lead to greater volatility in the solutions obtained. Compared with HGAA and HGAG, HGAR and HGAH perform better in terms of the overall MRPD, ARPD and SRPD values, because they can make full use of the advantages of both ECAD and ECGD. Moreover, HGAH performs best in terms of the overall MRPD, ARPD and SRPD values. This is because GHD selects the better of ECAD and ECGD as the final decoding method, whereas RHD selects ECAD or ECGD at random and is therefore subject to more randomness than GHD.
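The difference between the two hybrid decoding rules can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the two decoders are toy stand-ins for ECAD and ECGD, the function names are hypothetical, and the uniform random choice in RHD is an assumption.

```python
import random
from typing import Callable, Sequence

# A decoder maps a chromosome to a total energy consumption (lower is better).
Decoder = Callable[[Sequence[int]], float]


def greedy_hybrid_decode(chrom: Sequence[int], ecad: Decoder, ecgd: Decoder) -> float:
    """GHD: decode with both methods and keep the lower energy value."""
    return min(ecad(chrom), ecgd(chrom))


def random_hybrid_decode(chrom: Sequence[int], ecad: Decoder, ecgd: Decoder,
                         rng: random.Random) -> float:
    """RHD: pick one of the two methods at random (uniformly, by assumption)."""
    decoder = rng.choice((ecad, ecgd))
    return decoder(chrom)


# Toy stand-ins for the real decoders (hypothetical, for illustration only).
def toy_ecad(chrom: Sequence[int]) -> float:
    return float(sum(chrom)) + 1.0


def toy_ecgd(chrom: Sequence[int]) -> float:
    return 2.0 * max(chrom)


chromosome = [3, 1, 2]
print(greedy_hybrid_decode(chromosome, toy_ecad, toy_ecgd))  # 6.0
print(random_hybrid_decode(chromosome, toy_ecad, toy_ecgd, random.Random(0)))
```

In this sketch GHD never returns a worse value than RHD for the same chromosome, at the price of calling both decoders for every evaluation.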

4) COMPARISON RESULTS OF HGA, GA AND VNS
This section compares HGA with GA and VNS to investigate the effectiveness of HGA. For all the algorithms, the GHD decoding method is used. Each algorithm runs 20 times for each instance. GA and VNS use the same stopping criterion as HGA, namely 100,000 evaluations. For fair comparison, the GA adopts the same search operators as HGA, such as selection, crossover and mutation, and its parameter setting is the same as that of HGA. Table 4 reports the results of the three algorithms under this stopping criterion. The CPU time in Table 4 is the running time of 20 runs when the stopping criterion is reached. As can be seen from Table 4, HGA performs best among the three algorithms in terms of the overall MRPD, ARPD and SRPD values. HGA obtains the best results for all instances and generates an overall MRPD of 0, compared with GA (5.50‰) and VNS (1.95‰), which demonstrates the better convergence of HGA.
Moreover, HGA obtains the minimum SRPD values for all the instances and generates the lowest overall SRPD of 6.64‰, compared with GA (11.57‰) and VNS (9.06‰), which shows that HGA is more robust than the other algorithms. As can be seen from Table 4, the maximum solving time of HGA over all the instances is 85.15 s, which is acceptable. For real instances, the stopping criterion should be set according to actual conditions.
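To illustrate how summary statistics such as MRPD, ARPD and SRPD could be computed from repeated runs, the following sketch assumes RPD = (f − f_ref) × 1000 / f_ref, expressed in per mille, where f_ref is a reference (best known) objective value. This formula, the use of the population standard deviation, the function name and the sample data are assumptions for illustration and are not taken from the tables.

```python
import statistics
from typing import Sequence, Tuple


def rpd_summary(energies: Sequence[float], f_ref: float) -> Tuple[float, float, float]:
    """Return (MRPD, ARPD, SRPD) in per mille for one instance.

    energies: objective values from repeated runs of one algorithm.
    f_ref:    reference value, e.g. the best value found by any algorithm.
    Assumed RPD definition: (f - f_ref) * 1000 / f_ref.
    """
    rpd = [(f - f_ref) * 1000.0 / f_ref for f in energies]
    mrpd = min(rpd)                 # minimum RPD over the runs
    arpd = statistics.mean(rpd)     # average RPD over the runs
    srpd = statistics.pstdev(rpd)   # population standard deviation (assumption)
    return mrpd, arpd, srpd


# Hypothetical results of four runs on one instance.
runs = [102.0, 100.0, 101.0, 100.0]
mrpd, arpd, srpd = rpd_summary(runs, f_ref=100.0)
print(mrpd, arpd)       # 0.0 7.5
print(round(srpd, 2))
```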
With regard to CPU time, the CPU time consumed by HGA is on average longer than that consumed by GA and VNS. Thus, for fair comparison, we also compare these algorithms under the same CPU-time stopping criterion, which is set to the CPU time consumed by HGA with 100,000 evaluations. The comparative results are reported in Table 5. From Table 5, it can be seen that with the same CPU time as the stopping criterion, HGA is still significantly better than GA and VNS in terms of the overall MRPD, ARPD and SRPD values.
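The two stopping criteria used in this comparison, a fixed number of evaluations and a fixed CPU-time budget, can be sketched with a generic search loop. The loop below is a plain random search used only as a stand-in; it is not HGA, GA or VNS, and the function name, parameters and toy objective are hypothetical.

```python
import random
import time
from typing import Callable, Optional, Tuple


def random_search(objective: Callable[[float], float],
                  sample: Callable[[], float],
                  max_evals: Optional[int] = None,
                  max_cpu_seconds: Optional[float] = None,
                  clock: Callable[[], float] = time.process_time
                  ) -> Tuple[float, int]:
    """Minimize `objective` until an evaluation or CPU-time budget is spent.

    Returns the best objective value found and the number of evaluations used.
    """
    if max_evals is None and max_cpu_seconds is None:
        raise ValueError("provide at least one stopping criterion")
    start = clock()
    best = float("inf")
    evals = 0
    while True:
        if max_evals is not None and evals >= max_evals:
            break
        if max_cpu_seconds is not None and clock() - start >= max_cpu_seconds:
            break
        best = min(best, objective(sample()))
        evals += 1
    return best, evals


rng = random.Random(1)


def toy_objective(x: float) -> float:
    return (x - 0.3) ** 2


# Step 1: run the reference algorithm with an evaluation budget and time it.
t0 = time.process_time()
best_ref, n_ref = random_search(toy_objective, rng.random, max_evals=10_000)
cpu_budget = time.process_time() - t0

# Step 2: give a competing algorithm the same CPU time instead of the same evaluations.
best_other, n_other = random_search(toy_objective, rng.random,
                                    max_cpu_seconds=cpu_budget)
print(n_ref, n_other)
```

Passing the clock as a parameter keeps the loop testable with a deterministic fake clock; the default `time.process_time` measures CPU time of the current process rather than wall-clock time.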
The above experimental results allow us to conclude that the proposed HGA is more effective and robust than GA and VNS for solving the FJSP-SDST-T with the objective of minimizing total energy consumption. This is because HGA is a hybrid of GA and VNS, in which VNS is embedded in GA to improve its local search ability. Thus, HGA has both good global and good local search abilities.
Table 6 shows that although HGA obtains seventeen solutions that are better than those obtained by the MILP model, HGA cannot guarantee the optimal solution even for small-sized problems such as MFJS01-02. This indicates that the formulation of the MILP model is also very important, because optimal solutions obtained by the MILP model can serve as the reference standard when developing approximation methods such as meta-heuristic algorithms, especially for new scheduling problems.

VI. CONCLUSIONS AND FUTURE RESEARCH
This study investigates the energy-conscious FJSP-SDST-T and proposes a MILP model as well as an effective hybrid algorithm, HGA, to solve it. To the best of our knowledge, no published paper has addressed this problem. We first analyze the energy composition of the workshop and formulate a MILP model. Then, the hybrid algorithm HGA, which hybridizes GA and VNS, is proposed. With regard to the objective of minimizing total energy consumption, we propose four energy-conscious decoding methods, in which two energy-saving strategies, namely the postponing strategy and the Turn Off/On strategy, are specifically designed according to the characteristics of FJSP-SDST-T. Finally, computational experiments on twenty instances are conducted, and the results show that the MILP model can solve small-sized instances to optimality. The two proposed energy-saving strategies are very effective in reducing idle energy consumption. Greedy hybrid decoding proves to be the most effective decoding method. Moreover, HGA outperforms GA and VNS for solving the FJSP-SDST-T with the objective of minimizing total energy consumption.
Our future research will focus on exploiting problem-specific characteristics to develop more effective heuristics for the energy-conscious FJSP-SDST-T, such as hybrid algorithms of GA with tabu search (TS) or simulated annealing (SA), and other meta-heuristic algorithms (e.g., the grey wolf optimization (GWO) algorithm, the migrating birds optimization (MBO) algorithm, the virus optimization algorithm (VOA) and differential evolution (DE)). We welcome related researchers to propose more efficient meta-heuristic algorithms for the problem studied in this paper, as well as improved MILP models for FJSP-SDST-T. Moreover, it will be interesting to study the multi-objective FJSP-SDST-T, in which total energy consumption and makespan are minimized simultaneously.