As the new generation of information technologies such as the Internet of Things, cloud computing and artificial intelligence develop rapidly, the DT technology has been widely advanced (Tao et al., 2019). DT, as a technology integrating reality and virtuality, aims at creating a high-fidelity virtual model for each physical entity (PE) in order to simulate its state and behavior, and it has the ability to optimize and predict (Tsega et al., 2020). In recent years, the workshop production scheduling centered on DT is advancing quickly (Li et al., 2023; Schuh et al., 2020; Guo et al., 2020). In light of that, in this paper, the DT technology was applied to the pump station scheduling optimization problem, and a DT-based full-process dynamic pump station scheduling method for energy-saving optimization was proposed, as shown in Fig. 2. The detailed scheduling process is explained as follows:
Step 1: Virtual entities (VE) should be consistent with PE. The model is calibrated by checking the real-time communication among PE, VE, services (Ss) and DD, and the model parameters of VE are corrected according to the real-time data of PE to ensure the consistency of VE and PE in the scheduling period.
Step 2: A scheduling solution is generated. The real-time data (operation duration, vibration signals, etc.) and historical data (daily water consumption, weather, holidays, vibration signals, etc.) of PE are obtained before scheduling, and then, a solution of full-process pump station joint scheduling is formulated (Fig. 4).
Step 3: Equipment availability prediction is performed. The vibration signals of the equipment in the next four hours are simulated via the VE, and the equipment anomaly detection method based on root mean square (RMS) and kurtosis (Fig. 5) is used to check the equipment state and predict the equipment availability.
Step 4: Generate a rescheduling scheme. If the equipment is not available, the entire process pumping station joint scheduling method is triggered again to generate a rescheduling scheme. Otherwise, the rescheduling process is not triggered.
Step 5: The scheduling solution is verified. The scheduling solution is transmitted to the virtual workshop for simulation, which can help to find potential conflicts in the solution before implementation. After verification, the control command from the virtual workshop is fed back to the physical workshop and it is made to maintain synchronization with the expected plan.
Step 6: Knowledge is accumulated. After the scheduling is completed, all the elements, processes, businesses and other relevant information generated during the scheduling process are stored in the DT database to guide the subsequent scheduling work.
3.1 Composition and operation mechanism of DT pump station
DT pump stations mainly consist of PE (here referred to as the physical pump station), VE (here referred to as the virtual pump station), connection (CN), Digital Twin Data (DD) and Ss (here referred to as smart services), as shown in Fig. 3. Among them, the physical pump station is the collection of objective physical entity resource objects including humans, equipment, materials, methods and environments, which provide the operating environment for the physical pump station. The virtual pump station is a virtual mapping of the physical pump station, which simulates and analyzes a variety of production activities in the physical space. CN works as the link between the physical pump station and the virtual pump station, realizing the two-way connection, two-way interaction and two-way drive between them. DD is the core driving force of DT, which covers the real-time data generated by the physical pump station during the production and operation process and the simulation data obtained via the virtual pump station simulation under different working conditions. Smart services are dynamically predicting and optimizing the production and operation of the pump stations through the two-way real mapping and real-time interaction between the physical pump station and the virtual pump station, in order to realize the optimal operating state. The smart services include online monitoring of the pump station operation and the pump state, prediction of the RUL of pumps and the water demand, pump station scheduling optimization, emergency response, etc.
In order to maintain the dynamic balance among the water intake, the liquid level of the clean water reservoir and the water supply, an operation mechanism of the full-process pump station joint scheduling in the water purification plant was designed in this paper, as shown in Fig. 4. Firstly, the daily water demand was predicted by time series (Zhou et al., 2022). Subsequently, by using the least squares method, the hourly liquid level of the clean water reservoir on the next day was predicted. Next, the water supply was substituted into the pipeline characteristic curve equation as an input parameter in order to calculate the outlet pressure. Later, the RUL of pumps was predicted by adopting the equipment RUL model (Zhou et al., 2023). Finally, the parameters, including water intake, inlet pressure, water supply, outlet pressure, and RUL of the pumps, obtained from the reverse calculation of the liquid level of the clean water reservoir were respectively substituted into the pump station scheduling model to calculate the optimal pump opening solution of the water intake pump house and the water supply pump house.
3.2 Anomaly detection method based on RMS and kurtosis
Under the condition of unattended pump stations, due to disturbance factors such as equipment performance degradation, changes of working conditions and load adjustment, the offline scheduling model cannot respond to the application scenarios in time where the equipment suddenly changes during water production. Therefore, it is necessary to quantify the change of the equipment operating state to realize online equipment anomaly detection. Because of the simple calculation and easy implementation, a lot of research and applications related to time-frequency domain statistical indicators based on vibration signals are presented in the aspects of equipment fault prediction and health management (Xu et al., 2020). In addition, kurtosis and RMS have respective advantages and disadvantages. For example, kurtosis exhibits a better characterization ability of time variability, while RMS possesses better robustness to noise. Thus, on the basis of time-frequency domain statistical indicators, an equipment anomaly detection method based on RMS and kurtosis was proposed in this paper, as shown in Fig. 5. The specific steps are as follows:
(1) According to the original vibration signal, the kurtosis and RMS of the vibration signals before \({t}_{i}\) are calculated.
(2) The normal range of kurtosis and RMS before \({t}_{i}\) is calculated, namely, [−σ, µ + σ], where σ and µ represent the standard deviation and mean value of vibration signals, respectively.
(3) If three consecutive points of \({K\text{u}\text{r}}_{t}\) and \({RMS}_{t}\) are all beyond the normal range, the two points before \({t}_{i}\) are selected as the equipment fault time points. Otherwise, Step (1) is redone.
3.3 DT-based availability prediction
DT-based availability prediction refers to detecting equipment availability online through the comparison of the vibration signals collected by PE and those simulated by VE. In this method, the data of PE are represented as\({X=\left\{{x}_{t}\right\}}_{t=1}^{T}\), where \({x}_{t}\) is the vibration signal at t, and the data of VE are set as \({Y=\left\{{y}_{t}\right\}}_{t=1}^{T}\)and\({Z=\left\{\left\{{\text{z}}_{i}|1\le i\le n\right\}\right\}}_{t=1}^{T}\), where \({y}_{t}\) is the corresponding simulation value of \({x}_{t}\) at t, and \({\text{z}}_{i}\) represents other additional simulation results, such as the stress level and temperature distribution of parts, and n means the number of components.
As shown in Fig. 6, at t, parameters are selected from X and Y to respectively represent the data of PE and VE, expressed as \({D}_{PE}=\{{x}_{1},{x}_{2},{x}_{3},\dots ,{x}_{t}\}\) and \({D}_{VE}=\{{y}_{1},{y}_{2},{y}_{3},\dots ,{y}_{t}\}\). If the deviation value between \({D}_{PE}\) and \({D}_{VE}\) is within the predefined threshold, \({T}_{p}\), that is, \(\left|{D}_{PE}-{D}_{VE}\right|<{T}_{p}\), PE and VE are considered to be consistent. In this case, the data of VE can be used to monitor the equipment operating state by the anomaly detection method (Fig. 5). If the equipment works normally, the neural network method (Zhou et al., 2023) is used to perform the RUL prediction based on the feature data extracted from the data of PE and VE. If \(\left|{D}_{PE}-{D}_{VE}\right|\ge {T}_{p}\), PE and VE are inconsistent. To find the cause of the inconsistency, the value, \({D}_{HIS}\), in the historical data is introduced to judge the state of the equipment. If the values, \({D}_{VE}\) and \({D}_{HIS}\) are close, then the deviation is caused by PE, and PE is considered abnormal. Otherwise, the deviation is caused by the model failure of VE and PE is considered normal, then the VE parameters should be corrected. According to the prediction results, if the equipment works normally, it is available. If not, maintenance time needs to be reserved, during which the equipment is unavailable.
3.4 Optimization algorithm
The pump switching problem involves discrete and continuous variables, and the number of variables increases with the number of centrifugal pumps (Patel et al., 2020). Traditional GA, a point search method, is likely to lead to the local optimum instead of ensuring the global optimum. Further, it is difficult to solve the conflicts between different objectives as researched herein. Therefore, an IGA to solve the pump station scheduling problem was designed in this paper. The algorithm flow is shown in Fig. 7.
In this algorithm, the strategy of NSGA-II (Makaremi et al., 2017) was referred to address the multi-objective problem by looking for the non-dominated solutions, and the dominance relationship matrix shown in Eq. (6) was established according to the dominance relationship between individuals produced in the process of running the algorithm. Later, the probability of each individual in the population being selected at each iteration was calculated by Eq. (7), and the non-dominated solutions obtained were stored in the archive set. Thus, the conflicts between different objectives could be effectively dealt with. Meanwhile, in order to improve the optimization ability of the algorithm, self-adaptive design was conducted on the algorithm exhibited in Fig. 7. In the design, when the archive set was found not updated after several iterations, it indicated that the algorithm search was stagnant. Then, the search range of the algorithm was expanded by reducing the crossover probability and increasing the variation probability and the number of variation points, in order to help the algorithm jump out of the local optimum.
$$RR=\left( {\begin{array}{*{20}{c}} {r{r_{11}}}&{r{r_{12}}}&{...}&{r{r_{1n}}} \\ {r{r_{21}}}&{r{r_{22}}}&{...}&{r{r_{2n}}} \\ {...}&{...}&{...}&{...} \\ {r{r_{n1}}}&{r{r_{n2}}}&{...}&{r{r_{nn}}} \end{array}} \right)$$
6
$$Pro{b_h}=\frac{{\sum\limits_{{k=1}}^{n} {r{r_{hk}}} }}{{\sum\limits_{{h=1}}^{n} {\sum\limits_{{k=1}}^{n} {r{r_{hk}}} } }}$$
7
Where, \({rr}_{hk}\) represents the dominance relationship between the hth individual and the kth individual in the population, n represents the number of individuals in the population, and \({Prob}_{h}\) stands for the probability that the hth individual in the population is selected during the selection operation. If h = k, the value of \({rr}_{hk}\) is 0.5. If the kth individual is dominated by the hth individual and h is not equal to k, the value of \({rr}_{hk}\) is 1. If the kth individual dominates the hth individual, the value of \({rr}_{hk}\) is 0. If the two individuals do not dominate each other, the value of \({rr}_{hk}\) is 0.5.
The specific operation steps of the algorithm are as follows:
Step 1: By fitting the data of experiments, the flow rate-efficiency (Q-\(\eta\)), flow-head (Q-H) and flow-power (Q-N) characteristic curves (Equations (10)-(21) shown in Section 4.1) at the rated rotational speed are obtained. Through the Q-\(\eta\) characteristic curve, the ranges of 10% at both sides of the rated efficiency point (i.e., the highest efficiency point of the pumps) are taken as the interval corresponding to the flow rate values of the constant-speed pumps and the variable-speed pumps, namely, [\({Q}_{ \text{m}\text{i}\text{n}}\),\({Q}_{ \text{m}\text{a}\text{x}}\)].
Step 2: On the basis of the change law of water consumption and water supply pressure in City A, a scheduling period (one day) is generally divided into six sessions. Thus, the rows of two-dimensional chromosomes is 6, then the ranges of water demand (Q) and the outlet pressure (H) in different sessions during the scheduling period are determined.
Step 3: According to the Q-N characteristic curve, the relevant parameters of the fitness functions (Equations (1)-(5)) and the constraints are set.
Step 4: The initial population with the size of n is obtained by random generation, and the maximum number of iterations is set as genMax.
Step 5: The fitness value of chromosomes is calculated. After selection, the chromosomes are recombined to gain a new population. Subsequently, the crossover and variation operators are performed according to the data of the archive set, so as to obtain the implementation solutions to the opening of constant-speed and variable-speed pumps, as well as the optimal speed ratio. And the solutions are stored in the two-dimensional array of 6*J.
Step 6: Termination condition judgment is conducted. If the termination conditions are not met, Step 5 will be redone. If the termination conditions are met, Step 7 will be the next step.
Step 7: The optimized solution and related data are output, and the algorithm ends.