Implementation of real‐time hybrid simulation based on GPU computing

With combination of physical experiment and numerical simulation, real‐time hybrid simulation (RTHS) can enlarge the dimensions of testing specimens and improve the testing accuracy. However, due to the limitation of computing capacity, the maximum degrees of freedom for numerical substructure are less than 7000 from the reported RTHS testing. It cannot meet the testing requirements for evaluating the dynamic performance of large and complex engineering structures. Taking advantages of parallel computing toolbox (PCT) in Matlab and high‐performance computing of graphics processing unit (GPU). A RTHS framework based on MATLAB and GPU was established in this work. Using this framework, a soil‐structure interaction system (SSI) was tested by a shaking table based RTHS. Meanwhile, the dynamic response of this SSI system was simulated by finite element analysis. The comparison of simulation and testing results demonstrated that the proposed testing framework can implement RTHS testing successfully. Using this method, the maximum degrees of freedom for numerical substructure can reach to 27,000, which significantly enhance the testing capacity of RTHS testing for large and complex engineering structures.

solution into two parts, response analysis task (RAT) and signal generation task (SGT), realized RTHS testing with 10 degrees of freedom (DOFs) under Δt = 330 ms or 12 DOFs under Δt = 500 ms. Cheng [19] developed a "hybrid finite element" program by MATLAB, which combined finite element method (FEM) with hybrid structure testing, realized RTHS testing with 122 DOFs under Δt = 10 ms. Chae [20] used Hybrid-FEM technology to accomplish a RTHS testing with 514 DOFs under Δt = 10 ms. Saouma [20] developed a program named Mercury running on real-time hardware and completed a nonlinear model RTHS testing with 405 DOFs under Δt = 10 ms. Zhu [10] used RAT and SGT in two different target computers, realized RTHS testing with 1240 DOFs under Δt = 20 ms. Lu [22] used a RTHS calculation system based on Simulink real-time and four-core parallel programming in Windows environment, realized RTHS testing with about 6000 DOFs (Non-Diagonal Damping Matrix Case) under Δt = 20 ms. In summary, the research of algorithms and the development of FEM improve the real-time computing capacity of numerical substructure. However, as known from the reported work, the maximum DOFs of the numerical substructure for RTHS is hard to exceed 7000.
In the research field of solution efficiency, traditional numerical substructure calculations are all based on central processing unit (CPU) in computer. At present, the calculation capacity of CPU conforms to Moore's Law. [22] Due to the limitation of calculation capacity, complex numerical models are difficult to be solved in real-time using small time step. Since 2008, NVIDIA Corporation has used graphics processing unit (GPU) for scientific computing successfully. Compared with CPU, there are more computing units in the GPU, it is more advantageous to use GPU in large-scale numerical calculations. [23] Papadopoulos, [24] Gravvanis, [25] and Zegard [26] proved that GPU parallel calculation based on the Compute Unified Device Architecture (CUDA) has higher calculation efficiency than multi-CPU processors in solving large sparse matrix. In the field of engineering, researchers were also using GPU to accelerate calculation and simulation. [27,28] With combination of GPU parallel and discrete element method (DEM), Mohammad [29] studied displacement slip faulting through granular soils by a method of GPU-based DEM modeling. Durand [30] realized the simulation of contact between rock and concrete 30 times based on GPU (NVIDIA Fermi C2050) faster than CPU (Intel Xeon X5650 2.67 GHz). Lu [31] used GPU to simulate seismic damage of a medium-sized urban area, achieved 39 times faster than CPU, which were similarly priced. GPU in the field of civil engineering calculation has a very high computational efficiency. [32][33][34] There are several software support GPU acceleration calculations, including finite element software such as ABAQUS, OpenSEES, and ANSYS, and mathematical calculation software such as MATLAB. [35][36][37][38] Hence, this work attempts to develop a RTHS framework based on MATLAB and GPU, the accuracy and efficiency of GPU (NVIDIA Tesla V100) compared with CPU (Intel Xeon E5-2690 V4) were verified by simulation and RTHS testing.

| GPU-BASED RTHS TESTING SYSTEM
This work established a testing system framework that used a GPU as the computational hardware instead of a typical CPU. Before developing a testing system using GPU computing, two issues must be addressed. The first is how to implement real-time signal interaction between two substructures, and the second is how to use a GPU to do dynamic analysis on a large-scale numerical substructure.

| Schemes of framework
As shown in Figure 1, the testing framework is made up of three parts: numerical substructure solution, signal transmission, and physical substructure loading. In RTHS, numerical solution efficiency must meet the requirements of real-time loading of the physical substructure, and each F I G U R E 1 Graphics processing unit (GPU)-based framework for real-time hybrid simulation (RTHS) dynamic solution step has a fixed time. In numerical substructure solution, the time control loop structure in LABVIEW [39] and the Parallel Computing Toolbox (PCT) in MATLAB are required. The time control loop structure ensures that the time of numerical substructure solution is fixed at each step. The period of the loop structure is adjusted according to the step size of the integration algorithm. The period of the loop structure is modified based on the integration algorithm's step size. The M-file script (a form of MATLAB script file) for solving numerical substructure is loaded into the time control loop structure's MATLAB script window. [40] Parallel Computing Toolbox (PCT) can be used in MATLAB R2014a or updated version. [41] The efficiency of huge matrix basic computation and signal processing may be considerably improved by PCT. The function of signal transmission part is to ensure data communication between two substructures in RTHS. The digital signal from the numerical solution computer is converted into analog signal, and then transmitted to shaking table. Signal acquisition cards are needed to achieve the conversion of digital signal and analog signal.
The interface response of the numerical substructure is transmitted from the signal transmission part into the controller of the shaking table in the physical substructure part. The shaking table applies the interface response to the physical substructure after receiving the command, the sensors measure the response of physical substructure and return to the numerical substructure through the signal transmission part.

| Solution of numerical substructure
As mentioned in introduction, GPU can be used in a variety of ways for dynamic analysis. Pre-processing tools are available in commercial finite element software, which can be used to prepare data for dynamic analysis in MATLAB. Numerical model parameter matrices are extracted by commercial finite element software's pre-processing functions.
The numerical substructure is modeled by pre-processing of ABAQUS in this work. The steps for extracting parameters of numerical substructure from ABAQUS are as follows.
Step 1, establish the finite element model.
Step 2, output the stiffness and mass parameters: In the file directory, the model parameters recorded in the text file with the type "inp" will be created. Save the "inp" file after adding the following code. Total mass and stiffness parameters are obtained after resubmitting the task file; however, the parameters are not yet in matrix form.
* step,name=matrix * matrix generate,stiffness,mass * matrix output,stiffness,mass * end step Step 3, convert stiffness and mass parameters to matrices: using MATLAB to import and process the parameters file, the mass and stiffness matrices of numerical substructure are obtained.
ABAQUS can only extract the mass and stiffness parameters. The damping matrix is constructed from the stiffness and mass matrix based on Rayleigh damping as shown in Equation (1), ½C is the Rayleigh damping matrix, ½M is the mass matrix, and ½K is the stiffness matrix. a 0 and a 1 are two proportional parameters, which are determined according to the first two orders of modal frequencies.
½C ¼ a 0 ½M þ a 1 ½K ð 1Þ Figure 2 shows the flowchart of numerical substructure solving based on GPU and MATLAB. The real-time solution of numerical substructure in GPU-based RTHS testing requires CPU and GPU, in which CPU is for transmission of the interface response and GPU is for solution.
F I G U R E 2 Solution of numerical substructure in graphics processing unit (GPU)-based real-time hybrid simulation (RTHS) testing When using the Parallel Computing Toolbox for GPU computing, only minor changes to the original CPU-based computation program are necessary, and the CUDA framework based on the C++ language is not required. When using GPU computing, data is transferred from the host to the device before computing, and data is copied back from the device when computing is complete. The functions used in the calculation are the same as those used in the CPU calculation. The GPU computation processes using the Parallel Computing Toolbox are as follows.
1. the data are copied from host memory to device memory using the gpuArray function.
2. Algorithm programs are run on GPU instead of CPU.
3. the computing results are transferred from device memory to host memory using the gather function.

| NUMERICAL VERIFICATION
To verify GPU's solution capacity, a numerical modal soil-structure interaction (SSI) system was used, which is solved by GPU and CPU separately.

| Parameters of simulation
As shown in Figure 3, the upper part of the SSI system is physical substructure, and the soil is numerical substructure. The size of the numerical model is 30m Â 30m Â 15m, the density is 1 Â 10 4 kg/m 3 , the Young's modulus is 211 MPa, and the damping ratio is 0.05. Viscoelastic boundary is set to simulate far-field soil boundary conditions. The normal stiffness and damping are 20 kN/m and 1:437 Â 10 3 kN/(m/s), The tangential stiffness and damping are 10 kN/m and 9:45 Â 10 3 kN/(m/s). The Kobe seismic wave is adopted as excitation, the peak ground acceleration (PGA) is adjusted to 0.5 g, central difference method and Newmark-β method are used for numerical dynamic analysis method.
The SSI system is only used to verify the performance of simulation, that is not the focus of this work. The numerical substructure's modal analysis was performed in ABAQUS, the highest order frequency of the numerical substructure is 79.23 rad/s, the stable condition for central difference method is time step Δt ≤ T n =π ¼ 2=ω n ¼ 25:20 ms, T n and ω n are natural periods and frequencies of the structure. The maximum time step of the RTHS in this work is 20 ms, and the central difference method will be stable. The computing capacities of GPU and CPU based on MATLAB were compared in the simulation, with a personal computer (PC) equipped with consumer-grade GPU and a server equipped with professional-grade GPU. The hardware configuration parameters of the PC and server are shown in Table 1.

| Results of simulation
To compare the calculation efficiency of CPU and GPU, it is necessary to record their time costs in the process. There is a predictive function to record the time costs of programs in MATLAB. In the program, "tic" is added at the beginning of dynamic analysis program; "toc" is added at the end; the time cost in each step of dynamic analysis is counted. To evaluate the calculation capability of GPU in RTHS testing, the different modal When DOFs of numerical model is 3888, the time cost taken by CPU on PC and the CPU on server are similar; it shows the calculation capacity of the CPU on server cannot be brought into full play when the computation is small. When using GPU to solve the model, the speed of the GPU on server is much faster than the GPU on PC. The speed of GPU is 2 times faster than CPU on PC, and the speed of GPU on server is 8.5 times faster than CPU on server. When the DOFs is 6591, the CPU on PC needs 68 ms in each step, while GPU only needs 21 ms, the speedup effect of GPU on PC is 3.4 times faster than CPU on PC, and the speedup effect of the GPU on server is 12 times faster than CPU on server. When the DOFs is more than 6951, it will exceed the calculation capability of MATLAB on PC, but the DOFs can further increase on the server.
With increase of DOFs, the numerical solution for each time step cost more time. In this work, when the time step is 17 ms, the maximum DOFs solved in real time is 27,000. It needs 936 ms for the time step of 20 ms to solve this model through the CPU on server. The GPU on server can achieve 55 times faster than the CPU on server. When the DOFs is more than 27,000, it is out of the memory of MATLAB on the server.

| EXPERIMENTAL VERIFICATION
The solving capability of GPU was verified by simulation in Section 3, the performance of GPU-based testing system will be evaluated by shaking     Figure 7. GPU was used to solve numerical model, and the solving time of numerical substructure was controlled by the loop structure of time control in LABVIEW. After solving numerical substructure based on GPU, the interface response y N was sent to physical substructure by signal transmission part. The physical substructure was loaded by shaking table after the loading system receiving the response. The dynamic response of the physical substructure was measured by sensors and transmitted to numerical substructure. The dynamic characteristics of the shaking table were compensated by an out-loop controller, which was added between the signal transmission part and the shaking table.
The command signal y 0 N was generated by the controller. The out-loop controller program was written in SIMULINK and then downloaded into dSPACE to run in real time.

| Testing implementation
The parameters of experiment modal are same as Figure 3 in Section 3.1. In this shaking table RTHS testing, the loading facility is a 0:5m Â 0:5m electromagnetic shaking table, solving part of numerical substructure is same as the GPU server configuration in Table 1, and the other parts are shown in Table 4. The layout of the physical substructure and the sensors were shown in Figure 8. During the testing process, the sensors were connected to the control system of the shaking table. The acceleration sensors were installed on the top of the physical substructure and the shaking table.
And the displacement of the physical substructure was measured by a laser displacement meter. The displacement of the shaking table was   obtained from the control system of the shaking table. The physical substructure is an aluminum single-layer steel frame, and the bottom was   fixed on the shaking table by  and El-Centro waves with PGA of 0.5 g and root-mean-square (RMS) of 0.059 was input to the shaking table, and the displacements of the physical substructure were compared with integral simulation through central difference method. Figure 9 (i) shows the displacement time history results. In Figure 9 (ii), X axis is the displacement time history in simulation, and Y axis is the displacement time history of physical substructure in RTHS testing. The inevitable error existed when the frame was treated as a linear time-invariant system. Therefore, the measured and simulated response in Figure 9 was not consistent perfectly. However, this work focuses on the feasibility of GPU for RTHS testing.  9 (a,b) The verifications of model parameters under different seismic excitations G st ¼ 1:023e9 s 4 þ 427:5s 3 þ 1:481e05s 2 þ 3:041e07s þ 1:017e09 ð3Þ Figure 10 shows the transfer function of the shaking table and identified results. The amplitude and phase errors of the shaking table signals are very small when the frequency between 0 and 3 Hz; hence, the shaking table command can be accurately loaded in this frequency range.
When the frequency exceeds 3 Hz, the differences of amplitude and phase increase with the rise of frequency, so it is necessary to add an outloop controller (FSCS) to improve the accuracy of loading, [43] which is shown in Figure 11.
The shaking table's input command is a Kobe wave with a PGA of 0.5 g and RMS of 0.059. Figure 12 shows the expected time histories and the achieved response with or without FSCS controller. The time histories with FSCS controller are much closer to the expectation than those without FSCS. The results show that the FSCS controlled shaking table is suitable for the RTHS testing.
In this testing framework, LABVIEW and MATLAB used in the numerical substructure were based on Windows operating system (OS).
Windows is not a real-time OS, the real-time performance is unstable, [43] this means that the time costs of solving numerical substructures using Windows is not fixed. To ensure that more CPU and memory resources are allocated to the numerical calculation, the priority of LABVIEW and MATLAB can be set higher in Windows OS task manager.In this GPU-based RTHS strategy, there is a delay in data communication when LABVIEW invoked MATLAB, which leaded to a delay of nearly 3 ms in data transmission between MATLAB and LABVIEW. After the dynamic analysis of numerical substructure is completed, it takes 3 ms to send the data to the signal transmission part. The MATLAB script stops calculating within this 3 ms. Therefore, the minimum integration step in the RTHS testing in this work is shown as Equation (4).  (4) is the actual integration step of numerical integration, and t solve is the actual solving time cost. For example, the time step would be 4 ms when the time cost of solution is 1 ms. The accuracy of numerical integration is greatly affected by the integration step, [11] hence the maximum integration step of RTHS testing is 20 ms in this work, because of the 3 ms in data communication, the actual solving time cost of solution is 17 ms.

| Testing results
The time step of RTHS and computing speed of computer limit the maximum DOFs for real-time solution of the numerical substructure. Table 5 shows the maximum DOFs of the numerical substructure model solved by GPU and CPU, with different time steps using double and single precision. Figure 13 shows time histories of the physical substructure solved by GPU and CPU, with Δt = 4 ms and double precision data. The results of RTHS testing were consistent with the results of integral simulation, which indicated that the testing framework can meet the accuracy requirements of RTHS testing. Figure 14 shows displacement time histories of physical substructure obtained from GPU-based RTHS testing and integral simulation under working conditions NO. 2-9 in Table 5 Table 6. In some cases, the simulated response cannot match perfectly with RTHS testing results because of the modeling errors of physical substructure shown in Figure 9 but not the calculation errors resulted from GPU or CPU. Therefore, we are still sure that, with the same numerical substructure, the precision of RTHS testing based on GPU is same as CPU.
To discuss the influence of integration step in GPU-based RTHS testing, a numerical substructure model with 3888 DOFs was solved by GPU and CPU. NO. 8 and 9 in Table 5 show that the minimum Δt are 5 and 20 ms respectively. Figure 15 shows displacement and acceleration time histories of physical substructure in the RTHS testing and integral simulation. The peak error of displacement between the RTHS testing and the    Table 5 shows that, with Δt = 4 ms and double precision, the maximum DOFs is 1500 solved by GPU and 1080 solved by CPU, the advantage of GPU solving is not obvious. When the precision is single, the maximum DOFs is 3168 solved by GPU, and 1500 solved by CPU. The advance of solution by GPU with single precision is higher than double precision.

| CONCLUSION
The advantage of RTHS is to combine physical testing with numerical simulation and improve the testing capability of existing equipment. Due to the limitations of calculation cost, only large integration time step (e.g., 20 ms) and less DOFs (e.g., 2000) for numerical substructure solutions are allowed in the current RTHS testing. It is detrimental to utilize the testing capability and accuracy of RTHS testing. In order to overcome these drawbacks, this work established a RTHS framework based on GPU and Matlab. The performance of framework was verified by numerical simulation and experimental testing. The following conclusions were drawn.
(1) Under the condition of Δt = 20 ms and single precision data, using the RTHS testing framework based on MATLAB and GPU (NVIDIA Tesla V100, 10.60 TFLOPS), the maximum DOFs for numerical substructure can reach 27,000 in real-time solution, the speed of GPU calculation is 55 times faster than that of CPU (Intel Xeon E5-2690 V4, 14 cores 2.6 GHz). And using this framework the numerical substructure based on CPU can only reach 3888 with Δt = 20 ms in real-time solution. The testing capacity of RTHS is significantly enhanced by GPU solving.
(2) With the numerical substructure of 3888 DOFs and the model parameters are single precision, the minimum Δt is 5 ms by the RTHS testing based on GPU, and that is 20 ms based on CPU. The RTHS testing results are compared with the integral simulation, the peak errors of displacement and acceleration time history are reduced, Using the testing framework based on GPU in this work can reduce the time step and improve the testing accuracy.
(3) MATLAB and LABVIEW under Windows operating system are used in this work. Because of instability of Windows clock frequency and delay of communication between two software, it takes 3 ms to transfer the response data; hence, the performance of GPU calculation is limited.
Further research is needed to consider real-time performance of operating system and optimization of software interaction, it can further decrease the time step and enhance the DOFs of numerical substructures in RTHS testing.
(4) The speed of the dynamic solution in real-time was greatly improved by GPU instead of CPU through Python. The numerical substructure was obtained from the preprocessing function of finite element software, which was only applicable to linear model. If the numerical substructure develops nonlinear behavior, GPU solution by finite element software would be more convenient. Real-time communication between the GPU solution through finite element software and the testing framework is necessary in the future work.

DATA AVAILABILITY STATEMENT
Research data are not shared.