Wave-based damage detection in solid structures using artificial neural networks

The identification of structural damage plays an increasingly important role in the modern economy, where the monitoring of an infrastructure is often the last resort to keep it in public use. Conventional monitoring methods require specialized engineers and are highly time-consuming. This paper considers the ability of neural networks to recognize the initiation or alteration of structural properties through a training process. The presented work is based on Convolutional Neural Networks (CNNs) for wave field pattern recognition, or more specifically wave field change recognition. The CNN model is used to identify the change within propagating wave fields after a crack initiates within the structure. The paper describes the implemented method and the training procedure required to achieve a high crack detection accuracy, where the training data are based on a dynamic lattice model. Although training the model is still time-consuming, the proposed method has enormous potential to become a new crack detection and structural health monitoring approach alongside conventional monitoring methods.


Introduction
For the permanent use of existing structures in urban areas, as well as for lifelines and safety structures such as dams, the successful monitoring of structures is of the highest priority. Usually, conventional methods of structural dynamics are used to analyse the state of structures and to find existing or propagating damage, while wave-based methods are less common in ordinary structural dynamics. These methods are more usual in the field of non-destructive testing (NDT) [Kaewunruen and Remennikov, 2006, Farhangdoust and Mehrabi, 2019], which uses very high excitation frequencies and short wavelengths. Regardless of the type of analysis method (structural dynamics with long wavelengths, or ultrasound methods in NDT with extremely short wavelengths), the structural analysis rests on the active analysis of excited vibrations or wave fields until the structural damage is detected. Without any knowledge of pre-existing damage zones, the analysis can take a long time. The present development shows a new strategy, based on a numerical case study, to analyse structures using artificial neural networks. The artificial neural networks (ANN) [Sha and Edwards, 2007] are used to learn the change of structural response patterns of damaged structures as opposed to non-damaged structures.
With the booming development of large-scale data generation and computation power, deep learning algorithms, especially deep convolutional networks (known as CNNs or ConvNets), are developing by leaps and bounds. Deep learning methods have been applied to many fields, such as computer vision and natural language processing, and often outperform conventional methods [LeCun et al., 2015]. The multi-layer structured deep models can learn patterns from data at multiple levels of abstraction. The convolution operation plays an important role in CNN layers. ConvNets are particularly suitable for learning from array-like data, such as audio, images, and video. These data are uniformly sampled from signals in the spatial and/or temporal domain. In many applications, ConvNets have been used as the backbone for pattern extraction, e.g., recognising objects from images [Girshick, 2015], understanding text [Kim, 2014], and synthesising audio [van den Oord et al., 2016]. In recent years, attempts have been made to apply ConvNets to damage detection as well [Abdeljaber et al., 2017, Rautela and Gopalakrishnan, 2019].
Deep learning methods have long been expected to be promising for wave-based damage detection [Avci et al., 2021]. Compared to methods based on hand-engineered features, the deep-learning-based method uses deep neural networks as a feature extractor to learn representations from wave fields [Guo et al., 2020]. 1D CNNs and RNNs are two popular structures to recognize patterns in 1D signals. Abdeljaber et al. trained multiple 1D CNNs to detect whether damage exists at specific locations (joints) [Abdeljaber et al., 2017]. Their model uses the acceleration signal at each joint as input and requires an extra workload to segment the signal into frames. Considering damage detection as a binary prediction problem, i.e., predicting whether a crack exists from input data, 1D CNN, RNN, and LSTM models can all achieve high accuracy [Rautela and Gopalakrishnan, 2019]. In a follow-up paper, the authors developed a two-stage damage detection method [Rautela and Gopalakrishnan, 2020]. The method determines whether a sample is damaged at the first stage and then predicts the location and length of the damage with a separate regressor network. However, the regressor network deals only with damage that is orthogonal to the sample's surface. Khan et al.
transformed the structural vibration responses in the time domain into a two-dimensional spectral frame representation and then applied a 2D CNN to distinguish between the undamaged and various damaged states [Khan et al., 2019]. Besides using CNNs for wave-based damage detection, there are also methods using different input data or a combination with other learning schemes. For example, in [Gulgec et al., 2019], the authors proposed to use a 2D CNN to predict the bounding box of a crack from a raw strain field. By adding noise and changing loading patterns to augment the data, the model could achieve some robustness. Nunes and her colleagues developed a hybrid strategy to detect undefined structural changes [Nunes et al., 2020]. They proposed to apply an unsupervised k-means clustering method to the CNN-learned features, so that features extracted from samples with undefined changes are expected to fall outside these clusters. Our work differs from the above-mentioned works by training an end-to-end model to predict both crack shapes and locations from large amounts of simulated wave fields.
The numerical treatments in this paper use a meso-scale method, as it is better suited to capture effects such as initial cracking and crack propagation depending on the material parameters and the initial and boundary conditions, without pre-definition of damaged patches; it is also applicable to 2D and 3D problems. The Lattice Element Method (LEM) is a class of discrete models in which the structural solid is represented as a 3D assembly of one-dimensional elements [Wong et al., 2014, Rizvi et al., 2019, Sattari et al., 2017]. This idea allows one to build robust models for the propagation of discontinuities, multiple crack interactions, or crack coalescence, even under dynamic loads and wave fields. Different computational procedures for lattice element methods representing a linear elastic continuum have been developed. Besides various mechanical, hydro-mechanical and multi-physical developments, the basics of and extension to a new dynamic Lattice Element Method were presented in [Rizvi et al., 2018, 2020]. This development is used in the given paper for the health monitoring of structures.
To perform the damage detection, a suitable numerical software, pattern indicators and specifically designed neural networks are needed. The numerical simulation is realized using the dynamic Lattice Element Method; the advantage of this discontinuum method over continuum methods with respect to damage detection is discussed in the methodology section. The implemented artificial neural networks are also described in this section. Based on the considered numerical and DNN (deep neural network) models, a case study of a 2D plane is performed to show the development and results of the new approach.

Dynamic Lattice Approach
The assembly of the heterogeneous or homogeneous material is generated by specific meshing algorithms in the LEM. The lattice nodes can be considered as the centers of unit cells, which are connected by beams that can carry normal force, shear force and bending moment. If the strain energy stored in an element exceeds a given threshold, the element is either removed to represent cracking or assigned a lower stiffness value. The method is based on minimizing the stored energy of the system. The size of the localized fracture process zone around a static or propagating crack plays a key role in the failure mechanism, as observed in various models of linear elastic fracture mechanics and in multi-scale theories and homogenization techniques. Normally this crack propagation process needs a regularization; however, an efficient way of dealing with this kind of numerical problem is to introduce an embedded strong discontinuity into the lattice elements, resulting in mesh-independent computations of the failure response. The generation of the lattice elements is done by Voronoi cells and Delaunay triangulation (Sattari et al. [2017], Moukarzel and Herrmann [1992]). This procedure yields a simple algebraic system of equations for the static case. To develop the dynamic LEM for the simulation of a propagating wave field, a more complex extension of the LEM is needed. The dynamic LEM below is solved as a transient solution in the time domain.

Equation of motion
To solve the dynamic LEM, the static LEM needs an extension by the equation of motion. The general equation of motion without the damping term is defined by

M\ddot{u}(t) + Ku(t) = F(t)

where M and K are the mass and stiffness matrices and F(t) is the applied time-dependent force. Both matrices have to be defined in terms of the LEM.

Mass Matrix generation
The mass matrix is generated either by lumping the mass at the nodes or by following the variational mass lumping (VMM) scheme. The VMM scheme is also implemented in the finite element method for dynamic simulations. The element mass matrix M^e is computed as

M^e = \int_{\Omega^e} \rho \, (N_v^e)^T N^e \, d\Omega

If the shape functions are identical, that is, N_v^e = N^e, the mass matrix is called the consistent mass matrix (CMM), M_c^e.
Here, ρ is the density assigned to the Voronoi cells, and A and l are the area and length of the lattice elements. The elemental mass matrix is symmetric, physically meaningful, and complies with the conditions of mass conservation and positivity. To obtain the global mass matrix, a congruent transformation is applied. In contrast to the stiffness matrix, the translational masses never vanish; all translational masses are retained in the local mass matrix. The global transformation is achieved as

M = \sum_e T_e^T \, M_c^e \, T_e

where T_e is the transformation matrix of element e.
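As an illustration, the lumped and consistent mass matrices of a single 2-node lattice (truss) element with two translational degrees of freedom per node can be sketched as follows; this is a minimal sketch using the standard linear shape functions, not necessarily the exact implementation used here.

```python
import numpy as np

def element_mass_matrices(rho, A, l):
    """Local mass matrices of a 2-node lattice element with 2
    translational DOFs per node (rotary inertia neglected)."""
    m = rho * A * l  # total element mass
    # Lumped mass matrix: half of the mass at each node.
    M_lumped = (m / 2.0) * np.eye(4)
    # Consistent mass matrix from linear shape functions.
    M_consistent = (m / 6.0) * np.array([[2, 0, 1, 0],
                                         [0, 2, 0, 1],
                                         [1, 0, 2, 0],
                                         [0, 1, 0, 2]], dtype=float)
    return M_lumped, M_consistent
```

Both variants are symmetric and conserve the total element mass per translational direction, as required above.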

Element Stiffness Matrix
The force-displacement relation of a truss element is given by the spring relation

\{F\} = [K]\{U\}

where {F} and {U} are the member joint force and member joint displacement vectors, respectively, shown in Figure 2, and [K] is the member (local) stiffness matrix. For a truss element it is given by

[K] = \frac{EA}{L} \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}

After applying the congruent transformation, the member stiffness matrix in global coordinates is

[K^e] = \frac{EA}{L} \begin{bmatrix} l^2 & lm & -l^2 & -lm \\ lm & m^2 & -lm & -m^2 \\ -l^2 & -lm & l^2 & lm \\ -lm & -m^2 & lm & m^2 \end{bmatrix}

where l = cos φ_e and m = sin φ_e, with φ_e as the orientation angle shown in Figure 2.
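The congruent transformation above can be sketched for a single element as follows (using L for the element length to avoid a clash with the direction cosine l = cos φ_e):

```python
import numpy as np

def truss_stiffness_global(E, A, L, phi):
    """Global stiffness matrix K^e of a 2-node lattice element,
    obtained from the local spring stiffness k = EA/L and the
    congruent transformation with l = cos(phi), m = sin(phi)."""
    k = E * A / L
    l, m = np.cos(phi), np.sin(phi)
    return k * np.array([[ l*l,  l*m, -l*l, -l*m],
                         [ l*m,  m*m, -l*m, -m*m],
                         [-l*l, -l*m,  l*l,  l*m],
                         [-l*m, -m*m,  l*m,  m*m]])
```

A quick sanity check is that a rigid-body translation of both nodes produces zero internal force, i.e. K^e applied to (1, 0, 1, 0) vanishes.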

Time domain solution of Equation of Motion
The equation of motion for the linear system is solved with the Newmark-beta method due to its unconditional stability. The displacement and velocity at the next time step are calculated as

u_{t+\Delta t} = u_t + \Delta t\, \dot{u}_t + \Delta t^2 \left[ \left(\tfrac{1}{2} - \beta\right) \ddot{u}_t + \beta\, \ddot{u}_{t+\Delta t} \right]

\dot{u}_{t+\Delta t} = \dot{u}_t + \Delta t \left[ (1 - \gamma)\, \ddot{u}_t + \gamma\, \ddot{u}_{t+\Delta t} \right]

We follow the average acceleration approach with β = 1/4 and γ = 1/2. The Newmark-beta method solves the algebraic form of the equation of motion (EOM) of undamped forced vibration at the end of the time interval, t + ∆t:

M \ddot{u}_{t+\Delta t} + K u_{t+\Delta t} = F_{t+\Delta t}

The stiffness and mass matrices are combined into the effective stiffness matrix

\hat{K} = K + a_0 M, \qquad a_0 = \frac{1}{\beta \Delta t^2}

Similarly, the effective load vector at time t + ∆t is calculated as

\hat{F}_{t+\Delta t} = F_{t+\Delta t} + M \left( a_0 u_t + a_1 \dot{u}_t + a_2 \ddot{u}_t \right), \qquad a_1 = \frac{1}{\beta \Delta t}, \quad a_2 = \frac{1}{2\beta} - 1
From the above equations, the displacement of each node is calculated for every time step by solving \hat{K} u_{t+\Delta t} = \hat{F}_{t+\Delta t}. The natural frequencies of the system follow from the eigenvalue problem det(K - ω² M) = 0. A detailed description of the theory and implementation of the dynamic Lattice Element Method, with validation and verification against analytical and numerical benchmarks, is given in [Rizvi et al., 2018, 2020].
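The time stepping described above can be sketched as follows. This is a minimal dense-matrix sketch of the standard average-acceleration Newmark scheme; the production LEM code will differ in assembly and solver details.

```python
import numpy as np

def newmark_undamped(M, K, F, u0, v0, dt, beta=0.25, gamma=0.5):
    """Newmark-beta integration of M*a + K*u = F(t) (undamped),
    average-acceleration variant (beta=1/4, gamma=1/2).
    F has shape (n_steps, n_dof); returns displacements (n_steps, n_dof)."""
    n_steps, n_dof = F.shape
    a0, a1, a2 = 1/(beta*dt**2), 1/(beta*dt), 1/(2*beta) - 1
    u, v = u0.astype(float).copy(), v0.astype(float).copy()
    a = np.linalg.solve(M, F[0] - K @ u)   # initial acceleration
    K_eff = K + a0 * M                     # effective stiffness matrix
    U = np.empty((n_steps, n_dof))
    U[0] = u
    for t in range(1, n_steps):
        # Effective load vector at t + dt.
        F_eff = F[t] + M @ (a0*u + a1*v + a2*a)
        u_new = np.linalg.solve(K_eff, F_eff)
        # Back-substitute acceleration and velocity updates.
        a_new = a0*(u_new - u) - a1*v - a2*a
        v = v + dt*((1 - gamma)*a + gamma*a_new)
        u, a = u_new, a_new
        U[t] = u
    return U
```

For a free undamped single-DOF oscillator this scheme conserves energy, so the displacement amplitude stays bounded by the initial amplitude, which makes it easy to verify.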

Wave field identification by Convolutional Neural Networks
The basic idea of deep-learning-based damage detection in the given case of propagating wave fields is the identification of wave field patterns, or more precisely the change in wave field patterns during the damage evolution. To apply the idea, the excitation and receiver points are kept constant during the monitoring process.
The damage evolution process starts from the initial static case on the given plate. After a change of the surrounding static stress conditions, damage can be created at different positions in the plate area, depending on the stress conditions and the material parameters. Before and after a damage scenario, a small-strain wave field is excited to propagate through the plate. Because of the damage or crack within the plate, the pattern of the propagating wave field is modified.
The interaction of the wave field with the crack is essential for identifying the correct wave field.
Under the assumption of an open crack, neglecting shear slipping and crack growth under dynamic loads, the crack produces a mode conversion and a scattering of the propagating wave field. This phenomenon was studied in [Rizvi et al., 2018, 2020], and it becomes obvious that the transient solution captures it.
For a future application of DNN damage detection to real structures, the virtual tool has to be optimized and needs sufficient training data. The numerical data provide the basis of the training data for the CNN. The plate has fixed boundary conditions at the bottom and no constraints on the other sides. The wave source can change its location around the plate boundary. Receivers that record wave displacement are assumed to be along the free boundary as well as on the inner surface of the plate.

Training of the 1D CNN-Based Detector
The sequential measurements of time-dependent displacement amplitudes are used as the data source for the planned damage detection network. The instantaneous load applied at a chosen excitation point causes high displacement amplitudes in the wave field, which decrease rapidly after several time steps towards the wave coda at a smaller strain level. The observable surface wave front, as well as the interference of the back-scattered and reflected wave field, is the result of the wave propagation and reflection in the plate. The described phenomena are clearly visible in the example in Section 3.1.
In this paper, the whole content of the wave field (the initial wave front, the coherent part and the diffusive part) is used for damage detection in the time domain. There is no selection or analysis of harmonic wave modes in the coherent part of the wave field [Wuttke et al., 2012b], nor any application of the interferometric method to the coda in the diffusive part [Wuttke et al., 2012a] yet.

The Damage Detection Dataset
The dataset is generated by running the numerical simulation repeatedly for randomly generated plates with or without a crack. In this study, the size of all plates is set to 0.01 m × 0.01 m and the lower-left plate corner is defined as the origin of the Cartesian coordinate system. The wave field is generated by an initial Dirac impulse (applied over one time step). The simulation runs for 2000 time steps with a step size of 1e-9 seconds. With the excitation of the wave field, the recording at all receiver points starts, forming the base data for the deep detection network. During the method development, possible negative effects of larger displacement values on the deep learning model were analysed. The plate, receivers and excitation points are shown in Figure 1. The resulting displacement wave field consists of time histories in the X- and Y-direction at 81 receiver positions with 2000 time steps.
To validate the damage detection method with randomly generated cracks, the crack itself is described by 3 parameters, i.e., crack length l, orientation α, and start position (x, y), with their values being chosen randomly, the length satisfying

l \in \left(0, \tfrac{1}{2} \min(e_x, e_y)\right]

where e_x and e_y are the lengths of the sample edges along the x-axis and y-axis, and s_x and s_y are the distances between two receivers in the X- and Y-direction (see Figure 1). If a randomly generated crack stretches out of the sample plate, the excess part is discarded. The plate particles that correspond to the crack are marked as removed for the Lattice Element model calculation.
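The crack sampling can be sketched as follows. The length bound follows the text above, while the uniform ranges for orientation and start position are assumptions for illustration only.

```python
import math
import random

def sample_crack(ex, ey, rng=None):
    """Sample one random crack for a plate with edge lengths ex, ey.
    The length bound l in (0, min(ex, ey)/2] follows the text; the
    uniform ranges for orientation and start point are assumptions."""
    rng = rng or random.Random()
    length = rng.uniform(1e-6, 0.5 * min(ex, ey))
    alpha = rng.uniform(0.0, math.pi)                 # orientation (assumed)
    x0, y0 = rng.uniform(0, ex), rng.uniform(0, ey)   # start point (assumed)
    # If the crack stretches out of the plate, the excess part is discarded.
    x1 = min(max(x0 + length * math.cos(alpha), 0.0), ex)
    y1 = min(max(y0 + length * math.sin(alpha), 0.0), ey)
    return (x0, y0), (x1, y1)
```

By construction both end points of the returned crack segment lie inside the plate domain.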
To set up the identification scenario, i.e., detecting the damage in a given plate, a binary image of the damage is provided as its label for comparison between the identified and the original structure. The binary image covers the plate's surface and indicates the locations where a crack exists. The label image is originally obtained at a 100x100 resolution, where each pixel covers an area similar to the size of a particle. When the model is adjusted to refine or enlarge predictions, the resolution of the label image can be changed accordingly. Figure 2 shows two label images of different resolutions for the same plate. The image of 16x16 pixels is resized and binarized from the image of 100x100 pixels. We use the image of the reduced resolution (16x16) as the supervision signal for training the detector network. As the proposed model makes a prediction for each pixel, the label image of 16x16 resolution restricts the problem scale while still maintaining the model's applicability.
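The resize-and-binarize step from the 100x100 label to the 16x16 supervision signal can be sketched as block pooling: a coarse pixel is marked as "has crack" if any fine pixel inside it is. The exact interpolation used in the paper may differ.

```python
import numpy as np

def downsample_label(label_fine, out=16):
    """Reduce a square binary crack image to out x out by block
    pooling (logical OR over each block)."""
    n = label_fine.shape[0]
    edges = np.linspace(0, n, out + 1).astype(int)  # block boundaries
    coarse = np.zeros((out, out), dtype=np.uint8)
    for i in range(out):
        for j in range(out):
            block = label_fine[edges[i]:edges[i+1], edges[j]:edges[j+1]]
            coarse[i, j] = 1 if block.any() else 0
    return coarse
```

OR-pooling guarantees that no crack pixel disappears in the coarse label, which plain nearest-neighbour resizing would not.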
Figure 1: The receiver arrangement on the surface of a plate.
The numerical simulation randomly generates plate particles and cracks within the plate domain using pre-defined boundary conditions. Samples with this randomness in plate and crack generation are called Type-N samples. Additionally, plates without any crack inside are also generated randomly as reference samples, marked as Type-R samples, alongside 8 Type-S samples. The Type-C samples are generated from 10 different samples with 16 different cracks for each sample (see Figure 20 in the Appendix). Among all test samples, we intentionally generated 7 random samples, each of which has a counterpart case in the training dataset in terms of the same crack (no-crack cases are excluded here). It is worth emphasizing that these samples are not repeated ones: because of the randomness in the generation process, the diversity of the interior particles and their wave field patterns is ensured.

Crack Detection Models with CNN Detector
Let N denote the number of receivers placed on the plate; they record displacements for T time steps after a load excitation is applied. The reading of the i-th receiver at time t is denoted as s_i(t). The surface of the plate is decomposed into a regular grid. Each cell within the grid covers a small area of the plate surface. The number of cells defines the spatial resolution of the prediction model. The size of a cell was chosen to be 10 times smaller than the wavelength. A finer resolution requires a larger number of cells, each covering a smaller area of the sample surface; in contrast, a coarser resolution results in fewer cells and a larger coverage per cell. The resulting cells are denoted by their column and row index as c_{i,j}. The model makes a binary classification for every grid cell to decide whether damage exists in the cell. To summarize, the model can be written as

\{p_{i,j}\} = f(s; \theta)

where f represents the proposed model and θ are all trainable parameters; p_{i,j} is the probability of damage existence in c_{i,j}.
CNN is a feed-forward network that consists of trainable multistage feature extractors. The feature extractors are trained in an optimization procedure, where the gradient of an objective function with respect to the weights of the feature extractors is calculated and back-propagated [Rumelhart et al., 1986].
CNNs are particularly useful for analysing natural signals that can be represented as arrays of different dimensionality: sequential signals, including language and audio, as 1D arrays; images as 2D arrays; and video and volumetric images as 3D arrays. The core operation of a CNN is to calculate the convolution of input signals with a set of trainable filters. The transformation produces different kinds of features from input signals in the spatial (e.g., images) or temporal (e.g., audio) domain [LeCun et al., 1995].
In its implementation, a CNN differentiates itself from other ANNs by using local connections and a weight-sharing strategy. In CNNs, one "neuron" connects locally with only a restricted number of "neurons" in the previous layer, and the connection weights are shared among all the "neurons". The ReLU function outputs zero when the input is less than zero and keeps the input unchanged when the input is greater than or equal to zero. The LeakyReLU function "squeezes" the value when the input is less than zero and thus allows a small, non-zero gradient when the unit is not active.
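The two activation functions can be written in a few lines:

```python
import numpy as np

def relu(x):
    """ReLU: zero for negative inputs, identity otherwise."""
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    """LeakyReLU: keeps a small, non-zero slope for negative inputs
    instead of zeroing them out (slope = 0.01 is a common default)."""
    return np.where(x >= 0, x, slope * x)
```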
The configurations of the two ConvLayers in the same block are identical, while the kernel size and number of filters vary between blocks. To reduce the size of the extracted wave patterns, the output of each 1D-ConvBlock is passed through a MaxPooling layer. Because the 1D-ConvLayers operate only on the time dimension, the WP-Extractor extracts patterns per receiver. For each case, the input data has shape N × T, where N represents the number of receivers and T the number of time steps. The output of the WP-Extractor has shape N × T × C, where C is the filter number of the last ConvBlock.
The first fusion layer transforms the receiver data into a 1D vector by passing it through a fully convolutional layer. The convolution kernel of this layer is 1 × T, so the receptive field of the ConvLayer covers the whole time domain. The resulting transformed data form an N × 1 × C array. Afterwards, the transformed data are reshaped according to the positions of the receivers into a 9 × 9 × C array (N = 9 × 9, as shown in Figure 1). As the second fusion layer, a 2D fully convolutional ConvLayer is used to fuse the information of all receivers. The 2D ConvLayer employs a kernel of 9 × 9 and thus produces a 1 × 1 × C array.
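The shape flow through the two fusion layers can be sketched with plain array operations. The channel count C below is an illustrative value, and a real ConvLayer also mixes channels; here this is simplified to a per-channel weighted sum so only the shapes are faithful.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, C = 81, 2000, 8                    # C is illustrative

# Stand-in for the WP-Extractor output: N receivers x T steps x C channels.
feats = rng.standard_normal((N, T, C))

# Fusion 1: a 1 x T kernel collapses the time axis per receiver.
w_t = rng.standard_normal((T, C))
fused_t = np.einsum('ntc,tc->nc', feats, w_t)   # -> (N, C), i.e. N x 1 x C

# Reshape the receivers back into their 9 x 9 grid layout (N = 9 * 9).
grid = fused_t.reshape(9, 9, C)

# Fusion 2: a 9 x 9 kernel fuses the information of all receivers.
w_s = rng.standard_normal((9, 9, C))
fused_s = np.einsum('ijc,ijc->c', grid, w_s)    # -> (C,), i.e. 1 x 1 x C
```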

The Loss Function
Training the proposed model is an optimization procedure that relies on an objective function (also known as a loss function). In this work, the proposed model makes a prediction of crack existence for each single patch; in other words, the model makes multiple binary predictions. For single binary classification problems, the cross-entropy (CE) loss (see Equation 23) is probably the most commonly used loss function. However, considering that the "has crack" patches make up only a small portion of the total patches, the CE loss can introduce a bias towards "no crack" predictions, i.e., simply predicting all patches as "no crack" already results in a rather low loss value. To tackle such extreme class imbalance, we select the Focal Loss (FL) as our loss function. FL was originally proposed to address the extreme class imbalance in object detection [Lin et al., 2017]. FL is a variation of the CE loss that adds a penalty term to reduce the loss value of already correctly predicted training cases. The penalty term (1 − p_t)^γ (γ ≥ 0) re-weights between difficult and easy examples. During training, if a sample is already predicted correctly with a high probability, it is called an "easy" case. The penalty term reduces its loss, and training thus focuses on "hard" cases, where correct predictions are made with a much lower probability.
Here p_t = p if y = 1 and p_t = 1 − p otherwise, where y ∈ {0, 1} is the label and p the predicted probability.
To adjust the loss values of the two binary classes, a weighting factor α ∈ [0, 1] can be added. Similar to the definition of p_t, α_t is defined as α for class 1 and 1 − α for class 0. The focal loss is written as

FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)

For the crack detection case, let y denote the binary label image and ŷ = {p_{i,j}} a predicted image. The average FL on N cases is calculated by averaging FL(p_t) over all U × V grid cells of each case and over the N cases, where U and V are the number of columns and rows of the grid.
Two hyper-parameters are introduced by the focal loss, α and γ. When γ = 0, FL is equivalent to weighted CE. As γ increases, the modulating factor uses a lower standard to define easy examples and down-weights well-classified examples more strongly. In practice, the optimal α and γ are found by empirical studies.
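A minimal sketch of the focal loss for per-cell binary predictions, following the formula of Lin et al. [2017]; the clipping constant eps is an implementation detail added here for numerical safety.

```python
import numpy as np

def focal_loss(p, y, alpha=0.9, gamma=0.4, eps=1e-7):
    """Binary focal loss FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t),
    with p_t = p for y = 1 and 1 - p for y = 0; returns the mean over
    all cells. With alpha = 0.5 and gamma = 0 it reduces to 0.5 * CE."""
    p = np.clip(p, eps, 1 - eps)           # numerical safety (assumption)
    p_t = np.where(y == 1, p, 1 - p)
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return float(np.mean(-alpha_t * (1 - p_t) ** gamma * np.log(p_t)))
```

Increasing γ shrinks the contribution of confidently correct cells, which is exactly the re-weighting between easy and hard examples described above.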

Training Process
Data Pre-processing: In this simulation, the recorded data at the edge receivers are discarded to avoid any possible effects caused by extremely large values. Thus, in total the records at 81 receivers are used for both training and testing. The wave displacements are then normalized between -1 and 1 according to each sample's maximum and minimum value. The resulting input data for the CNN model is a 2000 × 81 × 2 array for each case.
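The per-sample normalization can be sketched as:

```python
import numpy as np

def normalize_sample(wave):
    """Scale one sample's displacement records to [-1, 1] using the
    sample's own min and max, as described above; wave has shape
    (T, N, 2) = (2000, 81, 2)."""
    lo, hi = wave.min(), wave.max()
    return 2.0 * (wave - lo) / (hi - lo) - 1.0
```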

Training Configurations
The Adam optimization algorithm [Kingma and Ba, 2015] is chosen as the optimizer; it is a commonly used first-order gradient-based optimization algorithm for training deep networks. When updating model parameters during each training step, the algorithm adjusts the learning strength according to the previous gradients. An overview of optimizers in deep learning can be found in [Ruder, 2016]. The initial learning rate is set to 0.0002. The number of training epochs was set to 150 for all experiments to ensure sufficient training steps for the models to converge. The best model with respect to the evaluation metric is saved for evaluation.
As mentioned in the previous section, the focal loss has two hyper-parameters, α and γ. To determine suitable hyper-parameters, a set of models was trained with different values of α and γ; the detailed selection of α and γ is listed in Section 3. We use the DSC metric and the IoU-based accuracy for model evaluation. Although the loss values indicate the quality of the prediction on a per-patch basis, the focus lies on predicting cracks: we expect the model to find "has crack" patches that cover as many damaged patches as possible, and as few "no crack" patches as possible.
In the damage detection study, if the model correctly predicts that a cell has (has no) damage, the result is marked as TP (TN); if the model makes wrong predictions on damage existence, the result is either FP or FN, as shown in Table 1. Based on this, we can calculate the precision (TP/(TP + FP)) and the recall (TP/(TP + FN)); the DSC metric can then be calculated with Equation 26. Similarly, the IoU is the ratio between the intersection and the union of the ground truth damage area (TDA) and the predicted damage area (PDA). As illustrated in Figure 6, the TDA is bounded by dashed red lines and the PDA is marked by solid red lines. Their intersection is the TP set, where the model makes correct predictions. The remaining part of the PDA is wrongly identified as damaged area, i.e., the FP set; the remaining part of the TDA, which has damage inside that has not been found, forms the FN set. For a single case, the IoU can be calculated as in Equation 27 by counting the number of cells in each set.
Both the DSC and IoU metrics range between 0 and 1. If there is no overlap between PDA and TDA, both metrics are equal to 0. When the PDA is closer to the TDA, the intersection area becomes larger and the union area becomes smaller, resulting in a value closer to 1. When the PDA covers the TDA exactly, the DSC and IoU metrics reach their upper limit of 1. When the sample has no damage and the model makes correct predictions, both the intersection and the union become 0; in this special case, both the DSC and IoU metrics are assigned the value 1. One underlying assumption of the data generation is that each sample has at most one crack inside. Based on this assumption, we can define the accuracy using IoU values. The prediction of a single case is considered "correct" if its IoU is greater than a given threshold. Given the threshold, the accuracy on the whole dataset is calculated as the ratio of the number of samples whose IoU value is greater than the threshold to the total number of evaluated samples.
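The IoU, DSC, and threshold-based accuracy described above can be sketched as:

```python
import numpy as np

def iou_dsc(pred, truth):
    """IoU and DSC for binary crack maps; both are defined as 1 when
    prediction and ground truth are both empty (the no-damage case)."""
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 1.0, 1.0
    iou = inter / union
    dsc = 2.0 * inter / (pred.sum() + truth.sum())
    return float(iou), float(dsc)

def iou_accuracy(ious, threshold=0.5):
    """Share of samples whose IoU exceeds the tolerance threshold."""
    ious = np.asarray(ious, dtype=float)
    return float((ious > threshold).mean())
```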

Simulated Displacement Wave Field Using Dynamic Lattice
The dynamic lattice method described in Section 2.1 is used to simulate the wave fields in a 2D plate. The considered boundary conditions with different excitation points and crack conditions are shown in Fig. 7. Fig. 8 shows the simulated wave fields in the lateral direction for the boundary condition of Fig. 7a. The simulated wave fields with a generated crack (Fig. 7b) are shown in Fig. 9. For the conditions shown in Fig. 7c (excitation point at the upper middle boundary), the wave field is plotted in Fig. 10. The results clearly show wave shadows behind the crack as well as the reflection of the wave field from the defined crack surface. A rectangular impulse load with a magnitude of 1 kN is applied for 10 time steps, where ∆t = 1e-8 s. The Young's modulus of the plate is set to 5 GPa.

Figure 13 shows the FL value for different γ and α values. Figure 13 (A) is a remake of Figure 1 in [Lin et al., 2017]. With the penalty term, the loss value is reduced as the probability of making correct predictions increases. γ controls the decay strength, and a larger γ makes the loss decrease faster. For example, when γ = 5, predictions where p_t > 0.4 can hardly contribute to the loss; in contrast, when γ = 1, predictions where p_t > 0.6 still contribute to the loss. Meanwhile, α can also be used to re-weight the binary classes (has crack and has no crack) (Figure 13 (B)-(D)). When α is used for one class, the other class is re-weighted by (1 − α). Choosing a small α for a class obviously decreases the contribution of the whole class to the loss. For example, if α = 0.1 is chosen for the "has crack" class and γ = 5, the predictions can hardly be improved once p_t is greater than 0.3. In particular, when γ = 0 and α = 0.5, the FL is equivalent to the cross-entropy (CE) loss. In this paper, γ and α control the model's learning strength on the "no crack" class and the "has crack" class. As α increases, the model is driven to focus on damaged cells, because false predictions of
damaged cells contribute more to the overall loss. As γ increases, the model is trained to focus on "hard" cells, where the model cannot make predictions with high confidence, because a higher γ value forces the model to pay more attention to these "hard" cells. The evaluations of models trained with different values of α are given in Table 3.2.1. When assigning larger weights (larger α) to the "has crack" class for the CE loss, the trained model tends to have lower precision and higher recall. This can be interpreted as the model's tendency to give more "has crack" predictions. On the contrary, using smaller weights (smaller α) for the "has crack" class results in higher precision but lower recall, meaning the models tend to give fewer "has crack" predictions. When α = 0.9, the CE loss gives the model with the highest accuracy; however, that model is also characterised by low precision and high recall, and by a high but not optimal DSC value. The results using the focal loss are shown in Table 3.2.1. By adding the penalty term with a carefully chosen γ, the trained models balance precision and recall, resulting in an increase in the IoU and DSC metrics. The accuracy is also improved compared to the models trained with the CE loss. We consider two combinations of α and γ, α = 0.35, γ = 0.2 and α = 0.9, γ = 0.4, to balance precision and recall and achieve high DSC and accuracy at the same time.

The Selected Thresholds
The accuracy depends on two threshold settings: the threshold of crack existence in a pixel, and the threshold for a correct prediction of a sample. The first threshold defines the probability value above which a pixel is considered to have a crack inside; in this work, it is referred to as the binarizing threshold (T_bin). The second threshold acts as a "tolerance" (marked as T_tol) on the prediction: it allows a prediction to be "correct" when the predicted "has crack" pixels cover a certain portion of the crack, i.e., its IoU score is greater than the threshold. The strictest criterion would require the predicted "has crack" patches to cover the true crack-existing area exactly, i.e., IoU = 1, for a "correct prediction".
The FL function pushes the predicted probabilities of "has crack" and "no crack" towards opposite extremes, because a sample with an IoU value close to 0.5 receives a large penalty during training. This is illustrated in Figures 14 and 15, which result from the recommended models trained with α = 0.9, γ = 0.4 and α = 0.35, γ = 0.2. The sub-figures A of both figures suggest that most damaged cells are correctly predicted with a probability above 0.5, while a minor set of "hard cases" receives a borderline prediction around 0.5, with about 45 cases that neither model can handle properly. In both sub-figures B, the accumulated histograms provide a clearer comparison of the prediction quality for different T bin values. They show that different T bin values produce similar accumulative histogram curves, which suggests that most "no crack" cells and many "has crack" cells are predicted with very high confidence. We choose T bin = 0.5, as this also fits the configuration of the FL loss. The curves begin to rise when the IoU value reaches 0.5. This suggests choosing T tol for the evaluation such that the number of cases with IoU values between 0 and 0.5 is relatively small and accumulates quickly for IoU > 0.5.

Discussion on Model Performance
The histograms of the test data distribution (Figures 14 and 15) indicate that both models are not good at detecting a minor set of damaged cases in the test data. We first examine the distribution of crack size (the pixel count, or the percentage of "has crack" pixels, in the 100x100 labeling image) in the training and test data. As shown in Figure 16, samples with small cracks make up a larger portion of the test data than of the training data. To further explore the relation between crack size and model performance, we plot the histogram of crack size against IoU values for the test data (Figure 17), where the IoU values are calculated from the model predictions with T bin = 0.5. The IoU values can be very low for tiny-crack samples, whereas the IoU values for larger-crack samples are mostly above 0.5. This indicates that most low-quality predictions are made for samples with tiny cracks, while cases with larger cracks generally receive better predictions. The accuracy, adjusted by excluding samples with small crack sizes from the test data, is shown in Figure 18. It shows that the proposed model is particularly good at identifying larger cracks: the model can easily distinguish between damaged and non-damaged cases for large cracks, but is not good at detecting tiny cracks. If we only count the cases with crack size greater than 0.002, the accuracy leaps by around 0.1. When excluding the cases with small cracks (crack size less than 0.003), the accuracy of the proposed model already exceeds 0.9. If we only count the cases with larger cracks (crack size greater than 0.004), the accuracy of both models reaches 0.95.
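The adjusted-accuracy computation above can be sketched as follows. This is an illustrative NumPy version; the assumption that non-damaged samples (crack size 0) are kept while only tiny-crack cases are excluded is ours, inferred from the crack-size thresholds quoted above.

```python
import numpy as np

def crack_size(label_img):
    """Fraction of "has crack" pixels in a binary labeling image
    (here, the 100x100 label of Figure 2B)."""
    return float(np.mean(label_img > 0))

def adjusted_accuracy(correct, sizes, min_size):
    """Accuracy recomputed after excluding damaged samples whose crack
    size is below `min_size`. Samples without any crack (size 0) are
    kept, since only tiny-crack cases are excluded (assumption)."""
    correct = np.asarray(correct)
    sizes = np.asarray(sizes)
    keep = (sizes == 0) | (sizes >= min_size)
    return float(correct[keep].mean())
```

Sweeping min_size over 0.001, 0.002, 0.003, and 0.004 reproduces the kind of line plot shown in Figure 18.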

Conclusion
The paper presents a new approach to detect damages by wave pattern recognition models. The major development is a learning CNN that detects the damaged zone within a solid structure directly from the visible wave pattern. To generate the cracked structures, a new dynamic Lattice Element Method was used. The major advantage of this method is its applicability to heterogeneous structures under mechanical, hydraulic, and thermal field influences as well as local chemical changes, in order to describe the evolution of damages in solid structures. The use of new-generation deep CNNs to analyse the time dependency within the changed wave patterns is promising. With the described method, a stable detection of 90 percent of the generated large cracks was possible. The next steps will be the reduction of the number of receivers used and increasing the model's ability to detect tiny cracks.
Figure 19: The categories of the 320 test cases. The test cases are categorised into four types: (1) randomly generated samples with randomly generated cracks (Type-N), (2) randomly generated samples with no crack (Type-R), (3) randomly generated samples with similar cracks (Type-S), and (4) the same sample with different cracks (Type-C). They are marked by the colored marks. The special cases (marked as "SC" in yellow) are the 7 cases we intentionally generated with the same cracks as in the training data but from different samples.
Training Data Set - The Wave Propagation Data
The wave field data were produced by numerical simulations on virtual 2D plates as study examples. After the application of a Dirac input, the elastic wave propagates from the source point into the plate. Adding source points at different locations of the plate generates different wave patterns, which are recorded by a set of receiver points. Particularly when the plate has a crack inside, the wave patterns are very different from those in crack-free plates. Because of the crack, the wave field shadow as well as the reflection become visible and change the pattern of the wave field around the damaged region.
Figure 2: Plate with crack and its binarised labels at different resolutions. (A) The generated sample with one crack inside, in particle view. (B) The 100x100 binary label image for the plate (white pixels indicate the existence of the crack). (C) The 16x16 binary label image for the plate.
The proposed damage detection model is trained to detect an existing crack in the 2D plate and the damage location based on the wave field pattern. The training covers a series of wave fields in randomly generated plates with or without cracks. The receivers are placed inside the plate and along the free boundaries to record the coordinate-dependent wave field time histories. At the receiver points, the displacements in the x- and y-directions are recorded. These typical 1D Euclidean time series are the data base for the 1D convolution filters within the feature extractors. The proposed deep network model consists of three components: a set of 1D-CNN layers acting as a wave pattern (WP) extractor to select WPs from the input displacement time histories; two fully convolutional layers operating on the time dimension and on the receiver dimension to fuse the wave patterns; and a predictor module taking the fused features and making predictions of crack existence (Fig 3).

Figure 3: Conceptual explanation of the proposed crack detection model with the 1D-CNN detector. (A) Diagram of a 1D-CNN that transforms a discretized wave input and produces a discrete feature sequence. (B) Internal structure of a 1D-CNN layer, consisting of a trainable weight W (h) and a bias b (h) ; the activation function is represented by σ. (C) Diagram of the complete structure of the proposed model.
Local connectivity, weight sharing, and pooling are the CNN's three key properties for dealing with natural signals [LeCun et al., 2015]. A typical CNN architecture is composed of several convolutional layers (ConvLayers), followed by further ConvLayers or fully connected layers. A ConvLayer detects local features from the previous layer by transforming the input signal with a set of filters. It produces different kinds of feature maps with respect to its filters; then an activation operation is applied to the feature maps. The non-linear activation functions "squeeze" the values of the feature maps into a certain range, mostly [0, 1], [−0.5, 0.5], or [0, +∞). Sometimes a pooling layer is used for down-sampling the feature maps by taking local average or local maximum values. The pooling layer merges semantically similar features into a higher level [LeCun et al., 2015]. The pooling layer can be intentionally replaced by setting a larger stride in the convolution layer [Springenberg et al., 2015].
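The stride-for-pooling substitution can be demonstrated with two small Keras stacks. This is an illustrative sketch only: the filter count, kernel size, and input shape (T = 1024 time steps, 2 channels for the u_x and u_y displacements) are assumed values, not the paper's configuration.

```python
import tensorflow as tf

# Two equivalent ways to halve the time resolution of a 1D feature map.
conv_with_pooling = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1024, 2)),  # T=1024 steps, (u_x, u_y)
    tf.keras.layers.Conv1D(16, kernel_size=5, padding="same",
                           activation="relu"),
    tf.keras.layers.MaxPooling1D(pool_size=2),  # explicit down-sampling
])

conv_with_stride = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1024, 2)),
    # stride 2 replaces the pooling layer [Springenberg et al., 2015]
    tf.keras.layers.Conv1D(16, kernel_size=5, strides=2, padding="same",
                           activation="relu"),
])
```

Both stacks map an input of shape (1024, 2) to a feature map of shape (512, 16); the strided variant saves one layer and lets the down-sampling weights be learned.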

Figure 4 depicts a schematic drawing of applying n 1D convolution filters to N receiver inputs of T time steps. The kernel size is denoted as m. For each receiver's input, the convolution produces n features. The red mark indicates the data patch used for the convolution and the corresponding results in the feature maps.

Figure 4: The schematic drawing of the 1D convolution on N receiver data in T steps.
The core module of the predictor is composed of two transpose-convolutional layers (TransConvLayer, sometimes called deconvolution). The two TransConvLayers are used to up-sample the saved transformed data. TransConvLayers are widely used in image generation tasks, such as DCGAN [Radford et al., 2016]. The saved transformed data are first reshaped to 4 × 4 × c and then passed through the two TransConvLayers. The ready-for-prediction data shape is 16 × 16 × 4. Finally, the data are passed through a 2D ConvLayer with 1 × 1 kernels and a single output channel. The sigmoid function is used as the activation in this ConvLayer to make sure the prediction ranges from 0 to 1. The layer configuration and wave pattern shapes are also shown in Figure 5, where K refers to the kernel size, S to the stride, and F to the number of filters. The stride of the max-pooling is set to 4. The 1D convolution can be implemented using 2D convolution by fixing the kernel size of the receiver dimension to 1.

Figure 5: Detailed design of the damage detection model. (A) The network architecture. (B) Shapes of the input and output of each component.

Figure 7: Boundary conditions: (a) horizontal excitation without a generated crack, (b) horizontal excitation with a generated crack, and (c) vertical excitation with a generated crack.

Figure 10: The 6 frames (100 time step intervals, from left to right) of the displacement (u y ) wave propagation inside the plate defined in Fig. 7c.

Fig. 12 clearly shows the arrival time of the wave fields at each reference point. The closest reference

Figure 11: The boundary conditions and assigned reference points: (a) without crack, (b) with a generated crack.

Figure 16: Histogram of the crack size distribution in training and testing data. A: crack size distribution in the training data; B: crack size distribution in the testing data.

Figure 17: Histogram of crack size distribution and IoU values for the test data. A: results from the model trained with α = 0.9, γ = 0.4; B: results from the model trained with α = 0.35, γ = 0.2.

Figure 18: The adjusted accuracy calculated for the test data after excluding tiny-crack cases. A: predictions made by the model with α = 0.9, γ = 0.4; B: predictions made by the model with α = 0.35, γ = 0.2. The line plot presents the accuracy re-calculated when excluding cases with crack size less than 0.001, 0.002, 0.003, and 0.004; the bar chart shows the number of samples remaining after excluding the samples with tiny cracks against the total number of samples in the test dataset.

Figure 20: The true crack occurrence in reduced resolution (100 x 100) for the 320 testing cases.

Figure 21: The true crack occurrence in reduced resolution (16 x 16) for the 320 testing cases.

Figure 22 :Figure 23 :
Figure 22: The predicted probability of crack existence in pixels for the 320 testing cases; a brighter pixel color indicates a higher probability of crack existence inside the pixel. Produced by the model trained with α = 0.9 and γ = 0.4.

Figure 24 :Figure 25 :
Figure 24: The predicted probability of crack existence in pixels for the 320 testing cases; a brighter pixel color indicates a higher probability of crack existence inside the pixel. Produced by the model trained with α = 0.35 and γ = 0.2.

Table 1: Prediction typology in a binary-classification-based damage detection.
The model and simulation code are implemented in Python with TensorFlow 2.1 Keras. The simulations are performed on a Windows 10 workstation with an Nvidia GPU.

Table 2: The evaluation results (including IoU, DSC, and accuracy) of models trained by varying α for the CE loss (γ = 0).

Table 3: The evaluation results (including IoU, DSC, and accuracy) of models trained by varying γ for the FL loss (optimal α).