## 3.1 Dynamic Convolution

In recent years, methods involving dynamic convolutions, such as CondConv [30], DynamicConv [31], and DyNet [32], have garnered significant attention from researchers. These methods render convolutional designs more flexible and adaptive without increasing network depth or width. The fundamental concept behind dynamic convolutions is to adjust convolutional parameters adaptively based on input variations.

The dynamic convolution method proposed by DyNet is adopted in this paper, as depicted in Fig. 2. DyNet enables the model to dynamically learn the importance of different channels and to weight their features accordingly, thereby extracting more informative features. This approach also effectively reduces the model's parameter count: compared with traditional fixed convolution kernels, dynamic convolution kernels have fewer parameters, conserving memory and computational resources. Additionally, the approach enhances the model's utilization of diverse channel features, improving its ability to model complex data.

Dynamic convolution kernels are generated based on input data and predicted coefficients. As a result, the model can adaptively adjust convolution kernels for different input samples. This further reinforces the model's generalization capability and adaptability, enabling it to accommodate new tasks better when handling different data types. The formula for employing dynamic convolutions is as follows:

$${\tilde{O}_t}={\tilde{w}_t} \otimes x=\left(\sum\limits_{i=1}^{g_t}{\eta_t^i \cdot w_t^i}\right) \otimes x=\sum\limits_{i=1}^{g_t}{\left(\eta_t^i \cdot \left(w_t^i \otimes x\right)\right)}=\sum\limits_{i=1}^{g_t}{\left(\eta_t^i \cdot O_t^i\right)} \tag{1}$$

Where \({\tilde{O}_t}\) represents the output of the dynamic convolution kernel, \({\tilde{w}_t}\) represents the dynamic convolution kernel, \(x\) stands for the input, \(g_t\) represents the number of groups, \(\eta_t^i\) represents the predicted coefficients, \(w_t^i\) represents the *i*-th fixed convolution kernel, and \(O_t^i\) represents the output of the *i*-th fixed convolution kernel.
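Equation (1) holds because convolution is linear in the kernel: fusing the fixed kernels with the coefficients first, then convolving once, gives the same result as convolving with each kernel and weighting the outputs. The following sketch checks this numerically for a 1-D case; the function name, the valid/stride-1 correlation, and the sample values are our illustrative choices, not the paper's implementation.

```python
import numpy as np

def dynamic_conv1d(x, kernels, coeffs):
    """Sketch of Eq. (1): fuse the g_t fixed kernels w_t^i with the
    predicted coefficients eta_t^i, then run one valid, stride-1
    1-D correlation with the input x."""
    fused = sum(c * w for c, w in zip(coeffs, kernels))   # tilde{w}_t
    k = len(fused)
    return np.array([np.dot(fused, x[i:i + k]) for i in range(len(x) - k + 1)])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
kernels = [np.array([1.0, 0.0, -1.0]), np.array([0.5, 0.5, 0.5])]
coeffs = [0.7, 0.3]

# Left and right sides of Eq. (1): fuse-then-convolve vs. convolve-then-weight.
out_fused = dynamic_conv1d(x, kernels, coeffs)
out_weighted = sum(c * dynamic_conv1d(x, [w], [1.0]) for c, w in zip(coeffs, kernels))
assert np.allclose(out_fused, out_weighted)
```

Because the fused kernel is built before the convolution runs, only one convolution is executed per input, which is where the parameter and computation savings described above come from.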

## 3.2 Cross-Mix Module

The primary idea of ResNet [33] is to add the input of a layer to its output through residual blocks, a process referred to as a "skip connection" or "residual connection." This design enables the network to learn residual functions, which helps alleviate the vanishing-gradient problem and facilitates the training of very deep networks. ShuffleNet [34], on the other hand, is a network architecture that achieves feature communication and fusion by grouping and shuffling channels. ShuffleNet can extract richer and more diverse feature information, further enhancing the model's performance and generalization ability.

This paper proposes the Cross-Mix module based on the concepts of the above models; it is primarily composed of three parts: Split, Cross, and Mix, as illustrated in Fig. 3.

The process can be understood as follows: Initially, the input is split into two parts equally along the length dimension (Split1 and Split2). Assuming the input is \(x=[{x_1},{x_2}, \cdots ,{x_n}]\), the formula for the Split part is as follows:

$${\text{Split1}}=[{x_1},{x_2}, \cdots ,{x_{n/2}}] \tag{2}$$

$${\text{Split2}}=[{x_{n/2+1}},{x_{n/2+2}}, \cdots ,{x_n}] \tag{3}$$
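The split in Eqs. (2)-(3) is an even cut along the length dimension. A minimal sketch, with illustrative values and variable names of our own choosing:

```python
import numpy as np

# Even split along the length dimension, per Eqs. (2)-(3).
x = np.arange(1, 9)                         # x = [x_1, ..., x_8], so n = 8
split1, split2 = x[:len(x) // 2], x[len(x) // 2:]
# split1 holds [x_1 .. x_4], split2 holds [x_5 .. x_8]
```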

In the Cross part, dynamic convolution and batch normalization are applied to the two split parts, producing two new feature maps (DyConv1 and DyConv2). If the channel dimensions of the original split parts (Split1 and Split2) differ from those of the corresponding DyConv feature maps, 1×1 convolutions are employed to adjust the channel dimensions of the split parts to match. For example, if Split1 has \(C_1\) channels and DyConv1 has \(C_2\) channels, a 1×1 convolution is applied to Split1 so that \(C_1=C_2\).

Assuming the input is \(x=[{x_1},{x_2}, \cdots ,{x_n}]\), batch normalization is applied to it; the formulas for the BN layer are as follows:

$$\mu =\frac{1}{n}\sum\limits_{i=1}^{n} {x_i} \tag{4}$$

$${\sigma^2}=\frac{1}{n}\sum\limits_{i=1}^{n} {({x_i} - \mu)^2} \tag{5}$$

$${\hat{x}_i}=\frac{{x_i} - \mu}{\sqrt{{\sigma^2}+\epsilon}} \tag{6}$$

$${y_i}=\gamma {\hat{x}_i}+\beta \tag{7}$$

Where \(\mu\) represents the mean of the input, \(\sigma^2\) represents the variance of the input, \(\hat{x}_i\) represents the normalized output, and \(\epsilon\) is a small constant that keeps the denominator from being zero. \(y_i\) represents the final output, which restores the range and distribution of the data by multiplying the normalized output by the scaling parameter \(\gamma\) and adding the offset parameter \(\beta\).
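Equations (4)-(7) can be sketched directly; the default \(\gamma=1\), \(\beta=0\), \(\epsilon=10^{-5}\) and the sample batch are our illustrative assumptions.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch normalization per Eqs. (4)-(7): normalize by the batch
    mean and variance, then rescale with gamma and shift with beta."""
    mu = x.mean()                          # Eq. (4)
    var = ((x - mu) ** 2).mean()           # Eq. (5)
    x_hat = (x - mu) / np.sqrt(var + eps)  # Eq. (6)
    return gamma * x_hat + beta            # Eq. (7)

y = batch_norm(np.array([1.0, 2.0, 3.0, 4.0]))
# With gamma = 1 and beta = 0, y has (approximately) zero mean and unit variance.
```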

Suppose the input tensor is \(X\) with dimensions \((H, W, C_{in})\), where \(H\) and \(W\) are the spatial height and width and \(C_{in}\) is the number of input channels, and the output tensor is \(Y\) with dimensions \((H, W, C_{out})\), where \(C_{out}\) is the number of output channels. The formula for the 1×1 convolution is as follows:

$${Y_{ijc}}=\sum\limits_{k=1}^{C_{in}} {{X_{ijk}}\cdot {W_{kc}}}+{b_c} \tag{8}$$

Where \(Y_{ijc}\) represents the output tensor at \((i, j, c)\), \(X_{ijk}\) represents the input tensor at \((i, j, k)\), \(W_{kc}\) represents the weight of the convolution kernel, and \(b_c\) is the bias.
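Equation (8) says a 1×1 convolution is simply a per-pixel linear map over the channel axis, i.e. a matrix multiply; the shapes and random values below are illustrative only.

```python
import numpy as np

# 1x1 convolution as a channel-wise matrix multiply, per Eq. (8).
H, W, C_in, C_out = 4, 4, 3, 5
X = np.random.rand(H, W, C_in)
Wk = np.random.rand(C_in, C_out)    # W_{kc}
b = np.random.rand(C_out)           # b_c

Y = X @ Wk + b                      # Y_{ijc} = sum_k X_{ijk} W_{kc} + b_c

# Sanity check against the explicit summation of Eq. (8).
Y_explicit = np.einsum('ijk,kc->ijc', X, Wk) + b
assert Y.shape == (H, W, C_out)
assert np.allclose(Y, Y_explicit)
```

This is why a 1×1 convolution can change the channel count \(C_{in} \to C_{out}\) without touching the spatial dimensions, which is exactly how the Cross part aligns Split1/Split2 with the DyConv feature maps.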

Two approaches, A and B, are incorporated in the Mix part; they theoretically yield identical outcomes. In Approach A, DyConv1 is added to Split2, yielding the mixed feature map Mix1, and Split1 is added to DyConv2, yielding the mixed feature map Mix2. Finally, Mix1 and Mix2 are concatenated along the length dimension to form the final output feature map. The formulas for Approach A are as follows:

$$\text{Mix1}=\mathrm{Add}(\text{DyConv1},\ \text{Split2})=\text{DyConv1}+\text{Split2} \tag{9}$$

$$\text{Mix2}=\mathrm{Add}(\text{Split1},\ \text{DyConv2})=\text{Split1}+\text{DyConv2} \tag{10}$$

$$\text{Output}=\mathrm{Concat}(\text{Mix1},\ \text{Mix2}) \tag{11}$$

In Approach B, DyConv1 is first concatenated with Split1, producing the mixed feature map Mix1, and Split2 is concatenated with DyConv2, producing the mixed feature map Mix2. Finally, Mix1 and Mix2 are added to form the output feature map. The formulas for Approach B are as follows:

$$\text{Mix1}=\mathrm{Concat}(\text{DyConv1},\ \text{Split1}) \tag{12}$$

$$\text{Mix2}=\mathrm{Concat}(\text{Split2},\ \text{DyConv2}) \tag{13}$$

$$\text{Output}=\mathrm{Add}(\text{Mix1},\ \text{Mix2})=\text{Mix1}+\text{Mix2} \tag{14}$$
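That the two Mix approaches coincide can be verified numerically: both produce the concatenation of DyConv1 + Split2 and Split1 + DyConv2. In the sketch below the DyConv outputs are stand-ins (seeded random arrays of matching shape), since only the add/concat algebra is being checked.

```python
import numpy as np

# Stand-ins for the four feature maps; shapes match so add/concat are defined.
rng = np.random.default_rng(0)
split1, split2 = rng.random(4), rng.random(4)
dyconv1, dyconv2 = rng.random(4), rng.random(4)

# Approach A (Eqs. 9-11): add first, then concatenate.
out_a = np.concatenate([dyconv1 + split2, split1 + dyconv2])

# Approach B (Eqs. 12-14): concatenate first, then add.
out_b = np.concatenate([dyconv1, split1]) + np.concatenate([split2, dyconv2])

assert np.allclose(out_a, out_b)
```

The equivalence holds because concatenation and element-wise addition commute when the concatenated halves are aligned the same way in both operands.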

Through the design of the Cross-Mix module, the network not only fuses information along the channel dimension but also divides and merges information along the length dimension, which helps it learn more meaningful and diverse representations of vibration signals. Cross-Mix combines features from one input part with the other, facilitating information exchange and potential feature enhancement. This strengthens the model's feature representation capacity and improves its performance to a certain extent. By employing Cross-Mix, the model effectively increases the number of feature maps, enabling it to better capture complex patterns and relationships within the data, thereby enhancing its performance and generalization ability.

## 3.3 DCCMN Model

Based on the Cross-Mix module proposed in the previous section, this paper introduces the Dynamic Convolution Cross-Mix Network (DCCMN) model, as depicted in Fig. 4. Since the Cross-Mix module involves operations such as split, dynamic convolution, and mix, it introduces many parameters and considerable computational complexity. Additionally, using the Cross-Mix module directly in the early layers of the model could lead to information loss.

To address these potential issues and balance model complexity against performance, the DCCMN model employs large convolution kernels and simple operations in its initial stages. For instance, the first convolutional layer has a kernel size of 16 with a stride of 4, and the second convolutional layer has a kernel size of 8 with a stride of 2. Batch normalization (BN) and max-pooling layers are added after each convolutional layer. This design facilitates the rapid learning of low-level features from the input data, reduces excessive parameters and computation, aids model convergence, and mitigates the risk of overfitting. The Cross-Mix modules are then gradually introduced for feature fusion: the model incorporates four Cross-Mix modules, which capture higher-level feature representations while maintaining performance and training efficiency. A global average pooling layer is added after the last module to reduce feature dimensionality, and the model's final output is obtained with the softmax activation function.
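The downsampling effect of the stem can be sketched with the standard 1-D convolution output-length formula. The input length of 1024 and zero padding below are our assumptions for illustration; the text specifies only the kernel sizes and strides, and omits the pooling parameters.

```python
def conv_out_len(n, kernel, stride, pad=0):
    """Standard 1-D convolution output length: floor((n + 2*pad - k) / s) + 1."""
    return (n + 2 * pad - kernel) // stride + 1

# Shape walk-through of the two stem convolutions (padding and input
# length assumed; max-pooling omitted since its parameters are not given).
n = 1024                      # assumed input signal length
n = conv_out_len(n, 16, 4)    # first conv: kernel 16, stride 4 -> 253
n = conv_out_len(n, 8, 2)     # second conv: kernel 8, stride 2 -> 123
```

The aggressive early striding is what keeps the parameter-heavy Cross-Mix modules operating on short sequences.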

The global average pooling layer is calculated as follows:

$${y_c}=\frac{1}{H \cdot W}\sum\limits_{i=1}^{H} \sum\limits_{j=1}^{W} {x_{ijc}} \tag{15}$$

Where \(y_c\) is the output of the \(c\)-th channel, \(x_{ijc}\) is the input feature map at \((i, j, c)\), and \(H\) and \(W\) are the height and width of the feature map, respectively.
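Equation (15) reduces each channel to the mean over its spatial positions, collapsing \((H, W, C)\) to \((C,)\); a minimal sketch with an illustrative tensor:

```python
import numpy as np

# Global average pooling per Eq. (15): mean over the spatial dims per channel.
X = np.arange(24, dtype=float).reshape(2, 3, 4)   # (H, W, C) = (2, 3, 4)
y = X.mean(axis=(0, 1))                           # y_c for each channel c
assert y.shape == (4,)
```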

The calculation of softmax is as follows:

$$p(i)=\frac{e^{\delta_i}}{\sum\nolimits_{k=1}^{K} {e^{\delta_k}}} \tag{16}$$

Where \(p(i)\) represents the probability of each output, the sum of all \(p(i)\) is 1, and \(K\) is the number of classes in the multi-class classification problem.
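Equation (16) can be sketched as follows; the max-subtraction is a standard numerical-stability safeguard (it leaves \(p(i)\) unchanged) rather than something stated in the text, and the sample logits are illustrative.

```python
import numpy as np

def softmax(delta):
    """Softmax per Eq. (16), with the usual max-shift for numerical
    stability; shifting all logits by a constant does not change p(i)."""
    e = np.exp(delta - np.max(delta))
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))
assert np.isclose(p.sum(), 1.0)   # probabilities sum to 1
```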